Pandas

1. Groupby

Groupby computes in three steps: split → apply → combine.
You can group by one or more columns.
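
The split → apply → combine steps can be sketched as follows. The DataFrame and its column names (Team, Year, Points) are hypothetical sample data, not taken from the original notes.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "Team": ["Devils", "Kings", "Devils", "Kings", "Riders"],
    "Year": [2014, 2014, 2015, 2015, 2014],
    "Points": [876, 789, 673, 812, 741],
})

# Split the rows by Team, apply sum to Points, combine into one Series.
points_by_team = df.groupby("Team")["Points"].sum()

# Grouping by more than one column: pass a list of column names.
points_by_team_year = df.groupby(["Team", "Year"])["Points"].sum()

print(points_by_team)
print(points_by_team_year)
```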

Hierarchical index
Grouping by two columns produces an index with two levels.
In other words, groupby gives you a hierarchical index that you can then slice.

  h_index["Devils":"Kings"]
  # value: only the rows from Devils through Kings are returned
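
A minimal sketch of how such an h_index might be built and sliced. The sample data is hypothetical, and .loc is used here to make the label-based slicing explicit.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "Team": ["Devils", "Kings", "Devils", "Kings", "Riders"],
    "Year": [2014, 2014, 2015, 2015, 2014],
    "Points": [876, 789, 673, 812, 741],
})

# Grouping by two columns yields a Series with a two-level (hierarchical) index.
h_index = df.groupby(["Team", "Year"])["Points"].sum()

# groupby sorts the group keys by default, so a label slice on the
# first level works; both endpoints are included.
subset = h_index.loc["Devils":"Kings"]
print(subset)
```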

hierarchical index.unstack()
→ Converts the grouped data into matrix form: by default, the innermost index level becomes the columns.
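
As an illustration, with hypothetical sample data, unstacking a (Team, Year) index gives one row per Team and one column per Year. Combinations that do not occur in the data show up as NaN.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "Team": ["Devils", "Kings", "Devils", "Kings", "Riders"],
    "Year": [2014, 2014, 2015, 2015, 2014],
    "Points": [876, 789, 673, 812, 741],
})

h_index = df.groupby(["Team", "Year"])["Points"].sum()

# Move the inner level (Year) from the index to the columns.
matrix = h_index.unstack()
print(matrix)
```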

filter
Used to keep only the data that satisfies a given condition.
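
A short sketch, on hypothetical sample data, of filter applied to groups: the function receives each group as a DataFrame and returns True or False, and the rows of the groups that return True are kept. The threshold of 1000 is arbitrary.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "Team": ["Devils", "Kings", "Devils", "Kings", "Riders"],
    "Year": [2014, 2014, 2015, 2015, 2014],
    "Points": [876, 789, 673, 812, 741],
})

# Keep the rows of every team whose total points reach at least 1000.
kept = df.groupby("Team").filter(lambda group: group["Points"].sum() >= 1000)
print(kept)
```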

2. Pivot table & Crosstab

groupby, pivot table, and crosstab all serve the same purpose and produce the same kind of summary.
How to analyze data
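
To illustrate the claim that the three tools are interchangeable, this sketch builds the same Team × Year table of summed Points three ways. The sample data is hypothetical, and missing combinations are filled with 0 so that the results line up.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "Team": ["Devils", "Kings", "Devils", "Kings", "Riders"],
    "Year": [2014, 2014, 2015, 2015, 2014],
    "Points": [876, 789, 673, 812, 741],
})

# 1) groupby followed by unstack
by_groupby = df.groupby(["Team", "Year"])["Points"].sum().unstack(fill_value=0)

# 2) pivot_table
by_pivot = df.pivot_table(index="Team", columns="Year", values="Points",
                          aggfunc="sum", fill_value=0)

# 3) crosstab (takes the columns themselves rather than their names)
by_crosstab = pd.crosstab(df["Team"], df["Year"],
                          values=df["Points"], aggfunc="sum").fillna(0)

print(by_pivot)
```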

👉 Working with real data (preprocessing)

  # Count the NaN values in the column
  df_ipcr["section"].isnull().sum()

  # Keep only the rows whose subclass is not null
  df_ipcr = df_ipcr[df_ipcr["subclass"].notnull()]

  # Use map to convert the type to str, then keep only the all-digit rows
  df_ipcr["ipc_class"] = df_ipcr["ipc_class"].map(str)
  df_ipcr = df_ipcr[df_ipcr["ipc_class"].map(str.isdigit)]

  # Zero-pad an integer to two digits, e.g. 01, 02, 03
  two_digit_f = lambda x: '{0:02d}'.format(x)
  two_digit_f(3)
  # value: '03'
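
The preprocessing steps above can be run end to end on a toy frame. The contents of df_ipcr here are made up for illustration; only the column names follow the notes.

```python
import pandas as pd

# Made-up stand-in for df_ipcr; the real data is not shown in the notes.
df_ipcr = pd.DataFrame({
    "section": ["A", "B", None, "H"],
    "subclass": ["01B", None, "21C", "04L"],
    "ipc_class": [1, 2, "x", 4],
})

# Count the missing values in one column.
n_missing = df_ipcr["section"].isnull().sum()

# Drop the rows whose subclass is missing.
df_clean = df_ipcr[df_ipcr["subclass"].notnull()].copy()

# Convert to str, then keep only the rows made up entirely of digits.
df_clean["ipc_class"] = df_clean["ipc_class"].map(str)
df_clean = df_clean[df_clean["ipc_class"].map(str.isdigit)].copy()

# Zero-pad to two digits; the values are strings now, so convert back to int first.
two_digit_f = lambda x: "{0:02d}".format(int(x))
df_clean["ipc_class"] = df_clean["ipc_class"].map(two_digit_f)
print(df_clean)
```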

3. Merge, Concat

Both combine two datasets into one.

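Merge
Combines two DataFrames on a shared key.
1) Inner Join
If the key column has a different name in each DataFrame, specify the left and right column names with left_on and right_on (in this example both happen to be subject_id).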
```
pd.merge(df_a, df_b, left_on="subject_id", right_on="subject_id")
```
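
A small sketch of the case where the key names really do differ. The frames and the right-hand column name student_id are hypothetical.

```python
import pandas as pd

# Hypothetical frames: the key is subject_id on the left, student_id on the right.
df_a = pd.DataFrame({"subject_id": [1, 2, 3],
                     "first_name": ["Alex", "Amy", "Allen"]})
df_b = pd.DataFrame({"student_id": [2, 3, 4],
                     "test_score": [51, 15, 61]})

# how="inner" is the default: only keys present in both frames survive.
inner = pd.merge(df_a, df_b, left_on="subject_id", right_on="student_id")
print(inner)
```

Both key columns appear in the result, since pandas cannot tell that they are meant to be the same field.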
2) Left Join/Outer Join
```
pd.merge(df_a, df_b, on="subject_id", how="left")  # or how="outer"
```
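
To compare the join types side by side, here is a sketch on two hypothetical frames that share only some of their keys.

```python
import pandas as pd

# Hypothetical frames sharing the keys 2 and 3 only.
df_a = pd.DataFrame({"subject_id": [1, 2, 3],
                     "first_name": ["Alex", "Amy", "Allen"]})
df_b = pd.DataFrame({"subject_id": [2, 3, 4],
                     "test_score": [51, 15, 61]})

# Left join: every row of df_a is kept; unmatched rows get NaN on the right side.
left = pd.merge(df_a, df_b, on="subject_id", how="left")

# Outer join: every key from either frame is kept.
outer = pd.merge(df_a, df_b, on="subject_id", how="outer")

print(left)
print(outer)
```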
Concat
Simply stacks two DataFrames together, without matching on a key.
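
A minimal sketch with hypothetical frames: by default concat stacks rows, and axis=1 places the frames side by side instead.

```python
import pandas as pd

# Hypothetical frames with identical columns.
df_a = pd.DataFrame({"subject_id": [1, 2], "first_name": ["Alex", "Amy"]})
df_b = pd.DataFrame({"subject_id": [3, 4], "first_name": ["Allen", "Alice"]})

# Stack the rows; ignore_index=True renumbers the index 0..n-1.
stacked = pd.concat([df_a, df_b], ignore_index=True)

# Place the frames next to each other, aligned on the row index.
side_by_side = pd.concat([df_a, df_b], axis=1)

print(stacked)
print(side_by_side)
```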

4. DB Persistence

pandas can load data directly through a database connection.

  # Database connection code
  import sqlite3
  import pandas as pd

  conn = sqlite3.connect("./data/flights.db")
  cur = conn.cursor()
  cur.execute("select * from airlines limit 5;")
  results = cur.fetchall()
  results

  # Create a DataFrame using the DB connection conn
  df_airlines = pd.read_sql_query("select * from airlines;", conn)
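
The same flow can be tried without flights.db by using an in-memory SQLite database. The table schema and rows below are invented for this sketch and do not reflect the real airlines table.

```python
import sqlite3

import pandas as pd

# In-memory database, so no file is needed.
conn = sqlite3.connect(":memory:")

# Invented schema and rows, for illustration only.
conn.execute("create table airlines (id integer, name text, country text)")
conn.executemany(
    "insert into airlines values (?, ?, ?)",
    [(1, "Air A", "KR"), (2, "Air B", "JP"), (3, "Air C", "US")],
)
conn.commit()

# Run the query and load the result straight into a DataFrame.
df_airlines = pd.read_sql_query("select * from airlines;", conn)
conn.close()

print(df_airlines)
```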

Pickle persistence
The most common way to persist Python objects to a file.

    df_routes.to_pickle("./data/df_routes.pickle")
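
A round trip, saving and then loading again, might look like this. The contents of df_routes are made up, and a temporary directory stands in for ./data.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for df_routes.
df_routes = pd.DataFrame({"source": ["ICN", "GMP"], "dest": ["NRT", "CJU"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "df_routes.pickle")
    df_routes.to_pickle(path)          # write the DataFrame to disk
    df_loaded = pd.read_pickle(path)   # read it back, dtypes and index intact

print(df_loaded)
```

Pickle files can execute arbitrary code when loaded, so read_pickle should only be used on files from a trusted source.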

Author And Source

About this post (Pandas_2): the original, with more material, is at https://velog.io/@juliy9812/Pandas2
