Introduction to Pandas

Prerequisites:
+ Python
+ Pandas
+ NumPy

import pandas as pd
import numpy as np

Reference:
+ Pandas Cheat Sheet

Creating, Reading, and Writing Data

Creating a DataFrame

# create a dataframe
df = pd.DataFrame({
    'Apples': [30],
    'Bananas': [21]
})

# create a dataframe with index
df = pd.DataFrame({
    'Apples': [35,41],
    'Bananas': [21, 34]
}, index=['2017 Sales', '2018 Sales'])
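
To check what the constructor produced, a minimal sketch like the following can help. The variable name `sales` is only illustrative; the data is the same as in the snippet above.

```python
import pandas as pd

# Each dict key becomes a column; the index list supplies the row labels.
sales = pd.DataFrame(
    {'Apples': [35, 41], 'Bananas': [21, 34]},
    index=['2017 Sales', '2018 Sales'],
)

print(sales.shape)                        # (number of rows, number of columns)
print(list(sales.columns))                # column labels
print(sales.loc['2018 Sales', 'Apples'])  # one cell, looked up by row and column label
```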

Creating a Series

Create a Series that looks like this:

Flour     4 cups
Milk       1 cup
Eggs     2 large
Spam       1 can
Name: Dinner, dtype: object

sr = pd.Series(
    ['4 cups', '1 cup', '2 large', '1 can'],
    index=['Flour', 'Milk', 'Eggs', 'Spam'],
    name='Dinner'  # needed to get "Name: Dinner" in the output above
)
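
As a rough illustration of how the index and name of a Series are used afterwards, here is a small sketch. The variable names `dinner` and `frame` are hypothetical.

```python
import pandas as pd

dinner = pd.Series(
    ['4 cups', '1 cup', '2 large', '1 can'],
    index=['Flour', 'Milk', 'Eggs', 'Spam'],
    name='Dinner',
)

# Values are looked up by index label.
milk = dinner['Milk']

# The Series name becomes the column name when converting to a DataFrame.
frame = dinner.to_frame()
print(milk, list(frame.columns))
```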

Loading a CSV file

# Read a CSV file, using the first column as the index
df = pd.read_csv('../path/to/file.csv', index_col=0)
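
The effect of `index_col=0` can be tried without a file on disk by reading from an in-memory buffer. This is only a sketch: the CSV text and the name `df_csv` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first column holds the row labels.
csv_text = "name,Apples,Bananas\n2017 Sales,35,21\n2018 Sales,41,34\n"

# index_col=0 turns the first column into the index instead of a data column.
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_csv.index.tolist())
print(df_csv.columns.tolist())
```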

Loading an Excel (XLS) file

# Read the sheet named 'sheet 1' from an Excel file
df = pd.read_excel('../path/to/excel.xls', sheet_name='sheet 1')

Loading a SQLite file

# using sqlite3 lib
import sqlite3
# need initial connection for sqlite
cnx = sqlite3.connect('../path/to/sqlfile.sqlite')
# query from db
df = pd.read_sql_query('SELECT * FROM tables_name', cnx)
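
The same pattern can be tried end to end with an in-memory database, so no `.sqlite` file is required. The table `fruits` and its columns are assumptions made up for this sketch.

```python
import sqlite3
import pandas as pd

# ':memory:' creates a temporary database that lives only in RAM.
cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE fruits (name TEXT, qty INTEGER)')
cnx.executemany(
    'INSERT INTO fruits VALUES (?, ?)',
    [('apple', 30), ('banana', 21)],
)
cnx.commit()

# The query result comes back as a DataFrame, one column per selected field.
fruits = pd.read_sql_query('SELECT * FROM fruits ORDER BY qty DESC', cnx)
cnx.close()
print(fruits)
```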

Writing data

# save to csv
df.to_csv('../path/to/new/file.csv')

# save to excel (recent pandas versions write .xlsx and need an engine such as openpyxl)
df.to_excel('../path/to/file.xlsx')

Indexing, Selecting, and Assigning

Selecting column data

# select column col1 from dataframe
df['col1']

# select first row of column col1
df['col1'][0]

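# select first 10 rows from column col1
df['col1'][0:10]
# or using loc (label-based, the end label 9 is included)
df.loc[0:9, 'col1']
# or using iloc (position-based, the end position 10 is excluded;
# index_of_col1 is the integer position of col1)
df.iloc[0:10, index_of_col1]

# select col1 where col1 == 'xxx'
df.loc[df['col1'] == 'xxx', 'col1']

# select col1 where col1 == 'xxx' and col2 == 'yyy'
# use | for "or" and & for "and"; wrap each condition in parentheses
df.loc[(df['col1'] == 'xxx') & (df['col2'] == 'yyy'), 'col1']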
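
The following sketch runs the combined conditions on a tiny made-up frame (the name `people` and its values are assumptions). The parentheses matter because `&` and `|` bind more tightly than `==` in Python.

```python
import pandas as pd

people = pd.DataFrame({
    'col1': ['xxx', 'xxx', 'zzz'],
    'col2': ['yyy', 'aaa', 'yyy'],
})

# Rows where both conditions hold.
both = people.loc[(people['col1'] == 'xxx') & (people['col2'] == 'yyy'), 'col1']

# Rows where at least one condition holds.
either = people.loc[(people['col1'] == 'xxx') | (people['col2'] == 'yyy'), 'col1']

print(both.index.tolist(), either.index.tolist())
```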

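Selecting row data

# select first row of dataframe
# using loc (the row whose index label is 0)
df.loc[0]
# using iloc (the row at position 0)
df.iloc[0]

# select several rows at once
# for example, the rows at positions 1, 2, 3, 6
df.iloc[[1,2,3,6]]
# or using loc, the rows with index labels 1, 2, 3, 6
df.loc[[1,2,3,6]]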

# select only columns col1, col2
df.loc[[1, 2, 3, 6], ['col1', 'col2']]
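
With the default 0, 1, 2, ... index, `loc` and `iloc` often return the same rows, which hides the difference between them. A small sketch with a non-default index (the frame `df_rows` is made up) shows that `loc` works on labels and `iloc` on positions:

```python
import pandas as pd

df_rows = pd.DataFrame({'col1': ['a', 'b', 'c']}, index=[10, 20, 30])

by_label = df_rows.loc[10, 'col1']    # row labelled 10
by_position = df_rows.iloc[0, 0]      # first row, first column

# Label slices include the end label; position slices exclude the end position.
label_slice = df_rows.loc[10:20]
position_slice = df_rows.iloc[0:1]

print(by_label, by_position, len(label_slice), len(position_slice))
```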

Summary Functions and Maps

Median (median)

# median of column col1 in dataframe
df['col1'].median()

Unique values (unique)

# find all unique data of column col1
df['col1'].unique()

Most frequent values (value_counts)

# count how often each value appears in column col1 (most frequent first)
df['col1'].value_counts()
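
The three summary functions above can be compared on one small column. This is only a sketch with made-up numbers; the name `col` is hypothetical.

```python
import pandas as pd

col = pd.Series([3, 1, 3, 2, 3, 1], name='col1')

middle = col.median()        # middle value of the sorted data
distinct = col.unique()      # distinct values, in order of first appearance
counts = col.value_counts()  # occurrences per value, most frequent first

print(middle)
print(distinct.tolist())
print(counts.index[0], counts.iloc[0])
```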

Applying a function to a column (apply)

# add 1 to each cell in column col1
def add_one(source):
    return source + 1

# apply returns a new Series; df itself is not modified
df['col1'].apply(add_one)
# or using lambda
df['col1'].apply(lambda x: x + 1)
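
For simple arithmetic like this, operating on the whole column at once gives the same result as `apply`. The sketch below uses a made-up Series named `col` to compare the two forms.

```python
import pandas as pd

col = pd.Series([1, 2, 3], name='col1')

applied = col.apply(lambda x: x + 1)  # calls the function once per element
vectorized = col + 1                  # one operation over the whole column

# Both produce a new Series; the original column is left unchanged.
print(applied.tolist(), vectorized.tolist(), col.tolist())
```

To keep the result, assign it back to the column, for example `df['col1'] = df['col1'] + 1`.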

Grouping and Sorting

Grouping by column

# show number of records in each col1 group
df.groupby(['col1']).size()

# group by col1, show median of col2 for each group
df.groupby(['col1']).col2.median()

# group by col1, show median and sum of col2 for each group
df.groupby(['col1']).col2.agg(['median', 'sum'])

# group by two columns (multi-index result), then sort descending
df.groupby(['col1', 'col2']).col3.agg(['min', 'max']).sort_values(by=['min', 'max'], ascending=False)
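
A runnable sketch of the grouping steps, using a small made-up frame named `data`. Note that `agg` is a method, so the list of function names goes inside parentheses.

```python
import pandas as pd

data = pd.DataFrame({
    'col1': ['a', 'a', 'b', 'b', 'b'],
    'col2': [1, 3, 2, 4, 6],
})

# Number of rows in each group.
sizes = data.groupby('col1').size()

# Several aggregates of col2 per group; each name becomes a result column.
stats = data.groupby('col1')['col2'].agg(['median', 'sum'])

# Sort the aggregated table by one of its columns, largest first.
ranked = stats.sort_values(by='sum', ascending=False)
print(ranked)
```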

Data Types and Missing Data

Checking data types

# check col1's datatype
df.col1.dtypes
df['col1'].dtypes

# check multiple columns
df[['col1', 'col2']].dtypes

Changing data types

# change col1 datatype to integer
df['col1'] = df['col1'].astype(int)
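
A typical case for `astype` is a column of numbers that was read in as text. The sketch below assumes a made-up frame named `raw`; the exact integer width of the result (for example `int64`) can depend on the platform.

```python
import pandas as pd

raw = pd.DataFrame({'col1': ['1', '2', '3']})  # numbers stored as strings

# astype returns a converted copy, so assign it back to the column.
raw['col1'] = raw['col1'].astype(int)

is_int = pd.api.types.is_integer_dtype(raw['col1'])
total = raw['col1'].sum()  # arithmetic now works on numbers
print(is_int, total)
```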

Finding missing data in a column

# count the values in col1, with missing values shown as 'N/A'
df.col1.fillna('N/A').value_counts()

# check whether col1 has missing values (True = missing)
df.col1.isnull().value_counts()

# replace the value "?" with "N/A" in col1 (returns a new Series)
df.col1.replace("?", "N/A")
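
The following sketch puts these steps together on made-up data (the frame `m` and its values are assumptions). It treats "?" as a placeholder for a missing value, converts it to NaN so that `isnull` can see it, and then fills the gaps.

```python
import numpy as np
import pandas as pd

m = pd.DataFrame({'col1': ['a', np.nan, '?', 'a']})

n_missing = m['col1'].isnull().sum()       # only the real NaN is counted

cleaned = m['col1'].replace('?', np.nan)   # turn the placeholder into NaN
n_missing_after = cleaned.isnull().sum()   # now both gaps are counted

filled = cleaned.fillna('N/A')             # label the gaps for reporting
print(n_missing, n_missing_after, filled.tolist())
```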

Reference

Original article (Introduction to Pandas): https://qiita.com/kyo92/items/961322425f8f973984d2
