pandas core data structures --- API


pandas basics
Introduction to pandas
Python Data Analysis Library
pandas is a NumPy-based tool created to solve data analysis tasks. It incorporates a number of libraries and some standard data models, and provides the tools needed to manipulate large structured datasets efficiently.
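pandas core data structures
A data structure is the way a computer stores and organizes data. A well-chosen data structure usually leads to higher runtime or storage efficiency, and data structures are often closely tied to efficient search algorithms and indexing techniques.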
Series
A Series can be understood as a one-dimensional array whose index labels you can set yourself. It resembles a fixed-length ordered dictionary with an index and values.

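import pandas as pd
import numpy as np

# Create an empty Series (dtype given explicitly to avoid a default-dtype warning in older pandas)
s = pd.Series(dtype=object)
# Create a Series from an ndarray
data = np.array(['a','b','c','d'])
s = pd.Series(data)
s = pd.Series(data,index=[100,101,102,103])
# Create a Series from a dict
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
# Create a Series from a scalar
s = pd.Series(5, index=[0, 1, 2, 3])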
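When a Series is built from a dictionary and an explicit index is passed as well, pandas looks each index label up in the dictionary, and labels that are not keys become NaN. A minimal sketch; the labels below are illustrative only:

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}

# The index decides both the order and which keys are kept;
# 'd' is not a key of the dict, so its value becomes NaN.
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)

# A scalar is broadcast to every label of the index.
fives = pd.Series(5, index=[0, 1, 2, 3])
print(fives.tolist())
```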

Accessing data in a Series:

# Access elements by position (iloc)
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s.iloc[0], s.iloc[:3], s.iloc[-3:])
# Access elements by label
print(s['a'], s[['a','c','d']])
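The two access styles differ in one detail that often trips people up: a positional slice excludes its end, while a label slice includes it. A short sketch using explicit .iloc (position) and .loc (label) accessors on the same Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first = s.iloc[0]             # by position: the first element
by_label = s.loc['a']         # by label: the element labelled 'a'
pos_slice = s.iloc[1:3]       # positions 1 and 2; the end position is excluded
label_slice = s.loc['b':'d']  # labels 'b' to 'd'; the end label is included
print(first, by_label, pos_slice.tolist(), label_slice.tolist())
```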

Date handling in pandas

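# Date strings in several formats that pandas can recognize
dates = pd.Series(['2011', '2011-02', '2011-03-01', '2011/04/01', 
                   '2011/05/01 01:01:01', '01 Jun 2011'])
# to_datetime() converts them to the datetime64 dtype;
# format='mixed' (pandas >= 2.0) parses each element's format separately
dates = pd.to_datetime(dates, format='mixed')
print(dates, dates.dtype, type(dates))
print(dates.dt.day)

# datetime data supports date arithmetic
delta = dates - pd.to_datetime('1970-01-01')
# Offset in whole days
print(delta.dt.days)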

Series.dt provides many date-related attributes, including:

Series.dt.year	The year of the datetime.
Series.dt.month	The month as January=1, December=12.
Series.dt.day	The days of the datetime.
Series.dt.hour	The hours of the datetime.
Series.dt.minute	The minutes of the datetime.
Series.dt.second	The seconds of the datetime.
Series.dt.microsecond	The microseconds of the datetime.
Series.dt.isocalendar().week	The ISO week ordinal of the year (Series.dt.week and Series.dt.weekofyear were deprecated and removed in pandas 2.0).
Series.dt.dayofweek	The day of the week with Monday=0, Sunday=6.
Series.dt.weekday	The day of the week with Monday=0, Sunday=6.
Series.dt.dayofyear	The ordinal day of the year.
Series.dt.quarter	The quarter of the date.
Series.dt.is_month_start	Indicates whether the date is the first day of the month.
Series.dt.is_month_end	Indicates whether the date is the last day of the month.
Series.dt.is_quarter_start	Indicator for whether the date is the first day of a quarter.
Series.dt.is_quarter_end	Indicator for whether the date is the last day of a quarter.
Series.dt.is_year_start	Indicate whether the date is the first day of a year.
Series.dt.is_year_end	Indicate whether the date is the last day of the year.
Series.dt.is_leap_year	Boolean indicator if the date belongs to a leap year.
Series.dt.days_in_month	The number of days in the month.
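A few of the attributes listed above, applied to three sample dates chosen for illustration (they are not from the original article):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2011-01-01', '2011-03-31', '2012-02-29']))

summary = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'dayofweek': dates.dt.dayofweek,      # Monday=0 ... Sunday=6
    'quarter': dates.dt.quarter,
    'is_month_end': dates.dt.is_month_end,
    'is_leap_year': dates.dt.is_leap_year,
    'days_in_month': dates.dt.days_in_month,
})
print(summary)
```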

DatetimeIndex
Using the date_range() function with a given number of periods and a frequency, you can create a sequence of dates. By default, the frequency is one day ('D').

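import pandas as pd
# Daily frequency (the default)
datelist = pd.date_range('2019/08/21', periods=5)
print(datelist)
# Month-end frequency ('ME' in pandas >= 2.2; use freq='M' on older versions)
datelist = pd.date_range('2019/08/21', periods=5, freq='ME')
print(datelist)
# Build a date sequence for a given interval
# (pd.datetime was removed in pandas 2.0; pd.Timestamp works instead)
start = pd.Timestamp(2017, 11, 1)
end = pd.Timestamp(2017, 11, 5)
dates = pd.date_range(start, end)
print(dates)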
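When both start and end are given, date_range() includes both endpoints, and freq can be a multiple of a base unit. A minimal sketch; the '2D' step is an illustrative choice:

```python
import pandas as pd

start = pd.Timestamp(2017, 11, 1)
end = pd.Timestamp(2017, 11, 5)

daily = pd.date_range(start, end)                   # default freq='D', both ends included
every_other = pd.date_range(start, end, freq='2D')  # every second day
print(len(daily), daily[0].date(), daily[-1].date())
print(list(every_other.day))
```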

bdate_range() creates a range of business days; it excludes Saturdays and Sundays.

import pandas as pd
datelist = pd.bdate_range('2011/11/03', periods=5)
print(datelist)
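One way to confirm that weekends are skipped is to inspect the day of the week of each generated date. A small check, assuming the same start date as above:

```python
import pandas as pd

bdays = pd.bdate_range('2011-11-03', periods=5)
# dayofweek: Monday=0 ... Sunday=6, so every business day is below 5;
# 2011-11-03 is a Thursday, so the range jumps from Friday to Monday.
print(bdays)
print(list(bdays.dayofweek))
```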

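DataFrame
A DataFrame is a table-like data type. It can be understood as a two-dimensional array whose labels on both axes (rows and columns) can be set. A DataFrame has the following characteristics:

Columns can be of different types

Size is mutable

Axes (rows and columns) are labeled

Arithmetic operations can be performed on rows and columns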

import pandas as pd

# Create an empty DataFrame
df = pd.DataFrame()
print(df)

# Create a DataFrame from lists
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
data = [['Alex',10],['Bob',12],['Clarke',13]]
# Cast only the numeric column; dtype=float on the whole frame would also try to convert 'Name'
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

# Create a DataFrame from a dict
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['s1','s2','s3','s4'])
print(df)
data = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)
print(df)
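When the dict values are Series with different indexes, as in the last example above, the DataFrame's row index is the union of their labels and missing cells are filled with NaN. A short sketch reusing the same data:

```python
import pandas as pd

data = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
        'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(data)

# 'one' has no label 'd', so that cell is NaN and the column
# is upcast from integers to float64 to hold the NaN.
print(df)
print(df.dtypes)
```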

Core data structure operations
Column access
A single column of a DataFrame is a Series. By definition, a DataFrame is a labeled two-dimensional array in which each column label is that column's name.

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df['one'])
print(df[['one', 'two']])

Adding a column
Adding a column to a DataFrame is simple: create a new column label and assign data to it.

import pandas as pd

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['s1','s2','s3','s4'])
df['score']=pd.Series([90, 80, 70, 60], index=['s1','s2','s3','s4'])
print(df)
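Assigning a Series to a new column aligns it on the index labels, not on position, and a column can also be computed from existing ones. A minimal sketch; the scores and the age_next_year column are hypothetical:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['s1', 's2', 's3', 's4'])

# Only s2 and s1 receive values (matched by label); s3 and s4 become NaN.
df['score'] = pd.Series([80, 90], index=['s2', 's1'])

# A derived column, computed element-wise from 'Age'.
df['age_next_year'] = df['Age'] + 1
print(df)
```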

Deleting a column
To delete a column, use the del statement or the pop method provided by pandas, as shown below:

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
     'three' : pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("dataframe is:")
print(df)

# Delete a column with del: one
del df['one']
print(df)

# Delete a column with the pop method
df.pop('two')
print(df)
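pop removes the column in place and hands it back, whereas drop(columns=...) leaves the original untouched and returns a new DataFrame. A small comparison with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6], 'three': [7, 8, 9]})

# pop removes the column from df and returns it as a Series.
two = df.pop('two')

# drop returns a new DataFrame; df itself keeps 'three'.
without_three = df.drop(columns=['three'])

print(two.tolist(), list(df.columns), list(without_three.columns))
```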

Row access
To access rows of a DataFrame, slice it with ":" as you would an array:

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df[2:4])

loc selects rows by index label (label-based slicing). Usage:

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.loc['b'])
print(df.loc[['a', 'b']])

iloc differs from loc in that it takes the integer positions of rows and columns rather than their labels. Usage:

import pandas as pd

d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(d)
print(df.iloc[2])
print(df.iloc[[2, 3]])
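Both loc and iloc also accept a second argument for columns, so a row/column selection fits in one call. A sketch on the same data; note again that the label slice includes its end while the position slice does not:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

by_label = df.loc['b':'c', 'two']  # rows 'b'..'c' (inclusive), column 'two'
by_position = df.iloc[1:3, 1]      # rows at positions 1, 2; column at position 1
single = df.loc['d', 'two']        # one cell by row and column label
print(by_label.tolist(), by_position.tolist(), single)
```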

Adding rows

import pandas as pd

df = pd.DataFrame([['zs', 12], ['ls', 4]], columns = ['Name','Age'])
df2 = pd.DataFrame([['ww', 16], ['zl', 8]], columns = ['Name','Age'])

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df, df2])
print(df)
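Concatenation keeps each frame's original row labels, so the result above has the labels 0, 1, 0, 1. Passing ignore_index=True renumbers the rows instead; a quick comparison:

```python
import pandas as pd

df = pd.DataFrame([['zs', 12], ['ls', 4]], columns=['Name', 'Age'])
df2 = pd.DataFrame([['ww', 16], ['zl', 8]], columns=['Name', 'Age'])

kept = pd.concat([df, df2])                           # labels 0, 1, 0, 1
renumbered = pd.concat([df, df2], ignore_index=True)  # labels 0, 1, 2, 3
print(list(kept.index), list(renumbered.index))
```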

Deleting rows
Drop rows from a DataFrame by index label. If the label is duplicated, all rows with that label are dropped.

import pandas as pd

df = pd.DataFrame([['zs', 12], ['ls', 4]], columns = ['Name','Age'])
df2 = pd.DataFrame([['ww', 16], ['zl', 8]], columns = ['Name','Age'])
df = pd.concat([df, df2])
# Drop the rows whose index label is 0 (both of them, since the label is duplicated)
df = df.drop(0)
print(df)

Modifying data in a DataFrame
To change data in a DataFrame, select the part you want to change and assign new values to it.

import pandas as pd

df = pd.DataFrame([['zs', 12], ['ls', 4]], columns = ['Name','Age'])
df2 = pd.DataFrame([['ww', 16], ['zl', 8]], columns = ['Name','Age'])
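df = pd.concat([df, df2])
# Use .loc[row_label, column_label] rather than chained indexing such as df['Name'][0],
# which may write into a copy; label 0 occurs twice, so both rows are updated
df.loc[0, 'Name'] = 'Tom'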
print(df)
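The same .loc assignment also works with a boolean mask, which updates every row that meets a condition. A minimal sketch; the threshold of 10 is arbitrary:

```python
import pandas as pd

df = pd.DataFrame([['zs', 12], ['ls', 4], ['ww', 16], ['zl', 8]],
                  columns=['Name', 'Age'])

# Rows selected by the mask, column selected by label, in one .loc call;
# this writes into df itself rather than into a temporary copy.
df.loc[df['Age'] < 10, 'Age'] = 10
print(df)
```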

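Common DataFrame attributes
No.	Attribute or method	Description
1	axes	Returns a list of the row and column axis labels (index)
2	dtype	Returns the data type (dtype) of the object; a DataFrame exposes one per column via dtypes
3	empty	Returns True if the object is empty
4	ndim	Returns the number of dimensions of the underlying data (1 for a Series, 2 for a DataFrame)
5	size	Returns the number of elements in the underlying data
6	values	Returns the data as an ndarray
7	head(n)	Returns the first n rows
8	tail(n)	Returns the last n rows
Example code: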

import pandas as pd

data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['s1','s2','s3','s4'])
df['score']=pd.Series([90, 80, 70, 60], index=['s1','s2','s3','s4'])
print(df)
print(df.axes)
print(df['Age'].dtype)
print(df.empty)
print(df.ndim)
print(df.size)
print(df.values)
print(df.head(3)) # first three rows of df
print(df.tail(3)) # last three rows of df
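shape is a convenient companion to the attributes above: it gives rows and columns at once, and size is simply their product. A short check on the same DataFrame:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['s1', 's2', 's3', 's4'])
df['score'] = pd.Series([90, 80, 70, 60], index=['s1', 's2', 's3', 's4'])

print(df.shape)        # (rows, columns)
print(df.ndim)         # 2 for a DataFrame
print(df['Age'].ndim)  # 1 for a Series
print(df.size)         # rows * columns
print(df.dtypes)       # one dtype per column
```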

Jupyter notebook
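Jupyter Notebook (formerly called IPython Notebook) is an interactive notebook that can run more than 40 programming languages. It uses the browser as its interface, sends requests to a backend IPython server, and displays the results. At its core, Jupyter Notebook is a web application for creating and sharing literate programming documents, with support for live code, mathematical equations, visualizations, and Markdown.
IPython is an interactive Python shell that is far more capable than the default Python shell: it auto-completes variables, auto-indents, supports bash shell commands, and has many useful built-in features and functions.
Installing IPython
Windows: with numpy, matplotlib, and pandas already installed, install it with pip:
pip install ipython
OS X: download and install Apple's developer tool Xcode from the App Store, then install IPython with easy_install or pip, or install it from the source files.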
Installing Jupyter Notebook

pip3 install jupyter

Starting Jupyter Notebook

In a terminal, run:
jupyter notebook

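pandas essentials
Descriptive statistics in pandas
Descriptive statistics for numeric data mainly cover the count of non-missing values, minimum, mean, median, maximum, quartiles, range, standard deviation, variance, and covariance. The common statistical functions in the NumPy library can also compute descriptive statistics on a DataFrame.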

np.min	minimum
np.max	maximum
np.mean	mean
np.ptp	range (max - min)
np.median	median
np.std	standard deviation
np.var	variance
np.cov	covariance

Example:

import pandas as pd
import numpy as np

# Create a DataFrame
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
  'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

df = pd.DataFrame(d)
print(df)
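# Test the descriptive statistics functions
print(df.sum())
# numeric_only=True skips the string column 'Name' (pandas 2.x raises on it otherwise)
print(df.sum(axis=1, numeric_only=True))
print(df.mean(numeric_only=True))
print(df.mean(axis=1, numeric_only=True))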
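One difference to keep in mind when mixing the NumPy functions with pandas methods: pandas' std() computes the sample standard deviation (ddof=1) by default, while np.std on a plain array computes the population value (ddof=0). A small illustration on the Age column from the example above:

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

pandas_std = ages.std()              # sample standard deviation (ddof=1)
numpy_std = np.std(ages.to_numpy())  # population standard deviation (ddof=0)
age_range = np.ptp(ages.to_numpy())  # max - min

print(pandas_std, numpy_std, age_range)
```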

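pandas provides the following statistics-related functions:
No.	Function	Description
1	count()	Number of non-null observations
2	sum()	Sum of all values
3	mean()	Mean of all values
4	median()	Median of all values
5	std()	Standard deviation of the values
6	min()	Minimum of all values
7	max()	Maximum of all values
8	abs()	Absolute value
9	prod()	Product of the array elements
10	cumsum()	Cumulative sum
11	cumprod()	Cumulative product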
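Most of these functions skip missing values by default. A short sketch with one NaN (the values are illustrative) showing count() ignoring it and the cumulative functions leaving it in place:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4])

print(s.count())            # non-null observations: 3
print(s.sum())              # NaN is skipped: 7.0
print(s.cumsum().tolist())  # running sum; the NaN position stays NaN
print(s.cumprod().tolist()) # running product; the NaN position stays NaN
```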
pandas also provides the describe method, which returns, in one call, the non-null count, mean, standard deviation, and other summary values for every numeric column of a DataFrame.

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.describe())
print(df.describe(include=['object']))
print(df.describe(include=['number']))

Sorting in pandas
pandas supports two kinds of sorting: by label and by actual value.

import pandas as pd
import numpy as np

unsorted_df=pd.DataFrame(np.random.randn(10,2),
                         index=[1,4,6,2,3,5,9,8,0,7],columns=['col2','col1'])
print(unsorted_df)

Sorting by row labels: the sort_index() method sorts a DataFrame; its axis argument selects rows or columns and its ascending argument sets the sort order. By default, it sorts the row labels in ascending order.

import pandas as pd
import numpy as np

# Sort by row labels in ascending order
sorted_df=unsorted_df.sort_index()
print (sorted_df)
# Control the sort order (descending)
sorted_df = unsorted_df.sort_index(ascending=False)
print (sorted_df)

Sorting by column labels

import pandas as pd
import numpy as np

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
unsorted_df = pd.DataFrame(d)
# Sort by column labels (axis=1)
sorted_df=unsorted_df.sort_index(axis=1)
print (sorted_df)

Sorting by column values
Just as with index sorting, sort_values() sorts by the DataFrame's values. It accepts a by argument that takes the name of the column (or a list of names) whose values are used for sorting.

import pandas as pd
import numpy as np

d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}
unsorted_df = pd.DataFrame(d)
# Sort by Age
sorted_df = unsorted_df.sort_values(by='Age')
print (sorted_df)
# Sort by Age ascending first, then by Rating descending
sorted_df = unsorted_df.sort_values(by=['Age', 'Rating'], ascending=[True, False])
print (sorted_df)
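Two related options: kind='stable' guarantees that rows with equal sort keys keep their original order, and nlargest() is a shortcut for "sort descending and take the first n rows". A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Minsu', 'Jack',
             'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65]})

# Vin and Jack (both 23) and Tom and Ricky (both 25) stay in input order.
by_age = df.sort_values(by='Age', kind='stable')
print(by_age['Name'].head(4).tolist())

# The three rows with the highest Rating.
top3 = df.nlargest(3, 'Rating')
print(top3['Name'].tolist())
```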

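Grouping in pandas
In many situations, we split data into several sets and apply a function to each subset. In the applied function, we can perform the following operations:

Aggregation: compute summary statistics

Transformation: perform group-specific operations

Filtration: discard data under certain conditions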

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)
print(df)

Splitting data into groups

# Group by the Year column
print (df.groupby('Year'))
# View the groups
print (df.groupby('Year').groups)

Iterating over groups
groupby returns an iterable object, so you can loop over it with for.

grouped = df.groupby('Year')
# Iterate over the groups
for year,group in grouped:
    print (year)
    print (group)

Selecting a single group

grouped = df.groupby('Year')
print (grouped.get_group(2014))

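Aggregating groups
An aggregate function returns one aggregated value for each group. Once a groupby object has been created, you can run operations such as sum and standard deviation on each group's data.

# Aggregate the mean Points for each year
grouped = df.groupby('Year')
print (grouped['Points'].agg('mean'))
# Aggregate the sum, mean, and standard deviation of Points for each year
grouped = df.groupby('Year')
agg = grouped['Points'].agg(['sum', 'mean', 'std'])
print (agg)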
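The list at the start of this section also names transformation and filtration. A minimal sketch of both on the same IPL data, using transform() and filter(); the year_mean column and the "at least 3 rows" rule are illustrative choices:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Transformation: one value per original row, here the mean Points of that row's Year.
df['year_mean'] = df.groupby('Year')['Points'].transform('mean')

# Filtration: keep only teams with at least 3 rows.
# 'Kings' and 'kings' differ in case, so they form separate groups.
frequent = df.groupby('Team').filter(lambda g: len(g) >= 3)

print(df[['Year', 'Points', 'year_mean']])
print(sorted(frequent['Team'].unique()))
```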

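Table joins in pandas
pandas provides full-featured, high-performance in-memory join operations that closely resemble those of relational databases such as SQL. It offers a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.
Merging two DataFrames: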

import pandas as pd
left = pd.DataFrame({
         'student_id':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
         'student_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung', 'Billy', 'Brian', 'Bran', 'Bryce', 'Betty', 'Emma', 'Marry', 'Allen', 'Jean', 'Rose', 'David', 'Tom', 'Jack', 'Daniel', 'Andrew'],
         'class_id':[1,1,1,2,2,2,3,3,3,4,1,1,1,2,2,2,3,3,3,2], 
         'gender':['M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F'], 
         'age':[20,21,22,20,21,22,23,20,21,22,20,21,22,23,20,21,22,20,21,22], 
         'score':[98,74,67,38,65,29,32,34,85,64,52,38,26,89,68,46,32,78,79,87]})
right = pd.DataFrame(
         {'class_id':[1,2,3,5],
         'class_name': ['ClassA', 'ClassB', 'ClassC', 'ClassE']})
# Merge the two DataFrames (on their common column, class_id)
data = pd.merge(left,right)
print(data)

Merging DataFrames with the how argument:

# Merge the two DataFrames (left join)
rs = pd.merge(left, right, how='left')
print(rs)

The other merge methods correspond to SQL joins:
Merge method	SQL equivalent	Description
left	LEFT OUTER JOIN	Use keys from the left object
right	RIGHT OUTER JOIN	Use keys from the right object
outer	FULL OUTER JOIN	Use the union of keys
inner	INNER JOIN	Use the intersection of keys
Try it:

# Merge the two DataFrames (right join)
rs = pd.merge(left,right,on='class_id', how='right')
print(rs)
# Merge the two DataFrames (outer join)
rs = pd.merge(left,right,on='class_id', how='outer')
print(rs)
# Merge the two DataFrames (inner join)
rs = pd.merge(left,right,on='class_id', how='inner')
print(rs)
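To see how the four join types differ, it helps to count result rows on tiny inputs and to pass indicator=True, which adds a _merge column recording whether each row came from the left side, the right side, or both. A sketch with small hypothetical frames:

```python
import pandas as pd

left = pd.DataFrame({'class_id': [1, 2, 3], 'student': ['Alex', 'Amy', 'Allen']})
right = pd.DataFrame({'class_id': [2, 3, 5], 'class_name': ['ClassB', 'ClassC', 'ClassE']})

# Keys 2 and 3 match; 1 exists only on the left and 5 only on the right.
sizes = {how: len(pd.merge(left, right, on='class_id', how=how))
         for how in ['inner', 'left', 'right', 'outer']}
print(sizes)

outer = pd.merge(left, right, on='class_id', how='outer', indicator=True)
print(outer)
```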

Pivot tables and crosstabs in pandas
Given the following data:

import pandas as pd
left = pd.DataFrame({
         'student_id':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
         'student_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung', 'Billy', 'Brian', 'Bran', 'Bryce', 'Betty', 'Emma', 'Marry', 'Allen', 'Jean', 'Rose', 'David', 'Tom', 'Jack', 'Daniel', 'Andrew'],
         'class_id':[1,1,1,2,2,2,3,3,3,4,1,1,1,2,2,2,3,3,3,2], 
         'gender':['M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F', 'M', 'M', 'F', 'F'], 
         'age':[20,21,22,20,21,22,23,20,21,22,20,21,22,23,20,21,22,20,21,22], 
         'score':[98,74,67,38,65,29,32,34,85,64,52,38,26,89,68,46,32,78,79,87]})
right = pd.DataFrame(
         {'class_id':[1,2,3,5],
         'class_name': ['ClassA', 'ClassB', 'ClassC', 'ClassE']})
# Merge the two DataFrames
data = pd.merge(left,right)
print(data)

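Pivot table
A pivot table is a data summarization tool common in spreadsheet programs and other data analysis software. It groups the data by one or more keys and aggregates the data within each group.

# Group by class_id and gender and aggregate every remaining column (mean by default);
# the string columns are dropped first because recent pandas versions raise on them
print(data.drop(columns=['student_name', 'class_name']).pivot_table(index=['class_id', 'gender']))

# Group by class_id and gender, aggregating only the score column
print(data.pivot_table(index=['class_id', 'gender'], values=['score']))

# Group by class_id and gender, aggregate score, and split the columns by each value of age
print(data.pivot_table(index=['class_id', 'gender'], values=['score'], columns=['age']))

# Same as above, plus row and column totals (margins)
print(data.pivot_table(index=['class_id', 'gender'], values=['score'], 
                       columns=['age'], margins=True))

# Same as above with totals, but aggregating with the maximum instead of the mean
print(data.pivot_table(index=['class_id', 'gender'], values=['score'], columns=['age'], margins=True, aggfunc='max'))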
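A pivot table is essentially a groupby followed by reshaping: the same numbers can be obtained by grouping on both keys and unstacking one of them into columns. A small sketch with hypothetical scores:

```python
import pandas as pd

df = pd.DataFrame({'class_id': [1, 1, 2, 2],
                   'gender': ['M', 'F', 'M', 'M'],
                   'score': [90, 80, 70, 50]})

pt = df.pivot_table(index='class_id', columns='gender',
                    values='score', aggfunc='mean')

# The same result via groupby on both keys, then moving 'gender' into the columns.
gb = df.groupby(['class_id', 'gender'])['score'].mean().unstack()
print(pt)
print(gb)
```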

Crosstab
A crosstab (cross-tabulation) is a special kind of pivot table that computes group frequencies.

# Group by class_id and count rows for each gender, with totals
print(pd.crosstab(data.class_id, data.gender, margins=True))
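Besides raw counts, crosstab can return proportions through its normalize argument; normalize='index' makes every row sum to 1. A sketch with small hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({'class_id': [1, 1, 2, 2, 2],
                   'gender': ['M', 'F', 'M', 'M', 'F']})

counts = pd.crosstab(df['class_id'], df['gender'])
# Each row is divided by its total, giving the gender share within each class.
shares = pd.crosstab(df['class_id'], df['gender'], normalize='index')
print(counts)
print(shares)
```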

Visualization in pandas
Basic plotting: plot

import pandas as pd
import numpy as np
import matplotlib.pyplot as mp 

df = pd.DataFrame(np.random.randn(10,4),index=pd.date_range('2018/12/18',
   periods=10), columns=list('ABCD'))
df.plot()
mp.show()

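Besides the default line plot, the plot method supports a handful of other plot styles. You select them with the kind keyword argument of plot(), or with the equivalent df.plot.<kind>() methods used below. They are:

bar or barh: bar chart (vertical or horizontal)

hist: histogram

scatter: scatter plot

pie: pie chart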
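The kind keyword and the df.plot.<kind>() methods select the same chart type. A minimal sketch, assuming matplotlib is installed; it uses the non-interactive Agg backend and renders into an in-memory PNG instead of opening a window with show():

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without a display
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# Equivalent to df.plot.bar(); returns the matplotlib Axes it drew on.
ax = df.plot(kind='bar')

buf = io.BytesIO()
ax.figure.savefig(buf, format='png')
print(len(buf.getvalue()), 'bytes of PNG data')
```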

Bar chart

df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
df.plot.bar()
# df.plot.bar(stacked=True)
mp.show()

Histogram

df = pd.DataFrame()
df['a'] = pd.Series(np.random.normal(0, 1, 1000)-1)
df['b'] = pd.Series(np.random.normal(0, 1, 1000))
df['c'] = pd.Series(np.random.normal(0, 1, 1000)+1)
print(df)
df.plot.hist(bins=20)
mp.show()

Scatter plot

df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.plot.scatter(x='a', y='b')
mp.show()

Pie chart

df = pd.DataFrame(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], columns=['x'])
df.plot.pie(subplots=True)
mp.show()

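Reading and saving data
Reading and saving CSV:

# filepath_or_buffer: file path or URL; valid URL schemes include http, ftp, and file
# sep: delimiter. read_csv defaults to ",", read_table defaults to "\t" (Tab)
# header: int or sequence. Row number(s) to use as the column names. Default 'infer' (detected automatically)
# names: array-like. Column names to use
# index_col: column(s) to use as the row index; a sequence creates a MultiIndex
# dtype: data type for the data or for individual columns (a dict with column names as keys and types as values)
# engine: 'c' or 'python'. The parser engine to use; the C engine is the faster default
# nrows: int. Number of rows to read from the start of the file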

pd.read_table(
    filepath_or_buffer, sep='\t', header='infer', names=None, 
    index_col=None, dtype=None, engine=None, nrows=None) 
pd.read_csv(
    filepath_or_buffer, sep=',', header='infer', names=None, 
    index_col=None, dtype=None, engine=None, nrows=None)

DataFrame.to_csv(path_or_buf=None, sep=',', header=True, index=True, index_label=None, mode='w', encoding=None)
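A round trip through an in-memory buffer shows to_csv and read_csv working together without touching the file system; io.StringIO stands in for a real file path here, and the data is illustrative:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve'], 'Age': [28, 34, 29]})

# index=False omits the row labels from the CSV text.
buf = io.StringIO()
df.to_csv(buf, index=False)
text = buf.getvalue()

# Reading the whole text back gives an equal DataFrame.
back = pd.read_csv(io.StringIO(text))

# nrows limits the data rows read; index_col turns a column into the index.
head = pd.read_csv(io.StringIO(text), index_col='Name', nrows=2)
print(text)
print(head)
```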

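Reading and saving Excel:

# io: file path or file-like object
# sheet_name: name or position of the sheet to read. Default 0 (the first sheet); older pandas called this sheetname
# header: int or sequence. Row number(s) to use as the column names. Default 0 (the first row)
# names: list of column names to use
# index_col: column(s) to use as the row index; a sequence creates a MultiIndex
# dtype: dict. Data types per column
pandas.read_excel(io, sheet_name=0, header=0, index_col=None, names=None, dtype=None)

DataFrame.to_excel(excel_writer, sheet_name='Sheet1', header=True, index=True, index_label=None)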

Reading and saving JSON:

# Read a JSON file and convert it into a DataFrame
pd.read_json('../ratings.json')
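The section names saving as well as reading; DataFrame.to_json covers the saving side. A round-trip sketch with illustrative data; orient='records' writes one JSON object per row, and the text is wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON strings to read_json:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28, 34]})

text = df.to_json(orient='records')
print(text)

back = pd.read_json(io.StringIO(text), orient='records')
print(back)
```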


You may freely share or copy the text, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
