Exploratory Data Analysis with Chi-Square Contingency
This exploratory data analysis is a personal learning exercise, carried out on a dataset downloaded from kaggle.com.
Plotly for visualization
Seaborn for visualization
Importing the required libraries
# Libraries for data manipulation
import pandas as pd
import numpy as np
# Libraries for visualization
import seaborn as sns
import matplotlib.pyplot as plt
# Libraries for the operating system and warning control
import warnings
import os
warnings.filterwarnings('ignore')
Importing the dataset
# Reading the dataset
df = pd.read_csv(r'C:\Users\user\dl-course-data\abalone.csv')
df.head()
Checking the dataset information
# Shape of dataset
df.shape
# Checking for null values in the dataset
df.isnull().sum()
# Information about the dataset
df.info()
# Statistical description of the dataset
df.describe().T
# Extracting the unique values of the Type column
a = df['Type'].unique()
print(a)
# Finding the counts of each Type
b = df['Type'].value_counts()
print(b)
# Counting the Rings observations for each Type
df.groupby(["Type"])["Rings"].count().reset_index(name="count")
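As a small cross-check of the two counting steps above, the sketch below uses a made-up toy frame (not the abalone data) to show that `value_counts()` and `groupby(...).count()` give the same per-type counts when the counted column has no missing values. The column names simply mirror the ones used in this post.

```python
import pandas as pd

# Toy data, invented for illustration only (not the abalone dataset)
toy = pd.DataFrame({
    "Type": ["M", "F", "I", "M", "F", "M"],
    "Rings": [10, 9, 7, 12, 11, 8],
})

# Count rows per Type in two different ways
by_value_counts = toy["Type"].value_counts().sort_index()
by_groupby = toy.groupby("Type")["Rings"].count()

# Both give the same counts because Rings contains no nulls;
# count() would skip rows where Rings is missing, value_counts() on Type would not
print(by_value_counts.to_dict())
print(by_groupby.to_dict())
```

If `Rings` could contain nulls, `groupby("Type").size()` would be the closer match to `value_counts()`.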
Adding an ID column to the dataset
df['id'] = range(1, len(df)+1)
df.head()
Correlation
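# Finding the correlation between the numeric columns of the dataset
import plotly.express as px
# Type holds text labels, so only numeric columns are passed to corr()
correlation = df.select_dtypes(include="number").corr()
# Longest Shell has the highest positive correlation value
fig = px.imshow(correlation, text_auto=True, aspect="auto")
fig.show()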
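To illustrate why a text column such as Type has to be left out of a correlation matrix, here is a minimal sketch on an invented toy frame. The column names and values are assumptions for illustration, not the real abalone columns.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Type": ["M", "F", "I", "M"],   # text labels, not usable in corr()
    "length": [1.0, 2.0, 3.0, 4.0],
    "weight": [2.0, 4.0, 6.0, 8.0],  # rises exactly with length
    "rings": [4, 3, 2, 1],           # falls exactly as length rises
})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()

print(corr)
```

Selecting numeric columns explicitly avoids depending on how a particular pandas version treats non-numeric columns in `corr()`.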
# Type M accounts for the largest share of the samples
import plotly.express as px
# Without values=, px.pie counts the rows for each Type
fig = px.pie(df, names='Type', title='Abalone Share by Type')
# Default color sequence; a black outline separates the slices
fig.update_traces(hoverinfo='label+percent', textinfo='label+percent', textfont_size=20, pull=[0.1,0.1,0.1],
                  marker=dict(line=dict(color='#000000', width=2)))
fig.show()
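# Type M has the highest count
import plotly.express as px
# Count the rows per Type first, then draw one bar per Type
type_counts = df.groupby('Type').size().reset_index(name='count')
fig = px.bar(type_counts, x='Type', y='count', color='Type')
fig.show()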
# Pass nbins=<number_of_bins> to control the number of histogram bins
px.histogram(df, x="id", color="Type")
# Cross-tabulation of Type and Rings for easier reading
cross_tab = pd.crosstab(df["Type"],df["Rings"],margins=True)
cross_tab
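The following sketch shows what `margins=True` adds to a cross-tabulation, using a small invented frame rather than the abalone data. The extra `All` row and column hold the totals, which is handy for reading the table but means it is no longer a pure table of observed counts.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Type": ["M", "F", "I", "M", "F", "M"],
    "Rings": [10, 9, 9, 10, 10, 9],
})

# Plain table of observed counts
counts = pd.crosstab(toy["Type"], toy["Rings"])

# Same table with an extra "All" row and column holding the totals
with_totals = pd.crosstab(toy["Type"], toy["Rings"], margins=True)

print(counts)
print(with_totals)
```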
# Point plot of mean Rings per Type; the author singles out type F here
sns.catplot(x="Type", y="Rings", data=df, kind="point")
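A point plot of Rings by Type shows, by default, the mean of Rings within each Type. As a rough numeric counterpart, the sketch below computes those group means directly on an invented toy frame (the values are made up, not taken from the abalone data).

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "Type": ["M", "F", "I", "M", "F", "I"],
    "Rings": [10, 12, 6, 8, 14, 8],
})

# Mean number of rings per type: the quantity a point plot marks for each category
mean_rings = toy.groupby("Type")["Rings"].mean()

# Type with the highest mean ring count in this toy sample
top_type = mean_rings.idxmax()

print(mean_rings)
print(top_type)
```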
from scipy.stats import chi2_contingency
alpha = 0.05
# Build the observed table without margin totals; the test expects raw counts only
observed = pd.crosstab(df["Type"], df["Rings"])
stats, p_value, degrees_of_freedom, expected = chi2_contingency(observed)
if p_value > alpha:
    print(f'Fail to reject the null hypothesis\n p_value is {p_value}\n No evidence that Rings depends on Type')
else:
    print(f'Reject the null hypothesis\n p_value is {p_value}\n Rings are not independent of Type')
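To make the test above less of a black box, here is a minimal sketch that computes the chi-square statistic by hand on an invented 2 x 3 table and compares it with `chi2_contingency`. It also shows, under the same toy assumptions, what happens when margin totals are left in the table: the statistic stays the same, but the degrees of freedom are inflated, so the p-value comes out larger than it should.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Toy observed counts, invented for illustration (2 groups x 3 categories)
observed = np.array([[10, 20, 30],
                     [20, 20, 20]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
total = observed.sum()

# Expected counts under independence: (row total * column total) / grand total
expected_manual = row_totals * col_totals / total

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

# The same test through SciPy
chi2_scipy, p_scipy, dof_scipy, expected_scipy = chi2_contingency(observed)

# Same table with a totals row and column appended, as margins=True would produce
with_margins = np.block([[observed, row_totals],
                         [col_totals, np.array([[total]])]])
chi2_m, p_m, dof_m, _ = chi2_contingency(with_margins)

print(chi2_manual, p_manual, dof_manual)
print(chi2_scipy, p_scipy, dof_scipy)
print(chi2_m, p_m, dof_m)
```

The margin cells match their expected values exactly, so they add nothing to the statistic, yet they raise the degrees of freedom from (r-1)(c-1) to r*c.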
Reference
https://dev.to/designegycreatives/exploratory-data-analysis-with-chi-square-contingency-11be