Python 🐍🐼 pandas: A Beginner's Guide

What is pandas?

pandas is an open-source Python library that is widely used for data analysis.
In machine learning (ML) and data science, it is used to read and manipulate data.

pip install pandas


This pip command installs pandas on your system.
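
To check that the installation worked, a quick sketch like the one below can help. It only assumes that pandas was installed into the same Python environment you are running.

```python
import pandas as pd

# If the import succeeds, pandas is installed; print which version was picked up.
print(pd.__version__)
```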

What is a DataFrame?

A pandas DataFrame is a two-dimensional data structure: a table with rows and columns.

Creating a DataFrame in pandas:

import pandas as pd

# Each key becomes a column; each list holds that column's values.
car_dataset = {
    'cars': ['Tata', 'Maruti', 'Tesla'],
    'Model': ['Nano', 'i10', '11x3'],
    'Range': [300, 315, 400],
}
car_df = pd.DataFrame(car_dataset)
print(car_df)
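
To make the rows-and-columns idea concrete, here is a small sketch that builds the same kind of table from a list of row dictionaries rather than a dictionary of columns. The `rows` and `rows_df` names are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample rows, invented for this illustration.
rows = [
    {'cars': 'Tata', 'Range': 300},
    {'cars': 'Maruti', 'Range': 315},
    {'cars': 'Tesla', 'Range': 400},
]

# Each dictionary becomes one row; the keys become column names.
rows_df = pd.DataFrame(rows)

print(rows_df.shape)          # (number of rows, number of columns)
print(list(rows_df.columns))  # column labels
print(list(rows_df.index))    # default row labels
```

Both construction styles give the same structure; which one is more convenient usually depends on how the raw data arrives.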


Basic column operations on a DataFrame
You can access DataFrame columns with square brackets, and use the same syntax to assign or update values.
Here are some basic operations you can perform on DataFrame columns.

# Access a single column (double brackets return a DataFrame)
print(car_df[['cars']])
# Single brackets also work and return a Series: car_df['cars']
# Access multiple columns
print(car_df[['Model', 'Range']])
# Add a new column
car_df['new_column_name'] = [1, 2, 3]  # one value per row
# Delete a column
car_df.drop(columns=['new_column_name'], inplace=True)
# Rename a column
# Syntax: df.rename(columns={"oldName": "newName"}, inplace=True)
car_df.rename(columns={'Model': 'model'}, inplace=True)
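
The following self-contained sketch shows two details that the operations above rely on: the difference between single and double brackets, and what happens when `inplace=True` is left out. The small `car_df` here is rebuilt so the snippet runs on its own, and the names `renamed` and `dropped` are illustrative.

```python
import pandas as pd

car_df = pd.DataFrame({
    'cars': ['Tata', 'Maruti', 'Tesla'],
    'Range': [300, 315, 400],
})

# Single brackets return a Series; double brackets return a DataFrame.
as_series = car_df['cars']
as_frame = car_df[['cars']]
print(type(as_series).__name__)
print(type(as_frame).__name__)

# Update an existing column by assigning to it.
car_df['Range'] = car_df['Range'] + 10

# Without inplace=True, rename() and drop() return new DataFrames
# and leave car_df itself unchanged.
renamed = car_df.rename(columns={'Range': 'range'})
dropped = car_df.drop(columns=['Range'])
print(list(car_df.columns))
```

Returning a new DataFrame is often easier to reason about than modifying one in place, since the original stays available for comparison.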


Reading a CSV file:

A simple way to store large datasets is in CSV (comma-separated values) files.
CSV is a common file type that you will encounter when working in machine learning or data science.

import pandas as pd

df = pd.read_csv('Housing.csv')
print(df)
# For a large DataFrame, print(df) shows a truncated view.
# Use to_string() to print the entire DataFrame:
# print(df.to_string())
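
If you do not have Housing.csv at hand, the sketch below reads CSV text from memory instead. The `csv_text` contents are made up for illustration and are not taken from the real dataset.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a real file; values are illustrative only.
csv_text = """price,area,bedrooms
100,50,2
150,75,3
200,90,4
"""

# read_csv accepts a file path or any file-like object.
small_df = pd.read_csv(io.StringIO(csv_text))
print(small_df)

# With no path given, to_csv returns the CSV text as a string;
# index=False leaves out the row labels.
round_trip = small_df.to_csv(index=False)
print(round_trip)
```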


Exploring the data:

To get a high-level overview of a dataset, pandas provides several functions. Some of them are:

import pandas as pd

# Read the CSV file
df = pd.read_csv('Housing.csv')
# Head of the data
print(df.head(10))  # print the first 10 rows of the DataFrame
# Tail of the data
print(df.tail(10))  # print the last 10 rows of the DataFrame
# shape: the dimensions of the data
print(df.shape)
# (545, 13) means 545 rows and 13 columns
# Features
print(df.columns)  # returns the column names
# Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
#        'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
#        'parking', 'prefarea', 'furnishingstatus'], dtype='object')
# info
df.info()  # prints the non-null count and data type of each column
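
The same inspection calls can be tried on a tiny in-memory table. The sketch below uses made-up values (including one deliberately missing price) and adds `isnull().sum()` as one possible way to count missing values per column.

```python
import pandas as pd

# Made-up data with one missing value, for illustration only.
sample_df = pd.DataFrame({
    'price': [100, 150, None, 200],
    'furnishingstatus': ['furnished', 'unfurnished', 'furnished', 'semi-furnished'],
})

first_two = sample_df.head(2)  # first 2 rows
last_two = sample_df.tail(2)   # last 2 rows
n_rows, n_cols = sample_df.shape

# Count missing values per column, complementing what info() reports.
missing = sample_df.isnull().sum()
print(missing)
print(sample_df.dtypes)
```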



Statistical analysis with pandas:
pandas provides several functions that help you dig deeper into the data and find more useful insights. Some of the most useful are:

# describe: returns summary statistics such as min, max, mean, standard deviation, and more.
df.describe()
# unique: returns all the unique values in a column.
df['columnName'].unique()
# value_counts: returns the frequency of each value.
df['columnName'].value_counts()
# corr: computes the pairwise correlation between the numeric features.
# numeric_only=True (pandas 1.5+) skips text columns such as furnishingstatus.
df.corr(numeric_only=True)
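
Here is a small runnable sketch of these calls on invented data. The `stats_df` values are hypothetical; `price` is set to exactly twice `area` so that the correlation is easy to predict.

```python
import pandas as pd

# Invented data: price is exactly 2 * area, so the two are perfectly correlated.
stats_df = pd.DataFrame({
    'area': [50, 75, 90, 120],
    'price': [100, 150, 180, 240],
    'basement': ['yes', 'no', 'no', 'yes'],
})

summary = stats_df.describe()            # numeric columns only by default
labels = stats_df['basement'].unique()   # array of distinct values
counts = stats_df['basement'].value_counts()
corr = stats_df.corr(numeric_only=True)  # skips the text column

print(summary.loc['mean'])
print(counts)
print(corr)
```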


pandas also provides functions for computing other statistical measures, such as the mean, median, and mode.
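
As a brief sketch, these three measures can be computed on a single column (a Series). The `bedrooms` values below are made up for illustration.

```python
import pandas as pd

# Made-up values for illustration.
bedrooms = pd.Series([2, 3, 3, 4, 8])

print(bedrooms.mean())    # arithmetic average
print(bedrooms.median())  # middle value when sorted
print(bedrooms.mode())    # most frequent value(s), returned as a Series
```

Note that `mode()` returns a Series rather than a single number, because more than one value can be tied for most frequent.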
