๐Ÿค– ๋”ฅ๋Ÿฌ๋‹ ์ฝ”๋“œ ์ •๋ฆฌ


1. ํŒ๋‹ค์Šค import

import pandas as pd

2. Importing Matplotlib's pyplot

import matplotlib.pyplot as plt

3. ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„์˜ ์•ž, ๋’ค ์กฐํšŒํ•˜๋Š” ๋ฒ•

data = pd.DataFrame()

data.head() ## shows the first 5 rows

data.tail() ## shows the last 5 rows

4. ํžˆ์Šคํ† ๊ทธ๋žจ ๊ทธ๋ฆฌ๊ธฐ

petal = iris['petal.width']

plt.hist(petal, bins=5) ## splits the values into 5 bins

5. ์‚ฐ์ ๋„ ์‹œ๊ฐํ™”

iris.plot.scatter(x='sepal.width',
                  y='petal.width',
                  s=100,     # marker size
                  c='blue',
                  alpha=0.5)

plt.title('Scatter Plot of iris by pandas', fontsize=20)

plt.xlabel('Sepal Width', fontsize=14)

plt.ylabel('Petal Width', fontsize=14)

plt.show()

  • seaborn์„ ์ด์šฉํ•œ ์‚ฐ์ ๋„ ๊ทธ๋ฆฌ๊ธฐ

import seaborn as sns

sns.scatterplot(x="sepal.length", y="petal.length", hue="variety", s=100, data=iris)

=> hue๋Š” ๊ทธ๋ฃน์— ๋”ฐ๋ผ ์ƒ‰์ƒ์„ ๋‹ค๋ฅด๊ฒŒ ํ•ด์ฃผ๋Š” ์„ค์ • ๊ฐ’


6. Label Encoding

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

iris['variety'] = encoder.fit_transform(iris['variety'])
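A quick way to sanity-check the encoding (a minimal sketch using the encoder fitted above): classes_ lists the original labels in encoded order, and inverse_transform maps the integer codes back.

print(encoder.classes_)                      ## original labels, index = encoded value
print(encoder.inverse_transform([0, 1, 2]))  ## integer codes back to the label strings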

7. ํŠธ๋ ˆ์ด๋‹ ์…‹๊ณผ ํ…Œ์ŠคํŠธ ์…‹ ๋ถ„๋ฆฌํ•˜๊ธฐ

from sklearn.model_selection import train_test_split

X = iris.loc[:, iris.columns != "variety"]
Y = iris['variety']


x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=2021, stratify=Y)

## train:test = 9:1 

print(x_train, x_test)
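Because stratify=Y preserves the class proportions in both splits, the two distributions should match almost exactly (a minimal check using the split above):

print(y_train.value_counts(normalize=True))  ## class ratios in the training set
print(y_test.value_counts(normalize=True))   ## should closely match the ratios above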

8. Training a Random Forest model

  • ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ • : n_estimators=50, max_depth=13, random_state=30, min_samples_leaf=5
  • n_estimators ์ข…ํ•ฉํ•œ ์ „์ฒด ํŠธ๋ฆฌ์˜ ๊ฐ€์ง€์ˆ˜, max_depth : ๊ฐ Tree์˜ ๊ฐ€์žฅ ๊นŠ์€ ๋†’์ด, min_samples_leaf: ๊ฐ ๋์˜ ๋…ธ๋“œ์—๋Š” ์ตœ์†Œ 5๊ฐœ์˜ ํŠธ๋ ˆ์ด๋‹ ์ƒ˜ํ”Œ์ด ์žˆ์–ด์•ผํ•จ
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=50, max_depth=13, random_state=30, min_samples_leaf=5)

model.fit(x_train, y_train)
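To see how well the forest generalizes, predict on the test split from step 7 and score it (a minimal sketch):

from sklearn.metrics import accuracy_score

y_pred = model.predict(x_test)         ## class predictions for the held-out rows
print(accuracy_score(y_test, y_pred))  ## fraction of test samples classified correctly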

9. ๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ ๋ชจ๋ธ ํ•™์Šตํ•˜๊ธฐ

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.callbacks import EarlyStopping
model = Sequential()
model.add(Dense(20, input_dim=6, activation='relu'))  ## the input layer takes 6 features
model.add(BatchNormalization())
model.add(Dense(20, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(3, activation='softmax'))  ## 3 output classes

## 20๊ฐœ์˜ ๋…ธ๋“œ๋ฅผ ๊ฐ€์ง€๋Š” 2๊ฐœ์˜ ํžˆ๋“  ๋ ˆ์ด์–ด๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ชจ๋ธ



model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

es = EarlyStopping(monitor="val_loss", mode="min", verbose=1, patience=5, restore_best_weights=True)

## ์ตœ๊ณ ์˜ ์„ฑ๋Šฅ์ผ ๋•Œ ๊ฐ€์ค‘์น˜๋ฅผ ๋ณต๊ตฌํ•œ๋‹ค. 

history = model.fit(x_train, y_train, epochs=2000, validation_split=0.2, callbacks=[es])
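After early stopping restores the best weights, the network can be scored on held-out data (a sketch, assuming x_test/y_test come from the same dataset the model was trained on):

loss, acc = model.evaluate(x_test, y_test)  ## cross-entropy loss and accuracy on unseen data
print(f"test accuracy: {acc:.3f}")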

10. Training a linear model

from sklearn.linear_model import LinearRegression

linear_regression = LinearRegression()

m = linear_regression.fit(x_train, y_train["Lemon"])

print(m)
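The fitted line is fully described by its learned parameters, which can be read straight off the model (a minimal sketch; x_test is assumed to come from the same split):

print(m.coef_)              ## one learned weight per input feature
print(m.intercept_)         ## the learned bias term
y_pred = m.predict(x_test)  ## predicted "Lemon" values for the test rows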

11. Training a Decision Tree model

  • ์ฐธ๊ณ ๋งํฌ

  • min_samples_leaf: the minimum number of samples required at a leaf node

  • max_depth: the maximum depth of the tree


from sklearn.tree import DecisionTreeClassifier

trees = []

for i in range(1, 16):
    d_model = DecisionTreeClassifier(min_samples_leaf=10, max_depth=i, random_state=2021)
    d_model.fit(x_train, y_train)
    trees.append(d_model)  ## keep each fitted tree so the depths can be compared later
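With every fitted tree stored, the best depth can be picked by test accuracy (a sketch that reuses x_test/y_test from step 7 as a validation set):

scores = [t.score(x_test, y_test) for t in trees]  ## accuracy for depths 1..15
best_depth = scores.index(max(scores)) + 1         ## +1 because depths started at 1
print(best_depth, max(scores))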


12. Dropping columns

df1 = df1.drop(columns=['voc_trt_reslt_itg_cd',
                       'oos_cause_type_itg_cd',
                       'engt_cperd_type_itg_cd',
                       'engt_tgt_div_itg_cd',
                       'fclt_oos_yn'
                      ])
## ์ง์ ‘ ์ง€์ •ํ•ด์„œ ์‚ญ์ œํ•  ์ˆ˜ ์žˆ์Œ.

df1 = df.drop(['voc_trt_reslt_itg_cd',
               'oos_cause_type_itg_cd',
               'engt_cperd_type_itg_cd',
               'engt_tgt_div_itg_cd',
               'fclt_oos_yn'
               ], axis=1)

## setting axis=1 drops columns instead of rows

13. ํŠน์ • ํƒ€์ž…์„ ๊ฐ€์ง„ ์ปฌ๋Ÿผ ์„ ํƒํ•˜๊ธฐ

cat_cols = df1.select_dtypes(include='object') 
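select_dtypes returns a DataFrame holding only the matching columns; the names can be pulled out as a plain list, and exclude= gives the complement (a minimal sketch):

cat_col_names = cat_cols.columns.tolist()       ## names of the object-typed columns
num_cols = df1.select_dtypes(exclude='object')  ## the remaining numeric columns
print(cat_col_names)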

14. One-Hot Encoder

  • get_dummies๋ฅผ ํ™œ์šฉํ•œ๋‹ค.
pd.get_dummies(cat_cols['cust_clas_itg_cd']).head()
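Applied to a whole DataFrame, get_dummies encodes every object-typed column at once (a minimal sketch):

df1_encoded = pd.get_dummies(df1)  ## expands each object column into indicator columns
print(df1_encoded.head())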

15. ์ƒ๊ด€๊ด€๊ณ„ ํŒŒ์•…ํ•˜๊ธฐ

  • Use corr() to compute pairwise correlations between columns.
df = pd.DataFrame()
df.corr()
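To rank features against one variable, slice the correlation matrix by that column and sort (a sketch; 'variety' is a stand-in for whatever target column the DataFrame actually has):

corr = df.corr()
print(corr['variety'].sort_values(ascending=False))  ## 'variety' is a placeholder target column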

16. ํžˆํŠธ๋งต ์ถœ๋ ฅํ•˜๊ธฐ (seaborn์„ ์‚ฌ์šฉ)

corr = df.corr()

sns.set(rc={'figure.figsize': (12, 8)})
sns.heatmap(corr, annot=True)  ## annot controls whether the numeric value is shown in each cell

17. Setting up a checkpoint

  • ์„ฑ๋Šฅ์ด ๊ฐ€์žฅ ์ข‹์€ ๋ชจ๋ธ์„ ํŒŒ์ผ๋กœ ์ €์žฅํ•œ๋‹ค.

input_size = x_train.shape[1]

model = Sequential()
model.add(Dense(200, input_shape=(input_size, ), activation="relu"))
model.add(BatchNormalization())
model.add(Dense(50, activation="relu"))
model.add(BatchNormalization())
model.add(Dense(25, activation="relu"))
model.add(BatchNormalization())
model.add(Dense(10, activation="relu"))
model.add(BatchNormalization())
model.add(Dense(5, activation="relu"))
model.add(BatchNormalization())
model.add(Dense(1))

model.compile(loss="mean_squared_error", metrics=['accuracy'], optimizer="adam")


from tensorflow.keras.callbacks import ModelCheckpoint

mc = ModelCheckpoint('best_model.h5', monitor='val_accuracy', mode='max', save_best_only=True)

history = model.fit(x_train, y_train, epochs=200, validation_data=(x_valid, y_valid), callbacks=[mc])
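After training, the checkpointed weights can be loaded back from the file saved above (a minimal sketch):

from tensorflow.keras.models import load_model

best_model = load_model('best_model.h5')  ## the model from the epoch with the highest val_accuracy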


18. ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์ธ๋ฑ์Šค ์ดˆ๊ธฐํ™”

  • drop ์„ค์ •์€ ์ธ๋ฑ์Šค๋กœ ์„ธํŒ…ํ•œ ์—ด์„ ์‚ญ์ œํ•  ์ง€์— ๋Œ€ํ•œ ์—ฌ๋ถ€ ๊ฒฐ์ •
  • inplace ๋Š” ํ˜„์žฌ ๋ฐ์ดํ„ฐ ํ”„๋ ˆ์ž„ ์›๋ณธ์— ์ ์šฉํ•  ์ง€ ๊ฒฐ์ •

x_train.reset_index(inplace=True, drop=True)

19. ๋‰ด๋Ÿด๋„คํŠธ์›Œํฌ ํ•™์Šต ๋กœ๊ทธ ์‹œ๊ฐํ™”


t_acc = history.history["accuracy"]
v_acc = history.history["val_accuracy"]

plt.plot(t_acc, c="red", label="train")
plt.plot(v_acc, c="blue", label="validation")

plt.title("Accuracy")

plt.xlabel("epochs")
plt.ylabel("accuracy")

plt.legend()
plt.show()

์ข‹์€ ์›นํŽ˜์ด์ง€ ์ฆ๊ฒจ์ฐพ๊ธฐ