XGBoost notes on the PokerHand dataset

Goal

I'm likely to forget this, so these notes are mainly about getting XGBoost running first.

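About XGBoost

XGBoost is one of the GBDT (Gradient Boosting Decision Tree) algorithms.
The paper is here.

For an explanation, the Qiita article below is also easy to follow. Thanks to its author.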
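To make the "boosting" part concrete, here is a minimal sketch of plain gradient boosting with squared loss on a 1-D toy problem, using depth-1 trees (stumps) written in NumPy. It is not XGBoost itself: XGBoost additionally uses second-order gradient information and regularizes the tree weights. The function names fit_stump and gradient_boost are hypothetical, chosen only for this illustration.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares depth-1 tree: pick the threshold on x that best fits residuals r."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max value so both sides are non-empty
        left, right = r[x <= t], r[x > t]
        # Each leaf predicts the mean residual of the samples it contains.
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda z: np.where(z <= t, left_value, right_value)


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each new stump is fitted to the current
    residuals y - F, which are the negative gradient of 0.5 * (y - F)**2."""
    base = y.mean()
    F = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - F)
        F = F + learning_rate * stump(x)  # shrink each tree's contribution
        stumps.append(stump)
    return lambda z: base + learning_rate * sum(s(z) for s in stumps)


# Usage on a toy curve: the boosted model should fit far better than a constant.
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
model = gradient_boost(x, y)
print("constant-model MSE:", np.mean((y - y.mean()) ** 2))
print("boosted-model MSE: ", np.mean((model(x) - y) ** 2))
```

Each round only corrects what the previous trees got wrong, and the learning rate keeps any single tree from dominating; XGBClassifier's n_estimators is the number of such rounds.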

Hands-on

Data preparation

This assumes the dataset files are in the current directory.
For how to download them, see here.

import pandas as pd

# Neither file has a header row: columns 0-9 describe the five cards, column 10 is the hand class
train = pd.read_csv('./poker-hand-training-true.csv', header=None)
test = pd.read_csv('./poker-hand-testing.csv', header=None)
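Because the files have no header row, the code reads them with header=None and later refers to the label as column 10. Based on the UCI Poker Hand dataset description, each row holds the suit (S, 1-4) and rank (C, 1-13) of five cards, followed by the hand class (0-9). A sketch that attaches readable names; the single CSV row is a made-up royal flush standing in for the real file, and the names COLUMNS and HAND_NAMES are hypothetical.

```python
import io

import pandas as pd

# Suit (S) and rank (C) for each of the five cards, then the hand class.
COLUMNS = [f"{kind}{i}" for i in range(1, 6) for kind in ("S", "C")] + ["CLASS"]

# Meaning of the class values 0-9 in the UCI description.
HAND_NAMES = ["Nothing", "One pair", "Two pairs", "Three of a kind", "Straight",
              "Flush", "Full house", "Four of a kind", "Straight flush", "Royal flush"]

# One illustrative row (10, J, K, Q, A all in suit 1), used instead of the real CSV.
sample = io.StringIO("1,10,1,11,1,13,1,12,1,1,9\n")
df = pd.read_csv(sample, header=None, names=COLUMNS)
print(df)
print(HAND_NAMES[int(df.loc[0, "CLASS"])])
```

Keeping integer column labels, as the original code does, works just as well; the names only make the frame easier to inspect.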

Preprocessing

Import sklearn's preprocessing module and train_test_split.

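from sklearn import preprocessing
from sklearn.model_selection import train_test_split

# Column 10 holds the label (hand class)
train_Y = train[10]
test_Y = test[10]

# Caution: features and labels are split in two separate calls. Each call
# shuffles independently, so the rows of X_train no longer line up with Y_train.
X_train, X_test = train_test_split(train, train_size=0.7)
Y_train, Y_test = train_test_split(train_Y, train_size=0.7)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)

# Drop the label column from the features
X_train.drop(10, axis=1, inplace=True)
X_test.drop(10, axis=1, inplace=True)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)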

(17507, 11) (17507,) (7503, 11) (7503,)
(17507, 10) (17507,) (7503, 10) (7503,)
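In the code above, train and train_Y go through train_test_split in separate calls (and X_train / Y_train do again when the validation set is carved out). Each call draws its own random shuffle, so features and labels end up paired at random, which may explain why the model ends up predicting almost everything as class 0. train_test_split accepts several arrays at once and splits them with one shared shuffle. A sketch on a hypothetical toy frame whose label is derived from column 0, so the pairing can be checked:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame shaped like the PokerHand CSV: columns 0-9 are
# features and column 10 is the label. The label is derived from column 0,
# so we can check afterwards that every row still carries its own label.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.integers(1, 14, size=(100, 10)))
train[10] = train[0] % 10

X = train.drop(columns=10)
y = train[10]

# One call, two arrays: the same shuffle is applied to X and y.
X_train, X_test, Y_train, Y_test = train_test_split(X, y, train_size=0.7, random_state=0)
# The validation split follows the same pattern.
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, train_size=0.7, random_state=0)

print(X_train.shape, X_val.shape, X_test.shape)
print("labels aligned:", (Y_train == X_train[0] % 10).all())
```

The shapes come out the same as with separate calls; only the row-to-label pairing changes. random_state is optional and only makes the split reproducible.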

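Splitting off validation data

Convert the data to float32. (The card values are small integers, so this cast does not by itself improve precision.)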

# Cast features and labels to float32
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
Y_train = Y_train.astype('float32')
Y_test = Y_test.astype('float32')

# Caution: as above, X and Y are shuffled independently by these two calls
X_train, X_val = train_test_split(X_train, train_size=0.7)
Y_train, Y_val = train_test_split(Y_train, train_size=0.7)
print(X_train.shape, Y_train.shape, X_val.shape, Y_val.shape)

(12254, 10) (12254,) (5253, 10) (5253,)

Model definition

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

clf = xgb.XGBClassifier()
# Try every combination of max_depth and n_estimators (3 x 3 = 9 candidates)
clf_cv = GridSearchCV(clf, {'max_depth': [2, 4, 6], 'n_estimators': [50, 100, 200]}, verbose=1)

Parameter search

Select the best parameters before training.

# Extra keyword arguments such as verbose are passed on to XGBClassifier.fit
clf_cv.fit(X_train, Y_train, verbose=1)
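The grid has 3 values for max_depth and 3 for n_estimators, so GridSearchCV evaluates all 3 x 3 = 9 combinations, each with cross-validation. A small pure-Python sketch of that count, assuming scikit-learn's default of 5 folds (cv=None) and the default refit=True, which trains one more model on all of X_train with the best combination:

```python
from itertools import product

param_grid = {'max_depth': [2, 4, 6], 'n_estimators': [50, 100, 200]}

# Every combination of listed values is one candidate parameter set.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumed: GridSearchCV's default cv in recent scikit-learn
cv_fits = len(candidates) * n_folds
total_fits = cv_fits + 1  # + 1 refit of the best candidate (refit=True by default)

print(len(candidates), cv_fits, total_fits)  # 9 45 46
```

With verbose=1, GridSearchCV reports this count in a line along the lines of "Fitting 5 folds for each of 9 candidates, totalling 45 fits". Adding values to the grid multiplies the cost, which is worth keeping in mind before widening it.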

Training

Run training with the best parameters.

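# Recent XGBoost versions take early_stopping_rounds in the constructor;
# passing it to fit() was deprecated and is not accepted by newer releases.
clf = xgb.XGBClassifier(**clf_cv.best_params_, early_stopping_rounds=100)
clf.fit(X_train, Y_train,
        eval_set=[(X_val, Y_val)],
        verbose=1)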
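early_stopping_rounds=100 asks XGBoost to stop adding trees once the metric on eval_set has not improved for 100 consecutive rounds, and to remember the best round. The rule itself can be sketched in a few lines of plain Python; the helper early_stopping_round and the loss values are hypothetical, and XGBoost's exact bookkeeping may differ in detail.

```python
def early_stopping_round(val_losses, patience):
    """Return (best_round, stop_round): training stops once `patience`
    rounds have passed without a new best (lowest) validation loss."""
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            return best_round, i
    return best_round, len(val_losses) - 1  # rule never triggered: ran every round


losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
print(early_stopping_round(losses, patience=3))  # (2, 5)
```

With patience=3, training halts at round 5 and never sees the lower loss at round 6; that is the trade-off the patience value controls. Under this rule, since the grid tops out at n_estimators=200, a patience of 100 can only cut training short if the best round falls in the first half.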

Model evaluation

from sklearn.metrics import confusion_matrix, classification_report

# Evaluate on the 30% held out from the training file (the separate test file is not used here)
pred = clf.predict(X_test)
print(confusion_matrix(Y_test, pred))
print(classification_report(Y_test, pred))

[[3729   10    0    0    0    0    0    0    0    0]
 [3201    6    0    0    0    0    0    0    0    0]
 [ 343    1    0    0    0    0    0    0    0    0]
 [ 146    0    0    0    0    0    0    0    0    0]
 [  33    0    0    0    0    0    0    0    0    0]
 [  19    0    0    0    0    0    0    0    0    0]
 [  11    0    0    0    0    0    0    0    0    0]
 [   2    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]
 [   1    0    0    0    0    0    0    0    0    0]]
              precision    recall  f1-score   support

         0.0       0.50      1.00      0.66      3739
         1.0       0.35      0.00      0.00      3207
         2.0       0.00      0.00      0.00       344
         3.0       0.00      0.00      0.00       146
         4.0       0.00      0.00      0.00        33
         5.0       0.00      0.00      0.00        19
         6.0       0.00      0.00      0.00        11
         7.0       0.00      0.00      0.00         2
         8.0       0.00      0.00      0.00         1
         9.0       0.00      0.00      0.00         1

    accuracy                           0.50      7503
   macro avg       0.09      0.10      0.07      7503
weighted avg       0.40      0.50      0.33      7503
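The report's numbers can be recomputed from the confusion matrix (rows are true classes, columns are predicted classes), and doing so gives a useful baseline: always predicting class 0, the most frequent hand, already scores about 0.50 accuracy, the same as the model. A NumPy sketch using the matrix printed above; only its first two columns are non-zero.

```python
import numpy as np

# Confusion matrix from the output above: only predicted classes 0 and 1 occur.
cm = np.zeros((10, 10), dtype=int)
cm[:, 0] = [3729, 3201, 343, 146, 33, 19, 11, 2, 1, 1]
cm[:, 1] = [10, 6, 1, 0, 0, 0, 0, 0, 0, 0]

support = cm.sum(axis=1)    # true samples per class (row sums)
predicted = cm.sum(axis=0)  # predictions per class (column sums)
tp = np.diag(cm)            # correctly classified samples per class

accuracy = tp.sum() / cm.sum()
baseline = support.max() / support.sum()  # always predict the most frequent class
# Never-predicted classes get precision 0, as classification_report reports by default.
precision = np.divide(tp, predicted, out=np.zeros(10), where=predicted > 0)
recall = tp / support

print("accuracy:", round(accuracy, 3), "majority baseline:", round(baseline, 3))
print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("macro avg precision / recall:", round(precision.mean(), 2), round(recall.mean(), 2))
```

The macro averages (0.09 and 0.10) are low because eight of the ten classes are never predicted and contribute zeros.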

Still a long way to go. The goal here was just to get XGBoost running first, so this algorithm choice may not be a good fit for the problem.

Reference

Original article (XGBoost notes on the PokerHand dataset): https://zenn.dev/yassh_i/articles/299e7b8d2c6dd3

