Tuning XGBoost Parameters with Optuna

TL;DR

This post tunes XGBoost hyperparameters with Optuna.
The Boston housing prices dataset is used as the benchmark.

Data preparation

Load the data with scikit-learn's datasets module.
The data is split into training and test sets at a ratio of 8:2.

from sklearn import datasets

# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release.
features, labels = datasets.load_boston(return_X_y=True)

from sklearn import model_selection

train_features, test_features, train_labels, test_labels = model_selection.train_test_split(features, labels, test_size=0.2)

print(train_features.shape)
print(train_labels.shape)
print(test_features.shape)
print(test_labels.shape)

> (404, 13)
> (404,)
> (102, 13)
> (102,)
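
The shapes above follow from the 8:2 split. As a rough check, the counts can be reproduced with plain arithmetic. This is a minimal sketch under two assumptions: the dataset has 506 rows (404 + 102, inferred from the printed shapes), and a fractional test size is rounded up with the remaining rows going to the training set. The helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives whatever rows remain.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 506 rows is inferred from the shapes printed above (404 + 102).
n_train, n_test = split_sizes(506, 0.2)
print(n_train, n_test)
```

Because 20% of 506 is not a whole number, the split cannot be exactly 8:2, which is why the test set ends up with 102 rows rather than 101.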

import xgboost as xgb

trains = xgb.DMatrix(train_features, label=train_labels)
tests = xgb.DMatrix(test_features, label=test_labels)

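Hyperparameter optimization

Parameter tuning is performed with Optuna. The following parameters are tuned.

eta ... learning rate

max_depth ... maximum tree depth

lambda ... L2 regularization penalty

The objective value is R2 (the coefficient of determination). Because a larger R2 indicates better performance, direction is set to maximize.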
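
To make the objective concrete, here is a minimal pure-Python sketch of the coefficient of determination, R2 = 1 - SS_res / SS_tot. The function name `r2_manual` and the sample values are illustrative assumptions; the article itself relies on `sklearn.metrics.r2_score`.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the labels around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values, not taken from the Boston dataset.
y_true = [3.0, -0.5, 2.0, 7.0]
mean_value = sum(y_true) / len(y_true)

perfect = r2_manual(y_true, y_true)                          # exact predictions
mean_only = r2_manual(y_true, [mean_value] * len(y_true))    # always predict the mean
close = r2_manual(y_true, [2.5, 0.0, 2.0, 8.0])              # small errors

print(perfect, mean_only, round(close, 4))
```

Perfect predictions score 1, always predicting the mean scores 0, and anything in between lands closer to 1 as the errors shrink. This is why the study is set up to maximize the value rather than minimize it.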

base_params = {
    'booster': 'gbtree',
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
}

watchlist = [(trains, 'train'), (tests, 'eval')]

import optuna
from sklearn.metrics import r2_score
import copy

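def optimizer(trial):
    # Sample one candidate value per tuned parameter. suggest_float replaces
    # suggest_uniform, which has been deprecated since Optuna 3.0.
#     booster = trial.suggest_categorical('booster', ['gbtree', 'dart', 'gblinear'])
    eta = trial.suggest_float('eta', 0.01, 0.3)
    max_depth = trial.suggest_int('max_depth', 4, 15)
    l2_lambda = trial.suggest_float('lambda', 0.7, 2)

    # Build the parameters on a per-trial copy so base_params stays unchanged.
    tmp_params = copy.deepcopy(base_params)
#     tmp_params['booster'] = booster
    tmp_params['eta'] = eta
    tmp_params['max_depth'] = max_depth
    tmp_params['lambda'] = l2_lambda

    # Train for a fixed 50 boosting rounds with the sampled parameters.
    model = xgb.train(tmp_params, trains, num_boost_round=50)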
    predicts = model.predict(tests)

    r2 = r2_score(test_labels, predicts)
    print(f'#{trial.number}, Result: {r2}, {trial.params}')

    return r2

study = optuna.create_study(direction='maximize')
study.optimize(optimizer, n_trials=500)

> #0, Result: 0.9153797234954849, {'eta': 0.21541259325117842, 'max_depth': 4, 'lambda': 1.7243766588775653}
> [I 2019-12-14 23:49:43,636] Finished trial#0 resulted in value: 0.9153797234954849. Current best value is 0.9153797234954849 with parameters: {'eta': 0.21541259325117842, 'max_depth': 4, 'lambda': 1.7243766588775653}.
> #1, Result: 0.9277796354008809, {'eta': 0.1678675361241897, 'max_depth': 7, 'lambda': 1.9228108973855251}
> [I 2019-12-14 23:49:43,734] Finished trial#1 resulted in value: 0.9277796354008809. Current best value is 0.9277796354008809 with parameters: {'eta': 0.1678675361241897, 'max_depth': 7, 'lambda': 1.9228108973855251}.
> #2, Result: 0.8903499007997161, {'eta': 0.07375873958103377, 'max_depth': 13, 'lambda': 1.841310013076201}
> [I 2019-12-14 23:49:43,856] Finished trial#2 resulted in value: 0.8903499007997161. Current best value is 0.9277796354008809 with parameters: {'eta': 0.1678675361241897, 'max_depth': 7, 'lambda': 1.9228108973855251}.
[omitted]
> #499, Result: 0.9350409121311861, {'eta': 0.146374389194902, 'max_depth': 8, 'lambda': 1.731254194217149}
> [I 2019-12-14 23:51:08,655] Finished trial#499 resulted in value: 0.9350409121311861. Current best value is 0.9477310818026083 with parameters: {'eta': 0.16519267749243557, 'max_depth': 7, 'lambda': 1.72021507963037}.
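
Each log line above reports one trial: the sampled parameters, the resulting score, and the best score so far. The loop behind that pattern can be sketched without any libraries. Note the assumptions: this sketch uses plain random search, not Optuna's default TPE sampler, and `toy_score` is a made-up stand-in for training XGBoost and computing R2. Only the search ranges are taken from the article.

```python
import random


def toy_score(eta, max_depth, l2_lambda):
    """Hypothetical stand-in for 'train a model and return R2'.

    The peak location (eta=0.15, max_depth=7, lambda=1.7) is invented
    purely for illustration.
    """
    return (1.0
            - (eta - 0.15) ** 2
            - 0.001 * (max_depth - 7) ** 2
            - 0.01 * (l2_lambda - 1.7) ** 2)


def random_search(n_trials, seed=0):
    """Sample parameters at random and keep the highest-scoring set."""
    rng = random.Random(seed)
    best_value, best_params = float('-inf'), None
    history = []
    for _ in range(n_trials):
        # Same ranges as the Optuna objective in the article.
        params = {
            'eta': rng.uniform(0.01, 0.3),
            'max_depth': rng.randint(4, 15),  # inclusive on both ends
            'lambda': rng.uniform(0.7, 2.0),
        }
        value = toy_score(params['eta'], params['max_depth'], params['lambda'])
        history.append(value)
        # direction='maximize': a larger value replaces the current best.
        if value > best_value:
            best_value, best_params = value, params
    return best_value, best_params, history


best_value, best_params, history = random_search(n_trials=50)
print(best_value, best_params)
```

Optuna differs in how it picks the next candidate, since its sampler uses the results of earlier trials, but the bookkeeping of a running best value is the same idea.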

The best parameters found during the search are as follows.

study.best_params

> {'eta': 0.16519267749243557, 'max_depth': 7, 'lambda': 1.72021507963037}

The following Seaborn pairplot shows the correlations among the tuned parameters and the objective value.

%matplotlib inline
import seaborn as sns

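# Recent Optuna versions flatten parameter columns to names such as params_eta,
# so select the objective value and the parameter columns through attrs.
study_df = study.trials_dataframe(attrs=('value', 'params'))
sns.pairplot(study_df, kind='reg')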

Modeling with the parameters found

Train with the optimized parameters.

from sklearn.metrics import r2_score

def eval_model(params, trains, tests):
    model = xgb.train(params, trains, num_boost_round=100, verbose_eval=False, evals=watchlist)
    predicts = model.predict(tests)
    r2 = r2_score(test_labels, predicts)

    return r2

base_r2 = eval_model(base_params, trains, tests)

merged_params = dict(base_params, **study.best_params)
best_r2 = eval_model(merged_params, trains, tests)

print(f'Base params: {base_params}')
print(f'Best params: {merged_params}')
print(f'Base: {base_r2}, Best: {best_r2}, Diff: {best_r2 - base_r2}')

> Base params: {'booster': 'gbtree', 'objective': 'reg:squarederror', 'eval_metric': 'rmse'}
> Best params: {'booster': 'gbtree', 'objective': 'reg:squarederror', 'eval_metric': 'rmse', 'eta': 0.16519267749243557, 'max_depth': 7, 'lambda': 1.72021507963037}
> Base: 0.8937800621867638, Best: 0.94643405613549, Diff: 0.052653993948726274
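
The merge `dict(base_params, **study.best_params)` used above builds a new dictionary and leaves `base_params` as it was. A small standalone sketch of that behavior follows; the best-parameter values are rounded copies of the ones reported above, and the `overridden` example is an added illustration.

```python
base_params = {
    'booster': 'gbtree',
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
}
# Rounded copies of the reported best parameters, for readability.
best_params = {'eta': 0.165, 'max_depth': 7, 'lambda': 1.72}

# dict(a, **b) copies a, then adds or overwrites entries from b.
merged = dict(base_params, **best_params)

# When a key exists in both, the keyword arguments (second dict) win.
overridden = dict({'eta': 0.3}, **best_params)

print(merged)
print(base_params)  # still has only the three original keys
print(overridden['eta'])
```

This is why the same `base_params` can be passed to `eval_model` for the baseline run and reused for the tuned run without the two interfering.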

Summary

Tuning the hyperparameters with Optuna improved R2 as follows.

Default: 0.89

After optimization: 0.95

Improvement: 0.05
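
The rounded figures above come straight from the printed scores. As a worked check, the sketch below recomputes them and also expresses the gain as a reduction in unexplained variance (1 - R2). That second view is an added illustration and not a figure from the original article. The numbers reflect a single run on one random split.

```python
# Scores copied from the output printed earlier in the article.
base_r2 = 0.8937800621867638
best_r2 = 0.94643405613549

# Absolute improvement in R2.
diff = best_r2 - base_r2

# Share of variance left unexplained before and after tuning.
unexplained_base = 1.0 - base_r2
unexplained_best = 1.0 - best_r2
relative_reduction = 1.0 - unexplained_best / unexplained_base

print(round(base_r2, 2), round(best_r2, 2), round(diff, 2))
print(round(relative_reduction, 3))
```

Seen this way, a gain of about 0.05 in R2 corresponds to roughly halving the unexplained variance in this particular run.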

References

XGBoost

Optuna

Reference

Source for this article (Tuning XGBoost Parameters with Optuna): https://qiita.com/hideki/items/7ef7fc8aacce049cce77

