Porto Seguro Exploratory Analysis and Prediction: Prepare the Model
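What I learned
Stacking combines the predictions of several strong models through a second-level model, which can improve performance over any single model. The trade-off is a larger computational cost, since every base model is trained once per fold.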
Code
# Initialize the ensembling object (the part I found most interesting).
# log_model and the base models are defined further down in this post.
stack = Ensemble(n_splits=3,
                 stacker=log_model,
                 base_models=(lgb_model1, lgb_model2, lgb_model3, xgb_model))
Prepare the model
The Ensemble class handles cross-validation and ensembling. It:
- splits the data into K folds
- trains the models
- ensembles the results
Parameters of the __init__ method:
- self: the object to be initialized
- n_splits: the number of cross-validation splits to be used
- stacker: the model used for stacking the prediction results from the trained base models
- base_models: the list of base models used in training
The fit_predict method does four things:
- splits the training data into n_splits folds
- runs the base models
- performs prediction with each model
- ensembles the results using the stacker
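The first fit_predict step, splitting into stratified folds, can be sketched on its own. This is a minimal illustration on made-up labels (the toy `X` and `y` are assumptions, not the Porto Seguro data); it only shows that each holdout fold keeps the same share of positives and that every row is held out exactly once.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy, imbalanced labels (hypothetical data): 3 positives out of 12 rows.
y = np.array([0, 0, 0, 1] * 3)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

# Same splitter settings as the Ensemble class uses.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=314)
folds = list(skf.split(X, y))

for j, (train_idx, holdout_idx) in enumerate(folds, start=1):
    print("fold %d: train=%d rows, holdout=%d rows, holdout positives=%d"
          % (j, len(train_idx), len(holdout_idx), y[holdout_idx].sum()))

# Each holdout fold should carry the same number of positives,
# and the holdout folds together should cover every row once.
holdout_positive_counts = [int(y[h].sum()) for _, h in folds]
all_holdout = np.sort(np.concatenate([h for _, h in folds]))
```

Because every row lands in exactly one holdout fold, the out-of-fold predictions can fill one full column of the stacker's training matrix without any row being predicted by a model that saw it during training.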
Code
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score

class Ensemble(object):
    def __init__(self, n_splits, stacker, base_models):
        self.n_splits = n_splits
        self.stacker = stacker
        self.base_models = base_models

    def fit_predict(self, X, y, T):
        X = np.array(X)
        y = np.array(y)
        T = np.array(T)
        # Stratified folds keep the target ratio the same in every split
        folds = list(StratifiedKFold(n_splits=self.n_splits, shuffle=True, random_state=314).split(X, y))
        # One column per base model: out-of-fold predictions (train) and fold-averaged predictions (test)
        S_train = np.zeros((X.shape[0], len(self.base_models)))
        S_test = np.zeros((T.shape[0], len(self.base_models)))
        for i, clf in enumerate(self.base_models):
            S_test_i = np.zeros((T.shape[0], self.n_splits))
            for j, (train_idx, test_idx) in enumerate(folds):
                X_train = X[train_idx]
                y_train = y[train_idx]
                X_holdout = X[test_idx]
                print("Base model %d: fit %s model | fold %d" % (i+1, str(clf).split('(')[0], j+1))
                clf.fit(X_train, y_train)
                # Diagnostic only: cross_val_score works on clones, so the fitted clf is unchanged
                cross_score = cross_val_score(clf, X_train, y_train, cv=3, scoring='roc_auc')
                print("cross_score[roc-auc]: %.5f [gini]: %.5f" % (cross_score.mean(), 2*cross_score.mean()-1))
                # Predict the rows this fold's model has not seen
                y_pred = clf.predict_proba(X_holdout)[:, 1]
                S_train[test_idx, i] = y_pred
                S_test_i[:, j] = clf.predict_proba(T)[:, 1]
            # Average the test predictions of the per-fold models
            S_test[:, i] = S_test_i.mean(axis=1)
        results = cross_val_score(self.stacker, S_train, y, cv=3, scoring='roc_auc')
        # Calculate the Gini coefficient as 2 * AUC - 1
        print("Stacker score[gini]: %.5f" % (2 * results.mean() - 1))
        # Fit the stacker on the out-of-fold predictions and predict the test set
        self.stacker.fit(S_train, y)
        res = self.stacker.predict_proba(S_test)[:, 1]
        return res
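The class reports scores as Gini = 2 * AUC - 1. As a small worked example, the sketch below computes AUC by counting correctly ordered (positive, negative) pairs and then converts it to Gini. The helper names `auc_by_pairs` and `normalized_gini` are hypothetical and written only for illustration; the pair-counting loop is a readable definition of AUC, not how scikit-learn computes it.

```python
def auc_by_pairs(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def normalized_gini(y_true, scores):
    """Gini coefficient derived from AUC: 2 * AUC - 1."""
    return 2 * auc_by_pairs(y_true, scores) - 1


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Pairs: (0.35 vs 0.1) ok, (0.35 vs 0.4) wrong, (0.8 vs 0.1) ok, (0.8 vs 0.4) ok
auc = auc_by_pairs(y_true, scores)      # 3 of 4 pairs -> 0.75
gini = normalized_gini(y_true, scores)  # 2 * 0.75 - 1 = 0.5
print("auc=%.2f gini=%.2f" % (auc, gini))
```

A random scorer has AUC near 0.5 and therefore Gini near 0, while a perfect ranking gives AUC 1 and Gini 1, which is why the rescaled number is easier to read than raw AUC.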
Setting up the stacked models
Parameters for the base models
Three different LightGBM models and one XGBoost model
Training data: each model is cross-validated with 3 folds
#LightGBM params
#lgb_1
lgb_params1 = {}
lgb_params1['learning_rate'] = 0.02
lgb_params1['n_estimators'] = 650
lgb_params1['max_bin'] = 10
lgb_params1['subsample'] = 0.8
lgb_params1['subsample_freq'] = 10
lgb_params1['colsample_bytree'] = 0.8
lgb_params1['min_child_samples'] = 500
lgb_params1['seed'] = 314
lgb_params1['num_threads'] = 4
#lgb_2
lgb_params2 = {}
lgb_params2['n_estimators'] = 1000
lgb_params2['learning_rate'] = 0.02
lgb_params2['colsample_bytree'] = 0.3
lgb_params2['subsample'] = 0.7
lgb_params2['subsample_freq'] = 2
lgb_params2['num_leaves'] = 16
lgb_params2['seed'] = 314
lgb_params2['num_threads'] = 4
#lgb_3
lgb_params3 = {}
lgb_params3['n_estimators'] = 100
lgb_params3['max_depth'] = 4
lgb_params3['learning_rate'] = 0.02
lgb_params3['seed'] = 314
lgb_params3['num_threads'] = 4
#XGBoost params
xgb_params = {}
xgb_params['objective'] = 'binary:logistic'
xgb_params['learning_rate'] = 0.04
xgb_params['n_estimators'] = 490
xgb_params['max_depth'] = 4
xgb_params['subsample'] = 0.9
xgb_params['colsample_bytree'] = 0.9
xgb_params['min_child_weight'] = 10
# XGBClassifier takes its thread count as n_jobs (num_threads is the LightGBM name)
xgb_params['n_jobs'] = 4
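# Initialize the models with parameters:
# four base models (three LightGBM, one XGBoost) and the stacking model
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier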
# Base models
lgb_model1 = LGBMClassifier(**lgb_params1)
lgb_model2 = LGBMClassifier(**lgb_params2)
lgb_model3 = LGBMClassifier(**lgb_params3)
xgb_model = XGBClassifier(**xgb_params)
# Stacking model
log_model = LogisticRegression()
# Run the predictive models with the fit_predict method of the stack object:
# predict the target with each base model, then ensemble the results
# using the stacker model and output the stacked result
y_prediction = stack.fit_predict(trainset, target_train, testset)
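For comparison, scikit-learn ships a built-in `StackingClassifier` that follows the same idea. The sketch below is an assumption-laden stand-in, not the setup used in this post: it runs on synthetic data from `make_classification`, and a random forest and a decision tree replace the LightGBM and XGBoost models so that the example needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (hypothetical, not the Porto Seguro set).
X, y = make_classification(n_samples=300, n_features=10, random_state=314)
X_train, y_train = X[:200], y[:200]
X_test = X[200:]

stacked = StackingClassifier(
    estimators=[
        # Stand-ins for the LightGBM / XGBoost base models.
        ("forest", RandomForestClassifier(n_estimators=25, random_state=314)),
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=314)),
    ],
    final_estimator=LogisticRegression(),  # plays the role of the stacker
    cv=3,                                  # folds used for out-of-fold predictions
    stack_method="predict_proba",
)
stacked.fit(X_train, y_train)

# Probability of the positive class, like the return value of fit_predict.
y_prediction = stacked.predict_proba(X_test)[:, 1]
print(y_prediction[:5])
```

One difference from the Ensemble class: `StackingClassifier` refits each base model on the full training set for test-time predictions, whereas the class above averages the test predictions of the per-fold models.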
Author And Source
More material on this topic (Porto Seguro Exploratory Analysis and Prediction_prepare the model) can be found at the original post: https://velog.io/@qsdcfd/Porto-Seguro-Exploratory-Analysis-and-Predictionprepare-the-model
Attribution: the original author's information is included in the original URL, and copyright belongs to the original author.