■Kaggle Practice for Beginners -House SalePrice (I tried using PyCaret)- by Google Colaboratory
0. Introduction
This article uses the Kaggle House Prices competition to show how easy PyCaret is to use.
It covers only the most basic workflow. To improve your Kaggle score, you would need to add further steps such as data preprocessing and better modeling techniques.
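1. Set Up Google Colaboratory
Set up the Google Colaboratory environment with the commands below to connect it to Kaggle. First download your API token file, kaggle.json, from your Kaggle account settings, then select it when files.upload() opens the file picker.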
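from google.colab import files
files.upload()  # choose kaggle.json; it is saved to the current directory (/content)
!pip install kaggle
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json  # owner-only access; the Kaggle CLI warns if others can read the key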
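For reference, the mkdir/mv/chmod steps above amount to the following plain-Python sketch, which may be handy outside Colab. The helper name install_kaggle_token is hypothetical; the only thing taken from the Kaggle CLI is that it looks for the token at ~/.kaggle/kaggle.json.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(token_path, home=None):
    """Move a downloaded kaggle.json into <home>/.kaggle/ and restrict its permissions.

    Mirrors the shell steps: mkdir -p ~/.kaggle, mv kaggle.json ~/.kaggle/, chmod 600.
    """
    home = Path(home) if home is not None else Path.home()
    kaggle_dir = home / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`

    target = kaggle_dir / "kaggle.json"
    shutil.move(str(token_path), str(target))  # like `mv`

    # Owner read/write only (octal 600), like `chmod 600`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Usage: install_kaggle_token("kaggle.json")
```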
Copy the API command for the dataset you want to download (it is shown on the competition's Data page on Kaggle) and paste it into a cell.
Don't forget to put "!" in front of the command so that Colab runs it as a shell command.
# This is an example of House SalePrice
!kaggle competitions download -c house-prices-advanced-regression-techniques
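Depending on the Kaggle API version, the download may arrive as a single archive (for this competition, house-prices-advanced-regression-techniques.zip) rather than as separate CSV files. A minimal sketch that unpacks it into /content so that the paths used in the next steps exist; the helper name extract_competition_files is hypothetical, and in a Colab cell `!unzip -o house-prices-advanced-regression-techniques.zip` would work just as well.

```python
import zipfile
from pathlib import Path


def extract_competition_files(archive, dest="/content"):
    """Extract a downloaded competition archive and return the sorted names of its files.

    Returns an empty list if the archive does not exist (e.g. the CSVs came unzipped).
    """
    archive = Path(archive)
    if not archive.exists():
        return []
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(dest)  # creates dest if it is missing
    return sorted(names)


# Usage in Colab:
# extract_competition_files("house-prices-advanced-regression-techniques.zip")
```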
2. Install PyCaret
!pip install pycaret
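The code in this article uses the PyCaret 1.x API (for example, the blacklist argument of compare_models). Two names changed in later releases: blacklist became exclude in PyCaret 2.0, and the Label column returned by predict_model became prediction_label in PyCaret 3.0. If you are unsure which version pip installed, a small sketch like the following can pick the right names; the helper names pycaret_major_version and api_names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def pycaret_major_version():
    """Major version of the installed pycaret package, or None if it is not installed."""
    try:
        return int(version("pycaret").split(".")[0])
    except PackageNotFoundError:
        return None


def api_names(major):
    """Names that differ between PyCaret releases, keyed by what they are used for."""
    if major is None:
        raise ValueError("pycaret is not installed")
    return {
        # compare_models(blacklist=[...]) in 1.x, compare_models(exclude=[...]) from 2.0
        "exclude_kwarg": "blacklist" if major == 1 else "exclude",
        # predict_model(...) output column that holds the predictions
        "prediction_column": "Label" if major < 3 else "prediction_label",
    }


# Usage in the notebook (assumes pycaret is installed):
# names = api_names(pycaret_major_version())
# compare_models(**{names["exclude_kwarg"]: ["tr", "lar"]})
```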
3. Import Dataset
import numpy as np
import pandas as pd
train = pd.read_csv("/content/train.csv")
test = pd.read_csv("/content/test.csv")
train.head()
4. Set Up PyCaret
from pycaret.regression import *
reg = setup(train, target='SalePrice', session_id=0)  # SalePrice is the column to predict
session_id: int, default = None
If None, a random seed is generated and shown in the information grid. The seed is then passed to every function used during the experiment, so fixing it (session_id=0 above) makes the whole experiment reproducible later.
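To see why a fixed seed makes the experiment reproducible, here is a minimal stand-alone sketch of a seeded train/hold-out split. It illustrates the idea only and is not PyCaret's internal code; the 70/30 ratio mirrors setup()'s default train_size of 0.7, 1460 is the number of rows in train.csv, and the function name seeded_split is hypothetical.

```python
import random


def seeded_split(n_rows, train_size=0.7, seed=None):
    """Shuffle row indices with a seed and split them into train / hold-out index lists."""
    if seed is None:
        # Mimic "a random seed is generated" when no session_id is given
        seed = random.randint(0, 9999)
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(round(n_rows * train_size))
    return seed, indices[:cut], indices[cut:]


# Calling this twice with seed=0 yields exactly the same split;
# with seed=None each call may differ, but the returned seed lets you repeat it.
seed, train_idx, holdout_idx = seeded_split(1460, seed=0)
```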
5. Compare Models
compare_models(blacklist=None, fold=10, round=4, sort='R2', turbo=True)
This function trains every model in the model library and scores each one with K-fold cross-validation (10 folds by default). The output is a score grid showing MAE, MSE, RMSE, R2, RMSLE and MAPE for each available model, averaged across the folds and sorted by the sort metric (R2 by default).
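The K-fold loop described above can be sketched in plain Python. This is a simplified illustration, not PyCaret's implementation: the "model" is a hypothetical baseline that predicts the mean of the training targets, the folds are contiguous rather than shuffled, and only four of the listed metrics (MAE, RMSE, R2, RMSLE) are computed. Each fold is held out once, the model is fit on the other folds, and the per-fold scores are averaged.

```python
import math


def kfold_indices(n_rows, k=10):
    """Split range(n_rows) into k contiguous folds of near-equal size (no shuffling)."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, R2 and RMSLE for one fold (RMSLE assumes non-negative values)."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot if ss_tot else float("nan")
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                          for t, p in zip(y_true, y_pred)) / n)
    return {"MAE": mae, "RMSE": math.sqrt(mse), "R2": r2, "RMSLE": rmsle}


def cross_validate_mean_baseline(y, k=10):
    """K-fold CV of a baseline that predicts the mean of the training targets."""
    per_fold = []
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train_y = [v for i, v in enumerate(y) if i not in held_out]
        prediction = sum(train_y) / len(train_y)  # "fitting" = taking the mean
        y_true = [y[i] for i in fold]
        per_fold.append(regression_metrics(y_true, [prediction] * len(y_true)))
    # Average each metric across folds, like the comparison grid does
    return {m: sum(f[m] for f in per_fold) / len(per_fold) for m in per_fold[0]}
```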
I excluded two models this time:
['tr'] TheilSen Regressor takes a long time to train.
['lar'] Least Angle Regression produced an extremely large MAE.
Both are excluded with the blacklist argument (renamed exclude in PyCaret 2.0 and later):
compare_models(blacklist=['tr', 'lar'])
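Roughly speaking, excluding models removes their IDs from the candidate list before scoring, and sort decides which metric ranks the grid. A hypothetical sketch (not PyCaret's code; the function rank_models, the model IDs and the scores are made up purely to exercise it) showing that R2 is ranked highest-first while error metrics such as MAE are ranked lowest-first:

```python
# Metrics where a larger value is better; the others (MAE, MSE, RMSE, RMSLE, MAPE) are errors
HIGHER_IS_BETTER = {"R2"}


def rank_models(scores, sort="R2", exclude=()):
    """Return model IDs ordered best-first by `sort`, skipping any ID in `exclude`.

    `scores` maps a model ID to a dict of mean cross-validation metrics.
    """
    excluded = set(exclude)
    candidates = {mid: m for mid, m in scores.items() if mid not in excluded}
    return sorted(candidates,
                  key=lambda mid: candidates[mid][sort],
                  reverse=sort in HIGHER_IS_BETTER)


# Made-up scores for hypothetical models
example = {
    "model_a": {"R2": 0.9, "MAE": 10.0},
    "model_b": {"R2": 0.2, "MAE": 50.0},
    "model_c": {"R2": 0.7, "MAE": 20.0},
}
print(rank_models(example, exclude=["model_b"]))  # ['model_a', 'model_c']
print(rank_models(example, sort="MAE"))           # ['model_a', 'model_c', 'model_b']
```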
6. Create Model
llar1 = create_model('llar', verbose=False)  # 'llar' = Lasso Least Angle Regression; verbose=False hides the score grid
7. Predictions
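predict_model(llar1)  # without data: scores the model on the hold-out set split off by setup()
predictions_llar1 = predict_model(llar1, data=test)  # predict SalePrice for Kaggle's test.csv
test_ID = test['Id']
predictions_llar_Label = predictions_llar1['Label']  # PyCaret 1.x/2.x store predictions in 'Label'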
my_submission = pd.DataFrame()
my_submission["Id"] = test_ID
my_submission["SalePrice"] = predictions_llar_Label
my_submission.to_csv('submission_llar.csv', index=False)
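Before uploading, it can be worth checking the file. A minimal sketch using only the csv module; the function check_submission is hypothetical, and the expectations it encodes (an Id,SalePrice header, one row per test Id in the same order, positive numeric prices) follow from how my_submission is built above.

```python
import csv


def check_submission(path, expected_ids):
    """Return a list of problems found in a submission CSV (empty list = looks OK)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        rows = [row for row in reader if row]  # skip blank lines
    if header != ["Id", "SalePrice"]:
        problems.append(f"unexpected header: {header}")
        return problems
    ids = [row[0] for row in rows]
    if ids != [str(i) for i in expected_ids]:
        problems.append("Id column does not match the test set Ids (order or values)")
    for row in rows:
        try:
            price = float(row[1])
        except (IndexError, ValueError):
            problems.append(f"bad SalePrice in row {row}")
            continue
        if not price > 0:  # also catches NaN
            problems.append(f"non-positive SalePrice for Id {row[0]}: {price}")
    return problems


# Usage in the notebook:
# print(check_submission("submission_llar.csv", test["Id"].tolist()))
```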
8. Submission
Copy the submit command shown on the competition's submission page and paste it into a cell.
Again, put "!" in front of the command so that Colab runs it as a shell command. The -m option sets the message attached to the submission.
# Submission to Kaggle
!kaggle competitions submit -c house-prices-advanced-regression-techniques -f submission_llar.csv -m "Message"
Reference
For more on this topic (■Kaggle Practice for Beginners -House SalePrice (I tried using PyCaret)- by Google Colaboratory), see the original article at https://qiita.com/O-Mik/items/6cdada83cfe17ac6efae. The text may be freely shared or copied, but please keep this document's URL as the reference URL.