Introduction_Home_Credit_Default_Risk_Competition_load_data
Home Credit Default
*Goal
Historical loan application data is used to predict the probability that an applicant will repay a loan.
*Supervised classification task
Data (Home Credit)
- application_train / application_test
  - Main data: one row per loan application
  - Each loan is identified by SK_ID_CURR
  - TARGET: 0 or 1 (present in application_train only)
- bureau
  - The client's previous credits; one client can have multiple previous credits
- bureau_balance
  - Monthly data on the previous credits
  - One previous credit spans multiple rows
- previous_application
  - Data on previous loans
  - Each previous loan is identified by SK_ID_PREV
- POS_CASH_BALANCE
  - Monthly data on previous point-of-sale or cash loans
- credit_card_balance
  - Monthly data on clients' previous credit cards
  - A single card can span many rows
- installments_payments
  - Payment history for previous loans
  - Covers both payments made and payments missed
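The tables above are tied together by keys: SK_ID_CURR identifies a loan application and SK_ID_PREV identifies a previous loan. As a rough sketch of how a child table might be rolled up onto the main table, the example below uses tiny made-up frames standing in for application_train and previous_application. The values and the PREV_LOAN_COUNT column name are illustrative assumptions, not part of the real data.

```python
import pandas as pd

# Toy stand-ins for application_train and previous_application (made-up values).
app = pd.DataFrame({"SK_ID_CURR": [100002, 100003, 100004], "TARGET": [1, 0, 0]})
prev = pd.DataFrame({
    "SK_ID_PREV": [1, 2, 3, 4],
    "SK_ID_CURR": [100002, 100003, 100003, 100003],
})

# Count previous loans per current application.
# PREV_LOAN_COUNT is a hypothetical feature name chosen for this sketch.
prev_counts = (
    prev.groupby("SK_ID_CURR")["SK_ID_PREV"]
    .count()
    .rename("PREV_LOAN_COUNT")
    .reset_index()
)

# A left join keeps every application, including those with no previous loans.
merged = app.merge(prev_counts, on="SK_ID_CURR", how="left")
merged["PREV_LOAN_COUNT"] = merged["PREV_LOAN_COUNT"].fillna(0).astype(int)
print(merged)
```

The left join matters here: an inner join would silently drop applicants who have no rows in the child table.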
Metric
ROC AUC
- ROC: the true positive rate plotted against the false positive rate as the classification threshold varies
- AUC: the area under the ROC curve
- ROC AUC
  - Takes a value between 0 and 1
  - A higher value represents better model performance
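To make the metric concrete, here is a minimal pure-Python sketch that builds the ROC points by sweeping the threshold over the predicted scores and then integrates them with the trapezoidal rule. The function names roc_points and auc and the four sample labels and scores are made up for illustration; in practice a library routine such as scikit-learn's roc_auc_score would normally be used instead.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points from sweeping the threshold over the scores.

    Assumes y_true contains both classes (at least one 0 and one 1).
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        score = y_score[order[i]]
        # Samples with the same score cross the threshold together.
        while i < len(order) and y_score[order[i]] == score:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y_true, y_score)))  # 0.75
```

A model that ranks every positive above every negative scores 1.0, while random ranking scores about 0.5, which is why a higher value indicates a better model.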
Code
# Import libraries
# numpy and pandas for data manipulation
import numpy as np
import pandas as pd
# sklearn preprocessing for dealing with categorical variables
from sklearn.preprocessing import LabelEncoder
# File system management
import os
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# matplotlib and seaborn for plotting
import matplotlib.pyplot as plt
import seaborn as sns
# Connect Google Drive
from google.colab import drive
drive.mount('/content/gdrive/')
# Training data
app_train = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_train.csv')
print('Train data shape:', app_train.shape)
app_train.head(10)
SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY ... FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR
0 100002 1 Cash loans M N Y 0 202500.0 406597.5 24700.5 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 1.0
1 100003 0 Cash loans F N N 0 270000.0 1293502.5 35698.5 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
2 100004 0 Revolving loans M Y Y 0 67500.0 135000.0 6750.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
3 100006 0 Cash loans F N Y 0 135000.0 312682.5 29686.5 ... 0 0 0 0 NaN NaN NaN NaN NaN NaN
4 100007 0 Cash loans M N Y 0 121500.0 513000.0 21865.5 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
5 100008 0 Cash loans M N Y 0 99000.0 490495.5 27517.5 ... 0 0 0 0 0.0 0.0 0.0 0.0 1.0 1.0
6 100009 0 Cash loans F Y Y 1 171000.0 1560726.0 41301.0 ... 0 0 0 0 0.0 0.0 0.0 1.0 1.0 2.0
7 100010 0 Cash loans M Y Y 0 360000.0 1530000.0 42075.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
8 100011 0 Cash loans F N Y 0 112500.0 1019610.0 33826.5 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 1.0
9 100012 0 Revolving loans M N Y 0 135000.0 405000.0 20250.0 ... 0 0 0 0 NaN NaN NaN NaN NaN NaN
10 rows × 122 columns
# Test data: has no TARGET column
app_test = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_test.csv')
print('Test data shape:', app_test.shape)
app_test.head(10)
SK_ID_CURR NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY AMT_GOODS_PRICE ... FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR
0 100001 Cash loans F N Y 0 135000.0 568800.0 20560.5 450000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
1 100005 Cash loans M N Y 0 99000.0 222768.0 17370.0 180000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 3.0
2 100013 Cash loans M Y Y 0 202500.0 663264.0 69777.0 630000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 1.0 4.0
3 100028 Cash loans F N Y 2 315000.0 1575000.0 49018.5 1575000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 3.0
4 100038 Cash loans M Y N 1 180000.0 625500.0 32067.0 625500.0 ... 0 0 0 0 NaN NaN NaN NaN NaN NaN
5 100042 Cash loans F Y Y 0 270000.0 959688.0 34600.5 810000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 1.0 2.0
6 100057 Cash loans M Y Y 2 180000.0 499221.0 22117.5 373500.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 1.0
7 100065 Cash loans M N Y 0 166500.0 180000.0 14220.0 180000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 2.0
8 100066 Cash loans F N Y 0 315000.0 364896.0 28957.5 315000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0 5.0
9 100067 Cash loans F Y Y 1 162000.0 45000.0 5337.0 45000.0 ... 0 0 0 0 0.0 0.0 0.0 0.0 0.0
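Comparing the two previews, the training data carries a TARGET column that the test data lacks. One way to confirm such a difference programmatically is to compare the column lists, sketched here with toy frames that reuse a few of the real column names; the single-row values are only placeholders, and the variable names train and test are chosen for this example.

```python
import pandas as pd

# Toy frames standing in for app_train and app_test (placeholder values).
train = pd.DataFrame({"SK_ID_CURR": [100002], "TARGET": [1], "AMT_CREDIT": [406597.5]})
test = pd.DataFrame({"SK_ID_CURR": [100001], "AMT_CREDIT": [568800.0]})

# Columns present in one frame but not the other, in their original order.
only_in_train = [c for c in train.columns if c not in test.columns]
only_in_test = [c for c in test.columns if c not in train.columns]

print("Only in train:", only_in_train)
print("Only in test:", only_in_test)
```

Running the same comparison on the loaded app_train and app_test frames would show whether anything besides the label differs between them.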
Author And Source
More material on this topic (Introduction_Home_Credit_Default_Risk_Competition_load_data) can be found at https://velog.io/@qsdcfd/IntroductionHomeCreditDefaultRiskCompetitionloaddata
Attribution: the original author's information is included at the original URL, and copyright belongs to the original author.