Introduction_Home_Credit_Default_Risk_Competition_load_data


Home Credit Default

*Goal

Historical loan application data is used to predict whether an applicant will be able to repay a loan.

*Supervised classification task


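Data (Home Credit)

  • application_train/application_test

    • main data: one row per loan application

    • every loan is identified by SK_ID_CURR

    • TARGET (training data only): 0 (loan repaid) or 1 (loan not repaid)

  • bureau

    • the client's previous credits from other financial institutions

    • one client can have multiple previous credits, each in its own row

  • bureau_balance

    • monthly data about the previous credits in bureau

    • one row per month of each previous credit

  • previous_application

    • previous loan applications at Home Credit by clients in the main data

    • each previous application is identified by SK_ID_PREV

  • POS_CASH_BALANCE

    • monthly data about previous point-of-sale or cash loans

  • credit_card_balance

    • monthly data about clients' previous credit cards

    • a single credit card can have many rows

  • installments_payments

    • payment history for previous loans

    • one row for every payment made and one for every missed payment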
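
The tables are linked through their ID columns: secondary tables such as bureau hold many rows per SK_ID_CURR, so they are usually aggregated to one row per client before being joined to the main data. The following is only a minimal sketch of that idea, not part of the original notebook. The two DataFrames are toy stand-ins with made-up values, and the column AMT_CREDIT_SUM and the new feature names are chosen purely for illustration.

```python
import pandas as pd

# Toy stand-ins for application_train and bureau (all values are made up).
app = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "TARGET": [0, 1, 0],
})
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT_SUM": [1000.0, 500.0, 2000.0],  # illustrative column
})

# Collapse the many bureau rows per client into one row per SK_ID_CURR.
bureau_agg = (
    bureau.groupby("SK_ID_CURR")["AMT_CREDIT_SUM"]
    .agg(["count", "sum"])
    .rename(columns={"count": "bureau_credit_count",
                     "sum": "bureau_credit_sum"})
    .reset_index()
)

# A left join keeps every application, including clients with no bureau rows
# (their aggregated columns become NaN).
merged = app.merge(bureau_agg, on="SK_ID_CURR", how="left")
print(merged)
```

Client 3 has no rows in the toy bureau table, so its aggregated columns are NaN after the join. How to fill such gaps is a separate modelling decision.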


Metric

ROC AUC

  • ROC: the true positive rate plotted against the false positive rate

  • AUC: the area under the ROC curve

  • ROC AUC

    • the score is between 0 and 1, and it is computed from predicted probabilities rather than 0/1 labels

    • a higher score represents better model performance
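
As a rough illustration of the metric, the sketch below computes ROC AUC with scikit-learn on four made-up labels and predicted probabilities. The data is hypothetical and has nothing to do with the competition files. It only shows that the score is the area under the curve of true positive rate against false positive rate.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical ground-truth labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Points of the ROC curve: false positive rate vs. true positive rate.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under that curve, computed two equivalent ways.
area = auc(fpr, tpr)
score = roc_auc_score(y_true, y_score)

print(score)  # 0.75
```

Three of the four (positive, negative) pairs are ranked correctly here, which is where 0.75 comes from. A model that guesses at random scores about 0.5.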


Code

# Import libraries
# numpy and pandas for data manipulation
import numpy as np
import pandas as pd

# sklearn preprocessing for dealing with categorical variables
from sklearn.preprocessing import LabelEncoder

# File system management
import os

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# matplotlib and seaborn for plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Mount Google Drive (Colab only)
from google.colab import drive
drive.mount('/content/gdrive/')

# Training data
app_train = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_train.csv')
print('Train data shape:', app_train.shape)
app_train.head(10)
	SK_ID_CURR	TARGET	NAME_CONTRACT_TYPE	CODE_GENDER	FLAG_OWN_CAR	FLAG_OWN_REALTY	CNT_CHILDREN	AMT_INCOME_TOTAL	AMT_CREDIT	AMT_ANNUITY	...	FLAG_DOCUMENT_18	FLAG_DOCUMENT_19	FLAG_DOCUMENT_20	FLAG_DOCUMENT_21	AMT_REQ_CREDIT_BUREAU_HOUR	AMT_REQ_CREDIT_BUREAU_DAY	AMT_REQ_CREDIT_BUREAU_WEEK	AMT_REQ_CREDIT_BUREAU_MON	AMT_REQ_CREDIT_BUREAU_QRT	AMT_REQ_CREDIT_BUREAU_YEAR
0	100002	1	Cash loans	M	N	Y	0	202500.0	406597.5	24700.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
1	100003	0	Cash loans	F	N	N	0	270000.0	1293502.5	35698.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
2	100004	0	Revolving loans	M	Y	Y	0	67500.0	135000.0	6750.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
3	100006	0	Cash loans	F	N	Y	0	135000.0	312682.5	29686.5	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
4	100007	0	Cash loans	M	N	Y	0	121500.0	513000.0	21865.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
5	100008	0	Cash loans	M	N	Y	0	99000.0	490495.5	27517.5	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	1.0
6	100009	0	Cash loans	F	Y	Y	1	171000.0	1560726.0	41301.0	...	0	0	0	0	0.0	0.0	0.0	1.0	1.0	2.0
7	100010	0	Cash loans	M	Y	Y	0	360000.0	1530000.0	42075.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
8	100011	0	Cash loans	F	N	Y	0	112500.0	1019610.0	33826.5	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
9	100012	0	Revolving loans	M	N	Y	0	135000.0	405000.0	20250.0	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
10 rows × 122 columns

# Test data: same features as the training data, but without the TARGET column
app_test = pd.read_csv('./gdrive/MyDrive/home-credit-default-risk/application_test.csv')
print('Test data shape:', app_test.shape)
app_test.head(10)
	SK_ID_CURR	NAME_CONTRACT_TYPE	CODE_GENDER	FLAG_OWN_CAR	FLAG_OWN_REALTY	CNT_CHILDREN	AMT_INCOME_TOTAL	AMT_CREDIT	AMT_ANNUITY	AMT_GOODS_PRICE	...	FLAG_DOCUMENT_18	FLAG_DOCUMENT_19	FLAG_DOCUMENT_20	FLAG_DOCUMENT_21	AMT_REQ_CREDIT_BUREAU_HOUR	AMT_REQ_CREDIT_BUREAU_DAY	AMT_REQ_CREDIT_BUREAU_WEEK	AMT_REQ_CREDIT_BUREAU_MON	AMT_REQ_CREDIT_BUREAU_QRT	AMT_REQ_CREDIT_BUREAU_YEAR
0	100001	Cash loans	F	N	Y	0	135000.0	568800.0	20560.5	450000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
1	100005	Cash loans	M	N	Y	0	99000.0	222768.0	17370.0	180000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	3.0
2	100013	Cash loans	M	Y	Y	0	202500.0	663264.0	69777.0	630000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	4.0
3	100028	Cash loans	F	N	Y	2	315000.0	1575000.0	49018.5	1575000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	3.0
4	100038	Cash loans	M	Y	N	1	180000.0	625500.0	32067.0	625500.0	...	0	0	0	0	NaN	NaN	NaN	NaN	NaN	NaN
5	100042	Cash loans	F	Y	Y	0	270000.0	959688.0	34600.5	810000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	1.0	2.0
6	100057	Cash loans	M	Y	Y	2	180000.0	499221.0	22117.5	373500.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	1.0
7	100065	Cash loans	M	N	Y	0	166500.0	180000.0	14220.0	180000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	2.0
8	100066	Cash loans	F	N	Y	0	315000.0	364896.0	28957.5	315000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	5.0
9	100067	Cash loans	F	Y	Y	1	162000.0	45000.0	5337.0	45000.0	...	0	0	0	0	0.0	0.0	0.0	0.0	0.0	
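
The two previews differ in one respect: the training data carries TARGET and the test data does not. A quick way to confirm this kind of difference is to compare the column sets, as in the minimal sketch below. It uses two tiny hand-built DataFrames instead of the real files. The names `train` and `test` are placeholders for `app_train` and `app_test`, and only a few columns from the previews are reproduced.

```python
import pandas as pd

# Tiny stand-ins for app_train and app_test, copied from the previews above.
train = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003],
    "TARGET": [1, 0],
    "CNT_CHILDREN": [0, 0],
})
test = pd.DataFrame({
    "SK_ID_CURR": [100001, 100005],
    "CNT_CHILDREN": [0, 0],
})

# Columns present in one table but not in the other.
only_in_train = sorted(set(train.columns) - set(test.columns))
only_in_test = sorted(set(test.columns) - set(train.columns))

print(only_in_train)  # ['TARGET']
print(only_in_test)   # []
```

Run against the real DataFrames, the same two lines should report TARGET as the only column missing from the test set, assuming the files match the previews.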
