Proportional Odds Model in R
1. Setting
1.1 데이터 불러오기
train = read.csv("newTrain.csv")
1.2 데이터 Structure 확인 후 변환
str(train)
str(train)
## 'data.frame': 22724 obs. of 22 variables:
## $ X : int 1 2 3 4 5 6 7 9 10 11 ...
## $ child_num : int 1 0 0 0 2 0 0 0 0 1 ...
## $ income_total : num 247500 450000 202500 157500 270000 ...
## $ income_type : chr "Commercial associate" "Working" "Commercial associate" "State servant" ...
## $ family_type : chr "Civil marriage" "Married" "Married" "Married" ...
## $ house_type : chr "House / apartment" "House / apartment" "House / apartment" "House / apartment" ...
## $ DAYS_BIRTH : int -11380 -19087 -15088 -15037 -13413 -17570 -14896 -15785 -19063 -11759 ...
## $ DAYS_EMPLOYED : int -1540 -4434 -2092 -2105 -4996 -1978 -5420 -1308 -2213 -91 ...
## $ occyp_type : chr "Laborers" "Managers" "Sales staff" "Managers" ...
## $ family_size : num 3 2 2 2 4 1 2 2 1 3 ...
## $ begin_month : num -5 -22 -37 -26 -18 -41 -53 -5 -40 -51 ...
## $ credit : num 1 2 0 2 1 2 0 2 2 2 ...
## $ income_quartile: num 4 4 3 3 4 4 4 1 3 2 ...
## $ income_quintile: num 5 5 4 3 5 5 5 1 4 2 ...
## $ income_decile : num 9 10 7 6 9 10 9 2 7 4 ...
## $ age : int 32 53 42 42 37 49 41 44 53 33 ...
## $ age_group : int 1 3 2 2 1 2 2 2 3 1 ...
## $ used_years : int 0 1 3 2 1 3 4 0 3 4 ...
## $ worked_year : int 4 12 5 5 13 5 14 3 6 0 ...
## $ women : num 1 0 1 1 1 1 0 1 1 0 ...
## $ yesCar : num 0 1 0 1 0 0 0 0 0 1 ...
## $ yesReality : num 1 1 1 1 1 0 1 1 1 1 ...
# training set
## X variables
train$income_type = as.factor(train$income_type)
train$family_type = as.factor(train$family_type)
train$house_type = as.factor(train$house_type)
train$occyp_type = as.factor(train$occyp_type)
## y variable
train$credit = as.factor(train$credit)
summary(train)
## X child_num income_total
## Min. : 1 Min. :0.0000 Min. : 27000
## 1st Qu.: 6656 1st Qu.:0.0000 1st Qu.: 121500
## Median :13268 Median :0.0000 Median : 157500
## Mean :13242 Mean :0.4174 Mean : 186461
## 3rd Qu.:19831 3rd Qu.:1.0000 3rd Qu.: 225000
## Max. :26456 Max. :4.0000 Max. :1575000
##
## income_type family_type
## Commercial associate: 5176 Civil marriage : 1824
## Pensioner : 4447 Married :15544
## State servant : 1762 Separated : 1333
## Student : 6 Single / not married: 3010
## Working :11333 Widow : 1013
##
##
## house_type DAYS_BIRTH DAYS_EMPLOYED
## Co-op apartment : 106 Min. :-25152 Min. :-15713
## House / apartment :20370 1st Qu.:-19777 1st Qu.: -3052
## Municipal apartment: 722 Median :-15798 Median : -1435
## Office apartment : 161 Mean :-16142 Mean : 69229
## Rented apartment : 347 3rd Qu.:-12557 3rd Qu.: -307
## With parents : 1018 Max. : -7705 Max. :365243
##
## occyp_type family_size begin_month credit income_quartile
## Laborers :4512 Min. : 1.000 Min. :-60 0: 2791 Min. :1.000
## Unempolyed :4438 1st Qu.: 2.000 1st Qu.:-39 1: 5413 1st Qu.:2.000
## Core staff :2646 Median : 2.000 Median :-24 2:14520 Median :3.000
## Sales staff:2539 Mean : 2.185 Mean :-26 Mean :2.629
## Managers :2167 3rd Qu.: 3.000 3rd Qu.:-12 3rd Qu.:4.000
## Drivers :1575 Max. :20.000 Max. : 0 Max. :4.000
## (Other) :4847
## income_quintile income_decile age age_group
## Min. :1.000 Min. : 1.000 Min. :22.00 Min. :0.000
## 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.:35.00 1st Qu.:1.000
## Median :3.000 Median : 6.000 Median :44.00 Median :2.000
## Mean :3.122 Mean : 5.753 Mean :44.72 Mean :2.015
## 3rd Qu.:4.000 3rd Qu.: 8.000 3rd Qu.:55.00 3rd Qu.:3.000
## Max. :5.000 Max. :10.000 Max. :69.00 Max. :4.000
##
## used_years worked_year women yesCar
## Min. :0.000 Min. :-1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2.000 Median : 3.000 Median :1.0000 Median :0.0000
## Mean :1.719 Mean : 5.165 Mean :0.6658 Mean :0.3781
## 3rd Qu.:3.000 3rd Qu.: 8.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5.000 Max. :43.000 Max. :1.0000 Max. :1.0000
##
## yesReality
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6805
## 3rd Qu.:1.0000
## Max. :1.0000
##
2. Proportional Odds Model
2.1 Library 불러오기
library(MASS)
# Credit order 지정
train$credit = factor(train$credit, ordered=TRUE,
levels=c("2", "1", "0"))
2.2 모델 생성 및 비교
mod1 = polr(credit~yesCar+yesReality+income_type+family_type
+occyp_type+income_quartile+age_group+worked_year
+begin_month,
data=train, Hess=TRUE)
summary(mod1)
## Call:
## polr(formula = credit ~ yesCar + yesReality + income_type + family_type +
## occyp_type + income_quartile + age_group + worked_year +
## begin_month, data = train, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## yesCar -3.728e-02 0.0303715 -1.22741
## yesReality 5.190e-02 0.0298230 1.74029
## income_typePensioner -1.183e+01 0.0438716 -269.63387
## income_typeState servant 5.072e-02 0.0603868 0.83999
## income_typeStudent 4.972e-01 0.7551641 0.65838
## income_typeWorking 8.469e-02 0.0355889 2.37967
## family_typeMarried -4.049e-02 0.0509414 -0.79479
## family_typeSeparated -2.546e-02 0.0747911 -0.34044
## family_typeSingle / not married -2.548e-02 0.0604889 -0.42123
## family_typeWidow 6.955e-02 0.0816933 0.85130
## occyp_typeCleaning staff -3.116e-01 0.1260286 -2.47225
## occyp_typeCooking staff -1.461e-01 0.1179518 -1.23861
## occyp_typeCore staff -1.006e-01 0.0790919 -1.27220
## occyp_typeDrivers -1.806e-01 0.0863227 -2.09223
## occyp_typeHigh skill tech staff -3.776e-02 0.0923882 -0.40876
## occyp_typeHR staff -1.055e+00 0.3442720 -3.06339
## occyp_typeIT staff 2.315e-01 0.3143037 0.73663
## occyp_typeLaborers -1.069e-01 0.0743313 -1.43861
## occyp_typeLow-skill Laborers -1.499e-01 0.2005121 -0.74764
## occyp_typeManagers -1.486e-01 0.0816779 -1.81883
## occyp_typeMedicine staff -2.739e-01 0.1006894 -2.72027
## occyp_typePrivate service staff -9.384e-02 0.1478357 -0.63479
## occyp_typeRealty agents -3.712e-01 0.2851487 -1.30190
## occyp_typeSales staff -1.623e-01 0.0790219 -2.05440
## occyp_typeSecretaries -2.901e-02 0.2110826 -0.13742
## occyp_typeSecurity staff -4.119e-01 0.1259874 -3.26941
## occyp_typeUnempolyed 1.179e+01 0.0438716 268.78506
## occyp_typeWaiters/barmen staff 6.335e-02 0.1817119 0.34861
## income_quartile -6.892e-04 0.0128734 -0.05354
## age_group -4.745e-02 0.0157109 -3.02049
## worked_year -4.582e-03 0.0026605 -1.72207
## begin_month 2.144e-02 0.0008544 25.09679
##
## Intercepts:
## Value Std. Error t value
## 2|1 -0.1572 0.0977 -1.6092
## 1|0 1.2729 0.0984 12.9401
##
## Residual Deviance: 39477.10
## AIC: 39545.10
- 시도해본 모델 중 위의 모델의 AIC가 가장 낮게 나왔다.
2.2.1 Model Variables' p-values
ctable = coef(summary(mod1))
p = pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)*2
round(cbind(ctable, "p value"=p), 4)
## Value Std. Error t value p value
## yesCar -0.0373 0.0304 -1.2274 0.2197
## yesReality 0.0519 0.0298 1.7403 0.0818
## income_typePensioner -11.8293 0.0439 -269.6339 0.0000
## income_typeState servant 0.0507 0.0604 0.8400 0.4009
## income_typeStudent 0.4972 0.7552 0.6584 0.5103
## income_typeWorking 0.0847 0.0356 2.3797 0.0173
## family_typeMarried -0.0405 0.0509 -0.7948 0.4267
## family_typeSeparated -0.0255 0.0748 -0.3404 0.7335
## family_typeSingle / not married -0.0255 0.0605 -0.4212 0.6736
## family_typeWidow 0.0695 0.0817 0.8513 0.3946
## occyp_typeCleaning staff -0.3116 0.1260 -2.4723 0.0134
## occyp_typeCooking staff -0.1461 0.1180 -1.2386 0.2155
## occyp_typeCore staff -0.1006 0.0791 -1.2722 0.2033
## occyp_typeDrivers -0.1806 0.0863 -2.0922 0.0364
## occyp_typeHigh skill tech staff -0.0378 0.0924 -0.4088 0.6827
## occyp_typeHR staff -1.0546 0.3443 -3.0634 0.0022
## occyp_typeIT staff 0.2315 0.3143 0.7366 0.4613
## occyp_typeLaborers -0.1069 0.0743 -1.4386 0.1503
## occyp_typeLow-skill Laborers -0.1499 0.2005 -0.7476 0.4547
## occyp_typeManagers -0.1486 0.0817 -1.8188 0.0689
## occyp_typeMedicine staff -0.2739 0.1007 -2.7203 0.0065
## occyp_typePrivate service staff -0.0938 0.1478 -0.6348 0.5256
## occyp_typeRealty agents -0.3712 0.2851 -1.3019 0.1930
## occyp_typeSales staff -0.1623 0.0790 -2.0544 0.0399
## occyp_typeSecretaries -0.0290 0.2111 -0.1374 0.8907
## occyp_typeSecurity staff -0.4119 0.1260 -3.2694 0.0011
## occyp_typeUnempolyed 11.7920 0.0439 268.7851 0.0000
## occyp_typeWaiters/barmen staff 0.0633 0.1817 0.3486 0.7274
## income_quartile -0.0007 0.0129 -0.0535 0.9573
## age_group -0.0475 0.0157 -3.0205 0.0025
## worked_year -0.0046 0.0027 -1.7221 0.0851
## begin_month 0.0214 0.0009 25.0968 0.0000
## 2|1 -0.1572 0.0977 -1.6092 0.1076
## 1|0 1.2729 0.0984 12.9401 0.0000
- 현재 p-values를 고려 했을 때 'yesCar', 'yesReality', 'family_type', 'income_quartile', 'worked_year'의 경우 drop하는 것이 좋아 보인다.
2.3 Best Model's p-values
mod2 = polr(credit~income_type+occyp_type+age_group+begin_month,
data=train, Hess=TRUE)
summary(mod2)
## Call:
## polr(formula = credit ~ income_type + occyp_type + age_group +
## begin_month, data = train, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## income_typePensioner -11.832886 0.0410348 -288.36236
## income_typeState servant 0.038892 0.0599544 0.64870
## income_typeStudent 0.530268 0.7521999 0.70496
## income_typeWorking 0.079266 0.0352396 2.24933
## occyp_typeCleaning staff -0.281468 0.1254583 -2.24352
## occyp_typeCooking staff -0.131518 0.1174376 -1.11990
## occyp_typeCore staff -0.102439 0.0790278 -1.29624
## occyp_typeDrivers -0.190993 0.0855666 -2.23210
## occyp_typeHigh skill tech staff -0.044318 0.0922492 -0.48041
## occyp_typeHR staff -1.039204 0.3437040 -3.02354
## occyp_typeIT staff 0.228233 0.3140885 0.72665
## occyp_typeLaborers -0.109843 0.0742864 -1.47864
## occyp_typeLow-skill Laborers -0.133683 0.1997971 -0.66909
## occyp_typeManagers -0.157925 0.0811011 -1.94726
## occyp_typeMedicine staff -0.285005 0.1002296 -2.84352
## occyp_typePrivate service staff -0.086903 0.1477218 -0.58829
## occyp_typeRealty agents -0.367264 0.2846345 -1.29030
## occyp_typeSales staff -0.150683 0.0788143 -1.91187
## occyp_typeSecretaries -0.005761 0.2107002 -0.02734
## occyp_typeSecurity staff -0.396938 0.1256130 -3.16001
## occyp_typeUnempolyed 11.857727 0.0410347 288.96814
## occyp_typeWaiters/barmen staff 0.094600 0.1811099 0.52234
## age_group -0.051154 0.0145921 -3.50561
## begin_month 0.021607 0.0008522 25.35481
##
## Intercepts:
## Value Std. Error t value
## 2|1 -0.1275 0.0771 -1.6534
## 1|0 1.3021 0.0780 16.6932
##
## Residual Deviance: 39488.04
## AIC: 39540.04
2.3.1 mod2's p-vlaues
ctable = coef(summary(mod2))
p = pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)*2
round(cbind(ctable, "p value"=p), 4)
## Value Std. Error t value p value
## income_typePensioner -11.8329 0.0410 -288.3624 0.0000
## income_typeState servant 0.0389 0.0600 0.6487 0.5165
## income_typeStudent 0.5303 0.7522 0.7050 0.4808
## income_typeWorking 0.0793 0.0352 2.2493 0.0245
## occyp_typeCleaning staff -0.2815 0.1255 -2.2435 0.0249
## occyp_typeCooking staff -0.1315 0.1174 -1.1199 0.2628
## occyp_typeCore staff -0.1024 0.0790 -1.2962 0.1949
## occyp_typeDrivers -0.1910 0.0856 -2.2321 0.0256
## occyp_typeHigh skill tech staff -0.0443 0.0922 -0.4804 0.6309
## occyp_typeHR staff -1.0392 0.3437 -3.0235 0.0025
## occyp_typeIT staff 0.2282 0.3141 0.7267 0.4674
## occyp_typeLaborers -0.1098 0.0743 -1.4786 0.1392
## occyp_typeLow-skill Laborers -0.1337 0.1998 -0.6691 0.5034
## occyp_typeManagers -0.1579 0.0811 -1.9473 0.0515
## occyp_typeMedicine staff -0.2850 0.1002 -2.8435 0.0045
## occyp_typePrivate service staff -0.0869 0.1477 -0.5883 0.5563
## occyp_typeRealty agents -0.3673 0.2846 -1.2903 0.1969
## occyp_typeSales staff -0.1507 0.0788 -1.9119 0.0559
## occyp_typeSecretaries -0.0058 0.2107 -0.0273 0.9782
## occyp_typeSecurity staff -0.3969 0.1256 -3.1600 0.0016
## occyp_typeUnempolyed 11.8577 0.0410 288.9681 0.0000
## occyp_typeWaiters/barmen staff 0.0946 0.1811 0.5223 0.6014
## age_group -0.0512 0.0146 -3.5056 0.0005
## begin_month 0.0216 0.0009 25.3548 0.0000
## 2|1 -0.1275 0.0771 -1.6534 0.0982
## 1|0 1.3021 0.0780 16.6932 0.0000
- income_type에서 'Pensioner', 'Working' 만이 의미있는 정보로 보인다.
- occyp_type 에서 'Cleaning staff', 'Drivers', 'HR staff', 'Medicine staff', 'Security staff', 'Unempolyed'가 의미있는 변수들로 고려된다.
2.4 Model ANOVA Comparison
anova(mod1, mod2, test = "Chisq")
## Likelihood ratio tests of ordinal regression models
##
## Response: credit
## Model
## 1 income_type + occyp_type + age_group + begin_month
## 2 yesCar + yesReality + income_type + family_type + occyp_type + income_quartile + age_group + worked_year + begin_month
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 22698 39488.04
## 2 22690 39477.10 1 vs 2 8 10.94169 0.2050253
- ANOVA Comparison 또한 mod2 가 mod1보다 성능이 좋다고 보여준다.
Author And Source
이 문제에 관하여(Proportional Odds Model in R), 우리는 이곳에서 더 많은 자료를 발견하고 링크를 클릭하여 보았다
https://velog.io/@dohy426/Proportional-Odds-Model-in-R
저자 귀속: 원작자 정보가 원작자 URL에 포함되어 있으며 저작권은 원작자 소유입니다.
우수한 개발자 콘텐츠 발견에 전념
(Collection and Share based on the CC Protocol.)
train = read.csv("newTrain.csv")
str(train)
str(train)
## 'data.frame': 22724 obs. of 22 variables:
## $ X : int 1 2 3 4 5 6 7 9 10 11 ...
## $ child_num : int 1 0 0 0 2 0 0 0 0 1 ...
## $ income_total : num 247500 450000 202500 157500 270000 ...
## $ income_type : chr "Commercial associate" "Working" "Commercial associate" "State servant" ...
## $ family_type : chr "Civil marriage" "Married" "Married" "Married" ...
## $ house_type : chr "House / apartment" "House / apartment" "House / apartment" "House / apartment" ...
## $ DAYS_BIRTH : int -11380 -19087 -15088 -15037 -13413 -17570 -14896 -15785 -19063 -11759 ...
## $ DAYS_EMPLOYED : int -1540 -4434 -2092 -2105 -4996 -1978 -5420 -1308 -2213 -91 ...
## $ occyp_type : chr "Laborers" "Managers" "Sales staff" "Managers" ...
## $ family_size : num 3 2 2 2 4 1 2 2 1 3 ...
## $ begin_month : num -5 -22 -37 -26 -18 -41 -53 -5 -40 -51 ...
## $ credit : num 1 2 0 2 1 2 0 2 2 2 ...
## $ income_quartile: num 4 4 3 3 4 4 4 1 3 2 ...
## $ income_quintile: num 5 5 4 3 5 5 5 1 4 2 ...
## $ income_decile : num 9 10 7 6 9 10 9 2 7 4 ...
## $ age : int 32 53 42 42 37 49 41 44 53 33 ...
## $ age_group : int 1 3 2 2 1 2 2 2 3 1 ...
## $ used_years : int 0 1 3 2 1 3 4 0 3 4 ...
## $ worked_year : int 4 12 5 5 13 5 14 3 6 0 ...
## $ women : num 1 0 1 1 1 1 0 1 1 0 ...
## $ yesCar : num 0 1 0 1 0 0 0 0 0 1 ...
## $ yesReality : num 1 1 1 1 1 0 1 1 1 1 ...
# training set
## X variables
train$income_type = as.factor(train$income_type)
train$family_type = as.factor(train$family_type)
train$house_type = as.factor(train$house_type)
train$occyp_type = as.factor(train$occyp_type)
## y variable
train$credit = as.factor(train$credit)
summary(train)
## X child_num income_total
## Min. : 1 Min. :0.0000 Min. : 27000
## 1st Qu.: 6656 1st Qu.:0.0000 1st Qu.: 121500
## Median :13268 Median :0.0000 Median : 157500
## Mean :13242 Mean :0.4174 Mean : 186461
## 3rd Qu.:19831 3rd Qu.:1.0000 3rd Qu.: 225000
## Max. :26456 Max. :4.0000 Max. :1575000
##
## income_type family_type
## Commercial associate: 5176 Civil marriage : 1824
## Pensioner : 4447 Married :15544
## State servant : 1762 Separated : 1333
## Student : 6 Single / not married: 3010
## Working :11333 Widow : 1013
##
##
## house_type DAYS_BIRTH DAYS_EMPLOYED
## Co-op apartment : 106 Min. :-25152 Min. :-15713
## House / apartment :20370 1st Qu.:-19777 1st Qu.: -3052
## Municipal apartment: 722 Median :-15798 Median : -1435
## Office apartment : 161 Mean :-16142 Mean : 69229
## Rented apartment : 347 3rd Qu.:-12557 3rd Qu.: -307
## With parents : 1018 Max. : -7705 Max. :365243
##
## occyp_type family_size begin_month credit income_quartile
## Laborers :4512 Min. : 1.000 Min. :-60 0: 2791 Min. :1.000
## Unempolyed :4438 1st Qu.: 2.000 1st Qu.:-39 1: 5413 1st Qu.:2.000
## Core staff :2646 Median : 2.000 Median :-24 2:14520 Median :3.000
## Sales staff:2539 Mean : 2.185 Mean :-26 Mean :2.629
## Managers :2167 3rd Qu.: 3.000 3rd Qu.:-12 3rd Qu.:4.000
## Drivers :1575 Max. :20.000 Max. : 0 Max. :4.000
## (Other) :4847
## income_quintile income_decile age age_group
## Min. :1.000 Min. : 1.000 Min. :22.00 Min. :0.000
## 1st Qu.:2.000 1st Qu.: 3.000 1st Qu.:35.00 1st Qu.:1.000
## Median :3.000 Median : 6.000 Median :44.00 Median :2.000
## Mean :3.122 Mean : 5.753 Mean :44.72 Mean :2.015
## 3rd Qu.:4.000 3rd Qu.: 8.000 3rd Qu.:55.00 3rd Qu.:3.000
## Max. :5.000 Max. :10.000 Max. :69.00 Max. :4.000
##
## used_years worked_year women yesCar
## Min. :0.000 Min. :-1.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :2.000 Median : 3.000 Median :1.0000 Median :0.0000
## Mean :1.719 Mean : 5.165 Mean :0.6658 Mean :0.3781
## 3rd Qu.:3.000 3rd Qu.: 8.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5.000 Max. :43.000 Max. :1.0000 Max. :1.0000
##
## yesReality
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6805
## 3rd Qu.:1.0000
## Max. :1.0000
##
2.1 Library 불러오기
library(MASS)
# Credit order 지정
train$credit = factor(train$credit, ordered=TRUE,
levels=c("2", "1", "0"))
2.2 모델 생성 및 비교
mod1 = polr(credit~yesCar+yesReality+income_type+family_type
+occyp_type+income_quartile+age_group+worked_year
+begin_month,
data=train, Hess=TRUE)
summary(mod1)
## Call:
## polr(formula = credit ~ yesCar + yesReality + income_type + family_type +
## occyp_type + income_quartile + age_group + worked_year +
## begin_month, data = train, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## yesCar -3.728e-02 0.0303715 -1.22741
## yesReality 5.190e-02 0.0298230 1.74029
## income_typePensioner -1.183e+01 0.0438716 -269.63387
## income_typeState servant 5.072e-02 0.0603868 0.83999
## income_typeStudent 4.972e-01 0.7551641 0.65838
## income_typeWorking 8.469e-02 0.0355889 2.37967
## family_typeMarried -4.049e-02 0.0509414 -0.79479
## family_typeSeparated -2.546e-02 0.0747911 -0.34044
## family_typeSingle / not married -2.548e-02 0.0604889 -0.42123
## family_typeWidow 6.955e-02 0.0816933 0.85130
## occyp_typeCleaning staff -3.116e-01 0.1260286 -2.47225
## occyp_typeCooking staff -1.461e-01 0.1179518 -1.23861
## occyp_typeCore staff -1.006e-01 0.0790919 -1.27220
## occyp_typeDrivers -1.806e-01 0.0863227 -2.09223
## occyp_typeHigh skill tech staff -3.776e-02 0.0923882 -0.40876
## occyp_typeHR staff -1.055e+00 0.3442720 -3.06339
## occyp_typeIT staff 2.315e-01 0.3143037 0.73663
## occyp_typeLaborers -1.069e-01 0.0743313 -1.43861
## occyp_typeLow-skill Laborers -1.499e-01 0.2005121 -0.74764
## occyp_typeManagers -1.486e-01 0.0816779 -1.81883
## occyp_typeMedicine staff -2.739e-01 0.1006894 -2.72027
## occyp_typePrivate service staff -9.384e-02 0.1478357 -0.63479
## occyp_typeRealty agents -3.712e-01 0.2851487 -1.30190
## occyp_typeSales staff -1.623e-01 0.0790219 -2.05440
## occyp_typeSecretaries -2.901e-02 0.2110826 -0.13742
## occyp_typeSecurity staff -4.119e-01 0.1259874 -3.26941
## occyp_typeUnempolyed 1.179e+01 0.0438716 268.78506
## occyp_typeWaiters/barmen staff 6.335e-02 0.1817119 0.34861
## income_quartile -6.892e-04 0.0128734 -0.05354
## age_group -4.745e-02 0.0157109 -3.02049
## worked_year -4.582e-03 0.0026605 -1.72207
## begin_month 2.144e-02 0.0008544 25.09679
##
## Intercepts:
## Value Std. Error t value
## 2|1 -0.1572 0.0977 -1.6092
## 1|0 1.2729 0.0984 12.9401
##
## Residual Deviance: 39477.10
## AIC: 39545.10
- 시도해본 모델 중 위의 모델의 AIC가 가장 낮게 나왔다.
2.2.1 Model Variables' p-values
ctable = coef(summary(mod1))
p = pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)*2
round(cbind(ctable, "p value"=p), 4)
## Value Std. Error t value p value
## yesCar -0.0373 0.0304 -1.2274 0.2197
## yesReality 0.0519 0.0298 1.7403 0.0818
## income_typePensioner -11.8293 0.0439 -269.6339 0.0000
## income_typeState servant 0.0507 0.0604 0.8400 0.4009
## income_typeStudent 0.4972 0.7552 0.6584 0.5103
## income_typeWorking 0.0847 0.0356 2.3797 0.0173
## family_typeMarried -0.0405 0.0509 -0.7948 0.4267
## family_typeSeparated -0.0255 0.0748 -0.3404 0.7335
## family_typeSingle / not married -0.0255 0.0605 -0.4212 0.6736
## family_typeWidow 0.0695 0.0817 0.8513 0.3946
## occyp_typeCleaning staff -0.3116 0.1260 -2.4723 0.0134
## occyp_typeCooking staff -0.1461 0.1180 -1.2386 0.2155
## occyp_typeCore staff -0.1006 0.0791 -1.2722 0.2033
## occyp_typeDrivers -0.1806 0.0863 -2.0922 0.0364
## occyp_typeHigh skill tech staff -0.0378 0.0924 -0.4088 0.6827
## occyp_typeHR staff -1.0546 0.3443 -3.0634 0.0022
## occyp_typeIT staff 0.2315 0.3143 0.7366 0.4613
## occyp_typeLaborers -0.1069 0.0743 -1.4386 0.1503
## occyp_typeLow-skill Laborers -0.1499 0.2005 -0.7476 0.4547
## occyp_typeManagers -0.1486 0.0817 -1.8188 0.0689
## occyp_typeMedicine staff -0.2739 0.1007 -2.7203 0.0065
## occyp_typePrivate service staff -0.0938 0.1478 -0.6348 0.5256
## occyp_typeRealty agents -0.3712 0.2851 -1.3019 0.1930
## occyp_typeSales staff -0.1623 0.0790 -2.0544 0.0399
## occyp_typeSecretaries -0.0290 0.2111 -0.1374 0.8907
## occyp_typeSecurity staff -0.4119 0.1260 -3.2694 0.0011
## occyp_typeUnempolyed 11.7920 0.0439 268.7851 0.0000
## occyp_typeWaiters/barmen staff 0.0633 0.1817 0.3486 0.7274
## income_quartile -0.0007 0.0129 -0.0535 0.9573
## age_group -0.0475 0.0157 -3.0205 0.0025
## worked_year -0.0046 0.0027 -1.7221 0.0851
## begin_month 0.0214 0.0009 25.0968 0.0000
## 2|1 -0.1572 0.0977 -1.6092 0.1076
## 1|0 1.2729 0.0984 12.9401 0.0000
- 현재 p-values를 고려 했을 때 'yesCar', 'yesReality', 'family_type', 'income_quartile', 'worked_year'의 경우 drop하는 것이 좋아 보인다.
2.3 Best Model's p-values
mod2 = polr(credit~income_type+occyp_type+age_group+begin_month,
data=train, Hess=TRUE)
summary(mod2)
## Call:
## polr(formula = credit ~ income_type + occyp_type + age_group +
## begin_month, data = train, Hess = TRUE)
##
## Coefficients:
## Value Std. Error t value
## income_typePensioner -11.832886 0.0410348 -288.36236
## income_typeState servant 0.038892 0.0599544 0.64870
## income_typeStudent 0.530268 0.7521999 0.70496
## income_typeWorking 0.079266 0.0352396 2.24933
## occyp_typeCleaning staff -0.281468 0.1254583 -2.24352
## occyp_typeCooking staff -0.131518 0.1174376 -1.11990
## occyp_typeCore staff -0.102439 0.0790278 -1.29624
## occyp_typeDrivers -0.190993 0.0855666 -2.23210
## occyp_typeHigh skill tech staff -0.044318 0.0922492 -0.48041
## occyp_typeHR staff -1.039204 0.3437040 -3.02354
## occyp_typeIT staff 0.228233 0.3140885 0.72665
## occyp_typeLaborers -0.109843 0.0742864 -1.47864
## occyp_typeLow-skill Laborers -0.133683 0.1997971 -0.66909
## occyp_typeManagers -0.157925 0.0811011 -1.94726
## occyp_typeMedicine staff -0.285005 0.1002296 -2.84352
## occyp_typePrivate service staff -0.086903 0.1477218 -0.58829
## occyp_typeRealty agents -0.367264 0.2846345 -1.29030
## occyp_typeSales staff -0.150683 0.0788143 -1.91187
## occyp_typeSecretaries -0.005761 0.2107002 -0.02734
## occyp_typeSecurity staff -0.396938 0.1256130 -3.16001
## occyp_typeUnempolyed 11.857727 0.0410347 288.96814
## occyp_typeWaiters/barmen staff 0.094600 0.1811099 0.52234
## age_group -0.051154 0.0145921 -3.50561
## begin_month 0.021607 0.0008522 25.35481
##
## Intercepts:
## Value Std. Error t value
## 2|1 -0.1275 0.0771 -1.6534
## 1|0 1.3021 0.0780 16.6932
##
## Residual Deviance: 39488.04
## AIC: 39540.04
2.3.1 mod2's p-vlaues
ctable = coef(summary(mod2))
p = pnorm(abs(ctable[, "t value"]), lower.tail = FALSE)*2
round(cbind(ctable, "p value"=p), 4)
## Value Std. Error t value p value
## income_typePensioner -11.8329 0.0410 -288.3624 0.0000
## income_typeState servant 0.0389 0.0600 0.6487 0.5165
## income_typeStudent 0.5303 0.7522 0.7050 0.4808
## income_typeWorking 0.0793 0.0352 2.2493 0.0245
## occyp_typeCleaning staff -0.2815 0.1255 -2.2435 0.0249
## occyp_typeCooking staff -0.1315 0.1174 -1.1199 0.2628
## occyp_typeCore staff -0.1024 0.0790 -1.2962 0.1949
## occyp_typeDrivers -0.1910 0.0856 -2.2321 0.0256
## occyp_typeHigh skill tech staff -0.0443 0.0922 -0.4804 0.6309
## occyp_typeHR staff -1.0392 0.3437 -3.0235 0.0025
## occyp_typeIT staff 0.2282 0.3141 0.7267 0.4674
## occyp_typeLaborers -0.1098 0.0743 -1.4786 0.1392
## occyp_typeLow-skill Laborers -0.1337 0.1998 -0.6691 0.5034
## occyp_typeManagers -0.1579 0.0811 -1.9473 0.0515
## occyp_typeMedicine staff -0.2850 0.1002 -2.8435 0.0045
## occyp_typePrivate service staff -0.0869 0.1477 -0.5883 0.5563
## occyp_typeRealty agents -0.3673 0.2846 -1.2903 0.1969
## occyp_typeSales staff -0.1507 0.0788 -1.9119 0.0559
## occyp_typeSecretaries -0.0058 0.2107 -0.0273 0.9782
## occyp_typeSecurity staff -0.3969 0.1256 -3.1600 0.0016
## occyp_typeUnempolyed 11.8577 0.0410 288.9681 0.0000
## occyp_typeWaiters/barmen staff 0.0946 0.1811 0.5223 0.6014
## age_group -0.0512 0.0146 -3.5056 0.0005
## begin_month 0.0216 0.0009 25.3548 0.0000
## 2|1 -0.1275 0.0771 -1.6534 0.0982
## 1|0 1.3021 0.0780 16.6932 0.0000
- income_type에서 'Pensioner', 'Working' 만이 의미있는 정보로 보인다.
- occyp_type 에서 'Cleaning staff', 'Drivers', 'HR staff', 'Medicine staff', 'Security staff', 'Unempolyed'가 의미있는 변수들로 고려된다.
2.4 Model ANOVA Comparison
anova(mod1, mod2, test = "Chisq")
## Likelihood ratio tests of ordinal regression models
##
## Response: credit
## Model
## 1 income_type + occyp_type + age_group + begin_month
## 2 yesCar + yesReality + income_type + family_type + occyp_type + income_quartile + age_group + worked_year + begin_month
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 22698 39488.04
## 2 22690 39477.10 1 vs 2 8 10.94169 0.2050253
- ANOVA Comparison 또한 mod2 가 mod1보다 성능이 좋다고 보여준다.
Author And Source
이 문제에 관하여(Proportional Odds Model in R), 우리는 이곳에서 더 많은 자료를 발견하고 링크를 클릭하여 보았다 https://velog.io/@dohy426/Proportional-Odds-Model-in-R저자 귀속: 원작자 정보가 원작자 URL에 포함되어 있으며 저작권은 원작자 소유입니다.
우수한 개발자 콘텐츠 발견에 전념 (Collection and Share based on the CC Protocol.)