R Code Notes


1. Loading a CSV file

df = read.csv(file="/Users/shlee/Dropbox/R/acs.csv", header=TRUE, stringsAsFactors=TRUE)
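The str() output later in these notes lists columns such as sex and Dx as Factor. Since R 4.0.0, read.csv() reads text columns as character unless stringsAsFactors = TRUE is given. A minimal sketch of the difference, using a small made-up CSV written to a temporary file (the file and its contents are illustrative, not the acs.csv data):

```r
# Write a tiny illustrative CSV to a temporary file (hypothetical data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,sex", "62,Male", "78,Female"), tmp)

# Text columns kept as character (the default since R 4.0.0)
as_chr <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)

# Text columns converted to factors
as_fct <- read.csv(tmp, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$sex)
class(as_fct$sex)
levels(as_fct$sex)
```

Reading factors directly is convenient for summary(), which then reports counts per category instead of only the column length and class.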

2. Inspecting the data

head(): preview the first rows
str(): show the structure of the data
summary(): summarize each variable

head(df)

age	sex	cardiogenicShock	entry	Dx	EF	height	weight	BMI	obesity	TC	LDLC	HDLC	TG	DM	HBP	smoking
62	Male	No	Femoral	STEMI	18.0	168	72	25.51020	Yes	215	154	35	155	Yes	No	Smoker
78	Female	No	Femoral	STEMI	18.4	148	48	21.91381	No	NA	NA	NA	166	No	Yes	Never
76	Female	Yes	Femoral	STEMI	20.0	NA	NA	NA	No	NA	NA	NA	NA	No	Yes	Never
89	Female	No	Femoral	STEMI	21.8	165	50	18.36547	No	121	73	20	89	No	No	Never
56	Male	No	Radial	NSTEMI	21.8	162	64	24.38653	No	195	151	36	63	Yes	Yes	Smoker
73	Female	No	Radial	Unstable Angina	22.0	153	59	25.20398	Yes	184	112	38	137	Yes	Yes	Never

str(df)

'data.frame':	857 obs. of  17 variables:
 $ age             : int  62 78 76 89 56 73 58 62 59 71 ...
 $ sex             : Factor w/ 2 levels "Female","Male": 2 1 1 1 2 1 2 2 1 2 ...
 $ cardiogenicShock: Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
 $ entry           : Factor w/ 2 levels "Femoral","Radial": 1 1 1 1 2 2 2 1 2 1 ...
 $ Dx              : Factor w/ 3 levels "NSTEMI","STEMI",..: 2 2 2 2 1 3 3 2 3 2 ...
 $ EF              : num  18 18.4 20 21.8 21.8 22 24.7 26.6 28.5 31.1 ...
 $ height          : num  168 148 NA 165 162 153 167 160 152 168 ...
 $ weight          : num  72 48 NA 50 64 59 78 50 67 60 ...
 $ BMI             : num  25.5 21.9 NA 18.4 24.4 ...
 $ obesity         : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 2 2 1 2 1 ...
 $ TC              : num  215 NA NA 121 195 184 161 136 239 169 ...
 $ LDLC            : int  154 NA NA 73 151 112 91 88 161 88 ...
 $ HDLC            : int  35 NA NA 20 36 38 34 33 34 54 ...
 $ TG              : int  155 166 NA 89 63 137 196 30 118 141 ...
 $ DM              : Factor w/ 2 levels "No","Yes": 2 1 1 1 2 2 2 2 2 2 ...
 $ HBP             : Factor w/ 2 levels "No","Yes": 1 2 2 1 2 2 2 2 2 1 ...
 $ smoking         : Factor w/ 3 levels "Ex-smoker","Never",..: 3 2 2 2 3 2 1 1 2 3 ...

summary(df)

      age            sex      cardiogenicShock     entry    
 Min.   :28.00   Female:287   No :805          Femoral:312  
 1st Qu.:55.00   Male  :570   Yes: 52          Radial :545  
 Median :64.00                                              
 Mean   :63.31                                              
 3rd Qu.:72.00                                              
 Max.   :91.00                                              
                                                            
               Dx            EF            height          weight      
 NSTEMI         :153   Min.   :18.00   Min.   :130.0   Min.   : 30.00  
 STEMI          :304   1st Qu.:50.45   1st Qu.:158.0   1st Qu.: 58.00  
 Unstable Angina:400   Median :58.10   Median :165.0   Median : 65.00  
                       Mean   :55.83   Mean   :163.2   Mean   : 64.84  
                       3rd Qu.:62.35   3rd Qu.:170.0   3rd Qu.: 72.00  
                       Max.   :79.00   Max.   :185.0   Max.   :112.00  
                       NA's   :134     NA's   :93      NA's   :91      
      BMI        obesity         TC             LDLC            HDLC      
 Min.   :15.62   No :567   Min.   : 25.0   Min.   : 15.0   Min.   : 4.00  
 1st Qu.:22.13   Yes:290   1st Qu.:154.0   1st Qu.: 88.0   1st Qu.:32.00  
 Median :24.16             Median :183.0   Median :114.0   Median :38.00  
 Mean   :24.28             Mean   :185.2   Mean   :116.6   Mean   :38.24  
 3rd Qu.:26.17             3rd Qu.:213.0   3rd Qu.:141.0   3rd Qu.:45.00  
 Max.   :41.42             Max.   :493.0   Max.   :366.0   Max.   :89.00  
 NA's   :93                NA's   :23      NA's   :24      NA's   :23     
       TG          DM       HBP           smoking   
 Min.   : 11.0   No :553   No :356   Ex-smoker:204  
 1st Qu.: 68.0   Yes:304   Yes:501   Never    :332  
 Median :105.5                       Smoker   :321  
 Mean   :125.2                                      
 3rd Qu.:154.0                                      
 Max.   :877.0                                      
 NA's   :15

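3. Cleaning the data

1. Removing missing values

na.omit(): drops rows that contain missing values

str(df) shows that the data contain 857 patients.
After assigning the result of na.omit(df) back to df and running str(df) again, patients with any missing value have been dropped and 677 patients remain.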

df = na.omit(df)
str(df)

'data.frame':	677 obs. of  17 variables:
 $ age             : int  62 89 56 73 58 62 59 71 52 52 ...
 $ sex             : Factor w/ 2 levels "Female","Male": 2 1 2 1 2 2 1 2 2 1 ...
 $ cardiogenicShock: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ entry           : Factor w/ 2 levels "Femoral","Radial": 1 1 2 2 2 1 2 1 2 2 ...
 $ Dx              : Factor w/ 3 levels "NSTEMI","STEMI",..: 2 2 1 3 3 2 3 2 3 3 ...
 $ EF              : num  18 21.8 21.8 22 24.7 26.6 28.5 31.1 31.1 31.1 ...
 $ height          : num  168 165 162 153 167 160 152 168 175 156 ...
 $ weight          : num  72 50 64 59 78 50 67 60 60 63 ...
 $ BMI             : num  25.5 18.4 24.4 25.2 28 ...
 $ obesity         : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 1 2 1 1 2 ...
 $ TC              : num  215 121 195 184 161 136 239 169 272 184 ...
 $ LDLC            : int  154 73 151 112 91 88 161 88 212 123 ...
 $ HDLC            : int  35 20 36 38 34 33 34 54 32 43 ...
 $ TG              : int  155 89 63 137 196 30 118 141 52 72 ...
 $ DM              : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
 $ HBP             : Factor w/ 2 levels "No","Yes": 1 1 2 2 2 2 2 1 1 2 ...
 $ smoking         : Factor w/ 3 levels "Ex-smoker","Never",..: 3 2 3 2 1 1 2 3 1 2 ...
 - attr(*, "na.action")= 'omit' Named int  2 3 16 18 29 72 87 89 102 108 ...
  ..- attr(*, "names")= chr  "2" "3" "16" "18" ...
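Because na.omit() drops a whole row whenever any column is missing, it can help to check first how the missing values are distributed. A small sketch on a made-up data frame (the toy object and its values are hypothetical, not the acs data):

```r
# Toy data frame with missing values (hypothetical)
toy <- data.frame(
  age    = c(62, 78, 76, 89),
  height = c(168, 148, NA, 165),
  TC     = c(215, NA, NA, 121)
)

# Number of missing values per column
colSums(is.na(toy))

# Number of rows with no missing value at all
sum(complete.cases(toy))

# na.omit() keeps exactly those complete rows
cleaned <- na.omit(toy)
nrow(cleaned)
```

Running colSums(is.na(df)) on the real data before na.omit() would show which variables account for most of the dropped patients.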

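2. Converting to categorical variables

When data are entered in Excel, it is common to type numbers into the cells, for example 0 for survival and 1 for death,
or 0 for male and 1 for female.

A CSV file written this way is read in as integers (int), so those columns must be converted to categorical variables.
Otherwise summaries treat the codes as quantities and report meaningless values, such as a mean sex of 0.5.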
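A short sketch of that problem, assuming a hypothetical coding of 0 = male and 1 = female (the vector below is made up for illustration):

```r
# Hypothetical coding: 0 = male, 1 = female
sex_code <- c(0L, 1L, 1L, 0L)

# Treated as a number, the codes are averaged, which is meaningless here
mean(sex_code)

# Converted to a factor, the same codes are counted per category
sex <- factor(sex_code, levels = c(0, 1), labels = c("Male", "Female"))
table(sex)
summary(sex)
```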

library(pROC)
data(aSAH)
aSAH$gos6 <- as.integer(aSAH$gos6)

Type 'citation("pROC")' for a citation.

Attaching package: ‘pROC’

The following objects are masked from ‘package:stats’:

    cov, smooth, var

head(aSAH)
str(aSAH)
summary(aSAH)

	gos6	outcome	gender	age	wfns	s100b	ndka
29	5	Good	Female	42	1	0.13	3.01
30	5	Good	Female	37	1	0.14	8.54
31	5	Good	Female	42	1	0.10	8.09
32	5	Good	Female	27	1	0.04	10.42
33	1	Poor	Female	42	3	0.13	17.40
34	1	Poor	Male	48	2	0.10	12.75

'data.frame':	113 obs. of  7 variables:
 $ gos6   : int  5 5 5 5 1 1 4 1 5 4 ...
 $ outcome: Factor w/ 2 levels "Good","Poor": 1 1 1 1 2 2 1 2 1 1 ...
 $ gender : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 1 1 1 2 2 ...
 $ age    : int  42 37 42 27 42 48 57 41 49 75 ...
 $ wfns   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 3 2 5 4 1 2 ...
 $ s100b  : num  0.13 0.14 0.1 0.04 0.13 0.1 0.47 0.16 0.18 0.1 ...
 $ ndka   : num  3.01 8.54 8.09 10.42 17.4 ...



      gos6       outcome      gender        age       wfns       s100b      
 Min.   :1.000   Good:72   Male  :42   Min.   :18.0   1:39   Min.   :0.030  
 1st Qu.:3.000   Poor:41   Female:71   1st Qu.:42.0   2:32   1st Qu.:0.090  
 Median :5.000                         Median :51.0   3: 4   Median :0.140  
 Mean   :3.726                         Mean   :51.1   4:16   Mean   :0.247  
 3rd Qu.:5.000                         3rd Qu.:61.0   5:22   3rd Qu.:0.330  
 Max.   :5.000                         Max.   :81.0          Max.   :2.070  
      ndka       
 Min.   :  3.01  
 1st Qu.:  9.01  
 Median : 12.22  
 Mean   : 19.66  
 3rd Qu.: 17.30  
 Max.   :419.19

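The aSAH data set comes from the pROC package.
gos6 is stored there as a categorical variable; it was deliberately converted to an integer above for demonstration.

str(aSAH) shows gos6 as int (integer).
summary(aSAH) reports the mean, minimum, maximum, and so on for gos6.

Now convert gos6 back to a categorical variable.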

aSAH$gos6 <- factor(aSAH$gos6, levels=c(1:5), labels=c("Death", "Vegetative", "Severe", "Moderate", "Good"))

levels=c(1:5) sets the order of the gos6 values to 1, 2, 3, 4, 5,
and labels=c("Death", "Vegetative", "Severe", "Moderate", "Good") assigns a label to each level in that order. On the Glasgow Outcome Scale 1 is death and 5 is good recovery, which agrees with the outcome column (rows with gos6 = 5 have outcome Good).

The order can also be reversed with levels=c(5:1).
In that case the labels must be reversed as well, labels=c("Good", "Moderate", "Severe", "Vegetative", "Death"),
so that each value still gets the right name.

head(aSAH)
str(aSAH)
summary(aSAH)

	gos6	outcome	gender	age	wfns	s100b	ndka
29	Good	Good	Female	42	1	0.13	3.01
30	Good	Good	Female	37	1	0.14	8.54
31	Good	Good	Female	42	1	0.10	8.09
32	Good	Good	Female	27	1	0.04	10.42
33	Death	Poor	Female	42	3	0.13	17.40
34	Death	Poor	Male	48	2	0.10	12.75

'data.frame':	113 obs. of  7 variables:
 $ gos6   : Factor w/ 5 levels "Death","Vegetative",..: 5 5 5 5 1 1 4 1 5 4 ...
 $ outcome: Factor w/ 2 levels "Good","Poor": 1 1 1 1 2 2 1 2 1 1 ...
 $ gender : Factor w/ 2 levels "Male","Female": 2 2 2 2 2 1 1 1 2 2 ...
 $ age    : int  42 37 42 27 42 48 57 41 49 75 ...
 $ wfns   : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 3 2 5 4 1 2 ...
 $ s100b  : num  0.13 0.14 0.1 0.04 0.13 0.1 0.47 0.16 0.18 0.1 ...
 $ ndka   : num  3.01 8.54 8.09 10.42 17.4 ...



         gos6    outcome      gender        age       wfns       s100b      
 Death     :28   Good:72   Male  :42   Min.   :18.0   1:39   Min.   :0.030  
 Vegetative: 0   Poor:41   Female:71   1st Qu.:42.0   2:32   1st Qu.:0.090  
 Severe    :13                         Median :51.0   3: 4   Median :0.140  
 Moderate  : 6                         Mean   :51.1   4:16   Mean   :0.247  
 Good      :66                         3rd Qu.:61.0   5:22   3rd Qu.:0.330  
                                       Max.   :81.0          Max.   :2.070  
      ndka       
 Min.   :  3.01  
 1st Qu.:  9.01  
 Median : 12.22  
 Mean   : 19.66  
 3rd Qu.: 17.30  
 Max.   :419.19

head(aSAH): 5 has been converted to Good and 1 to Death.
str(aSAH): gos6 is now a Factor whose levels are ordered Death, Vegetative, ...
summary(aSAH): shows the number of patients in each category.
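The pairing between levels and labels in factor() is positional: the first label goes to the first level, and so on. A small sketch with made-up codes and placeholder labels (both are hypothetical) shows that reversing levels and labels together leaves each value's label unchanged and only flips the level order:

```r
codes <- c(5, 1, 4, 3, 1)               # hypothetical integer codes
labs  <- c("A", "B", "C", "D", "E")     # placeholder labels for codes 1..5

f_up   <- factor(codes, levels = 1:5, labels = labs)
f_down <- factor(codes, levels = 5:1, labels = rev(labs))

# Same label for every observation
as.character(f_up)
as.character(f_down)

# Only the order of the levels differs
levels(f_up)
levels(f_down)
```

Reversing only one of the two arguments would silently attach the wrong name to every value, so it is worth checking the result against a column whose meaning is known.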

4. Comparing the means of two groups

0. Preliminary tests

1. Normality test

output = lm(age ~ cardiogenicShock, data=df)
shapiro.test(resid(output))

	Shapiro-Wilk normality test

data:  resid(output)
W = 0.99083, p-value = 0.0003219

2. Equal-variance test

var.test(age ~ cardiogenicShock, data=df)

	F test to compare two variances

data:  age by cardiogenicShock
F = 1.6109, num df = 645, denom df = 30, p-value = 0.11
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.8939231 2.5597701
sample estimates:
ratio of variances 
          1.610921

1. t-test

t.test(age ~ cardiogenicShock, data=df, var.equal=T)

	Two Sample t-test

data:  age by cardiogenicShock
t = 0.22149, df = 675, p-value = 0.8248
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.799331  4.765475
sample estimates:
 mean in group No mean in group Yes 
         63.09598          62.61290

2. Welch's test

t.test(age ~ cardiogenicShock, data=df, var.equal=F)

	Welch Two Sample t-test

data:  age by cardiogenicShock
t = 0.27492, df = 34.808, p-value = 0.785
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.084807  4.050951
sample estimates:
 mean in group No mean in group Yes 
         63.09598          62.61290

3. Wilcoxon rank-sum test

wilcox.test(age ~ cardiogenicShock, data=df)

	Wilcoxon rank sum test with continuity correction

data:  age by cardiogenicShock
W = 10361, p-value = 0.7438
alternative hypothesis: true location shift is not equal to 0
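The notes run all three tests without stating how to choose among them. One common reading of the sequence is sketched below; the helper name compare_two_groups, the 0.05 cut-off, and the decision order are assumptions of this sketch rather than something the notes specify, and the data are simulated.

```r
# Hypothetical helper: pick a two-group test from the preliminary checks.
# The alpha cut-off and the decision order are assumptions of this sketch.
compare_two_groups <- function(y, g, alpha = 0.05) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)

  # Normality of the residuals, as in the preliminary test above
  normal_p <- shapiro.test(resid(lm(y ~ g)))$p.value
  if (normal_p < alpha) {
    return(wilcox.test(y ~ g))          # residuals not normal
  }

  # Equal variances decide between Student's and Welch's t-test
  equal_var_p <- var.test(y ~ g)$p.value
  t.test(y ~ g, var.equal = equal_var_p >= alpha)
}

# Simulated data, for illustration only
set.seed(1)
y <- c(rnorm(30, mean = 60, sd = 10), rnorm(30, mean = 65, sd = 10))
g <- rep(c("No", "Yes"), each = 30)

res <- compare_two_groups(y, g)
res$method
res$p.value
```

Under this reading, the age data above (Shapiro-Wilk p-value = 0.0003219) would go to the Wilcoxon rank-sum test.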

4. Boxplot

library(ggplot2)

# Set the plot size for notebook (IRkernel) output
fig <- function(width, height){
     options(repr.plot.width = width, repr.plot.height = height)
}

fig(4, 4)

ggplot(df) +
  aes(x = cardiogenicShock, y = age) +
  geom_boxplot(outlier.shape = "circle", fill = "#4682B4") +
  labs(x = "Cardiogenic Shock", y = "Age") +
  theme_minimal()

5. Comparing the means of three or more groups

0. Preliminary tests

1. Normality test

out = aov(LDLC ~ Dx, data=df)
shapiro.test(resid(out))

	Shapiro-Wilk normality test

data:  resid(out)
W = 0.96866, p-value = 7.479e-11

2. Equal-variance test

bartlett.test(LDLC ~ Dx, data=df)

	Bartlett test of homogeneity of variances

data:  LDLC by Dx
Bartlett's K-squared = 4.6984, df = 2, p-value = 0.09545

1. Comparing group means with analysis of variance (ANOVA)

out = aov(LDLC ~ Dx, data=df)
summary(out)

             Df  Sum Sq Mean Sq F value  Pr(>F)   
Dx            2   18525    9263   5.649 0.00369 **
Residuals   674 1105097    1640                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Post hoc test (multiple comparison)

TukeyHSD(out)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = LDLC ~ Dx, data = df)

$Dx
                             diff       lwr       upr     p adj
STEMI-NSTEMI           -11.632006 -21.88366 -1.380357 0.0214679
Unstable Angina-NSTEMI -14.071523 -24.05755 -4.085496 0.0028229
Unstable Angina-STEMI   -2.439517 -10.60690  5.727869 0.7626228

2. Comparing groups with Welch's ANOVA

oneway.test(LDLC ~ Dx, data=df, var.equal=F)

	One-way analysis of means (not assuming equal variances)

data:  LDLC and Dx
F = 4.7883, num df = 2.00, denom df = 334.18, p-value = 0.008907

Post hoc test

games.howell <- function(grp, obs) {
  
  #Create combinations
  combs <- combn(unique(grp), 2)
  
  # Statistics that will be used throughout the calculations:
  # n = sample size of each group
  # groups = number of groups in data
  # Mean = means of each group sample
  # std = variance of each group sample (despite the name, not the standard deviation)
  n <- tapply(obs, grp, length)
  groups <- length(tapply(obs, grp, length))
  Mean <- tapply(obs, grp, mean)
  std <- tapply(obs, grp, var)
  
  statistics <- lapply(1:ncol(combs), function(x) {
    
    mean.diff <- Mean[combs[2,x]] - Mean[combs[1,x]]
    
    #t-values
    t <- abs(Mean[combs[1,x]] - Mean[combs[2,x]]) / sqrt((std[combs[1,x]] / n[combs[1,x]]) + (std[combs[2,x]] / n[combs[2,x]]))
    
    # Degrees of Freedom
    df <- (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]])^2 / # Numerator Degrees of Freedom
      ((std[combs[1,x]] / n[combs[1,x]])^2 / (n[combs[1,x]] - 1) + # Part 1 of Denominator Degrees of Freedom 
         (std[combs[2,x]] / n[combs[2,x]])^2 / (n[combs[2,x]] - 1)) # Part 2 of Denominator Degrees of Freedom
    
    #p-values
    p <- ptukey(t * sqrt(2), groups, df, lower.tail = FALSE)
    
    # Sigma standard error
    se <- sqrt(0.5 * (std[combs[1,x]] / n[combs[1,x]] + std[combs[2,x]] / n[combs[2,x]]))
    
    # Upper Confidence Limit
    upper.conf <- lapply(1:ncol(combs), function(x) {
      mean.diff + qtukey(p = 0.95, nmeans = groups, df = df) * se
    })[[1]]
    
    # Lower Confidence Limit
    lower.conf <- lapply(1:ncol(combs), function(x) {
      mean.diff - qtukey(p = 0.95, nmeans = groups, df = df) * se
    })[[1]]
    
    # Group Combinations
    grp.comb <- paste(combs[1,x], ':', combs[2,x])
    
    # Collect all statistics into list
    stats <- list(grp.comb, mean.diff, se, t, df, p, upper.conf, lower.conf)
  })
  
    # Unlist statistics collected earlier
    stats.unlisted <- lapply(statistics, function(x) {
      unlist(x)
    })
  
    # Create dataframe from flattened list
    results <- data.frame(matrix(unlist(stats.unlisted), nrow = length(stats.unlisted), byrow=TRUE))
  
    # Select columns set as factors that should be numeric and change with as.numeric
    results[c(2, 3:ncol(results))] <- round(as.numeric(as.matrix(results[c(2, 3:ncol(results))])), digits = 3)
  
    # Rename data frame columns
    colnames(results) <- c('groups', 'Mean Difference', 'Standard Error', 't', 'df', 'p', 'upper limit', 'lower limit')
  
    return(results)
  }

with(df, games.howell(Dx, LDLC))

groups	Mean Difference	Standard Error	t	df	p	upper limit	lower limit
STEMI : NSTEMI	11.632	3.268	2.517	229.624	0.033	22.533	0.731
STEMI : Unstable Angina	-2.440	2.379	0.725	536.857	0.749	5.466	-10.345
NSTEMI : Unstable Angina	-14.072	3.238	3.073	225.415	0.007	-3.268	-24.875

3. Kruskal-Wallis H test (Kruskal-Wallis rank sum test)

kruskal.test(LDLC ~ Dx, data=df)

	Kruskal-Wallis rank sum test

data:  LDLC by Dx
Kruskal-Wallis chi-squared = 8.9643, df = 2, p-value = 0.01131

Post hoc test

library(nparcomp)

Loading required package: multcomp
Loading required package: mvtnorm
Loading required package: survival
Loading required package: TH.data
Loading required package: MASS

Attaching package: ‘TH.data’

The following object is masked from ‘package:MASS’:

    geyser

result = mctp(LDLC ~ Dx, data=df)
summary(result)

 #----------------Nonparametric Multiple Comparisons for relative effects---------------# 
 
 - Alternative Hypothesis:  True differences of relative effects are not equal to 0 
 - Estimation Method:  Global Pseudo Ranks 
 - Type of Contrast : Tukey 
 - Confidence Level: 95 % 
 - Method = Fisher with 278 DF 
 
 #--------------------------------------------------------------------------------------# 
 

 #----------------Nonparametric Multiple Comparisons for relative effects---------------# 
 
 - Alternative Hypothesis:  True differences of relative effects are not equal to 0 
 - Estimation Method: Global Pseudo ranks 
 - Type of Contrast : Tukey 
 - Confidence Level: 95 % 
 - Method = Fisher with 278 DF 
 
 #--------------------------------------------------------------------------------------# 
 
 #----Data Info-------------------------------------------------------------------------# 
           Sample Size    Effect     Lower     Upper
1          NSTEMI  131 0.5500590 0.5204616 0.5793063
2           STEMI  251 0.4905976 0.4648875 0.5163576
3 Unstable Angina  295 0.4593433 0.4345728 0.4843164

 #----Contrast--------------------------------------------------------------------------# 
       1  2 3
2 - 1 -1  1 0
3 - 1 -1  0 1
3 - 2  0 -1 1

 #----Analysis--------------------------------------------------------------------------# 
      Estimator  Lower  Upper Statistic   p.Value
2 - 1    -0.059 -0.130  0.011    -1.974 0.1196648
3 - 1    -0.091 -0.159 -0.022    -3.087 0.0061829
3 - 2    -0.031 -0.090  0.028    -1.247 0.4252763

 #----Overall---------------------------------------------------------------------------# 
  Quantile   p.Value
1 2.353006 0.0061829

 #--------------------------------------------------------------------------------------#

4. Boxplot

# Set the plot size for notebook (IRkernel) output
fig <- function(width, height){
     options(repr.plot.width = width, repr.plot.height = height)
}

fig(4, 4)

ggplot(df) +
  aes(x = Dx, y = LDLC) +
  geom_boxplot(outlier.shape = "circle", fill = "#F2B091") +
  theme_minimal()

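6. Comparing proportions between groups

If cells with an expected count below 5 make up more than 20% of all cells, use Fisher's exact test.
In an M x 2 table where M is 3 or more and the M categories are ordered, use the Cochran-Armitage trend test.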
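The first rule can be checked directly from the expected counts. The sketch below rebuilds the Dx by cardiogenicShock counts reported in the CrossTable output of this section as a plain matrix (tab, expected, and share_small are names chosen for this sketch) and applies the 5 and 20% thresholds:

```r
# Counts copied from the CrossTable output in this section
tab <- matrix(c(130, 1,
                221, 30,
                295, 0),
              ncol = 2, byrow = TRUE,
              dimnames = list(Dx = c("NSTEMI", "STEMI", "Unstable Angina"),
                              cardiogenicShock = c("No", "Yes")))

# Expected counts under independence
expected <- suppressWarnings(chisq.test(tab)$expected)
expected

# Share of cells whose expected count is below 5
share_small <- mean(expected < 5)
share_small

# Rule-of-thumb choice between the two tests
if (share_small > 0.2) fisher.test(tab) else chisq.test(tab)
```

Here the smallest expected count is about 5.9985, matching the "Minimum expected frequency" line in the CrossTable output, so no cell falls below 5.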

1. Chi square and Fisher's exact test

library(gmodels)
with(df, 
     CrossTable(Dx, cardiogenicShock, chisq = T, fisher = T, expected = T, sresid = T, format = "SPSS"))

   Cell Contents
|-------------------------|
|                   Count |
|         Expected Values |
| Chi-square contribution |
|             Row Percent |
|          Column Percent |
|           Total Percent |
|            Std Residual |
|-------------------------|

Total Observations in Table:  677 

                | cardiogenicShock 
             Dx |       No  |      Yes  | Row Total | 
----------------|-----------|-----------|-----------|
         NSTEMI |      130  |        1  |      131  | 
                |  125.001  |    5.999  |           | 
                |    0.200  |    4.165  |           | 
                |   99.237% |    0.763% |   19.350% | 
                |   20.124% |    3.226% |           | 
                |   19.202% |    0.148% |           | 
                |    0.447  |   -2.041  |           | 
----------------|-----------|-----------|-----------|
          STEMI |      221  |       30  |      251  | 
                |  239.507  |   11.493  |           | 
                |    1.430  |   29.799  |           | 
                |   88.048% |   11.952% |   37.075% | 
                |   34.211% |   96.774% |           | 
                |   32.644% |    4.431% |           | 
                |   -1.196  |    5.459  |           | 
----------------|-----------|-----------|-----------|
Unstable Angina |      295  |        0  |      295  | 
                |  281.492  |   13.508  |           | 
                |    0.648  |   13.508  |           | 
                |  100.000% |    0.000% |   43.575% | 
                |   45.666% |    0.000% |           | 
                |   43.575% |    0.000% |           | 
                |    0.805  |   -3.675  |           | 
----------------|-----------|-----------|-----------|
   Column Total |      646  |       31  |      677  | 
                |   95.421% |    4.579% |           | 
----------------|-----------|-----------|-----------|

 
Statistics for All Table Factors


Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  49.75095     d.f. =  2     p =  1.572966e-11 


 
Fisher's Exact Test for Count Data
------------------------------------------------------------
Alternative hypothesis: two.sided
p =  1.206607e-12 

 
       Minimum expected frequency: 5.998523

2. Cochran-Armitage trend test

library(DescTools)

df$Dx_arr <- factor(df$Dx, levels=c("Unstable Angina", "NSTEMI", "STEMI"))

tab <- with(df, 
            table(Dx_arr, cardiogenicShock))
addmargins(tab)

	No	Yes	Sum
Unstable Angina	295	0	295
NSTEMI	130	1	131
STEMI	221	30	251
Sum	646	31	677

CochranArmitageTest(tab)

The test must be given the table without margins. Passing the result of addmargins(), which has an extra Sum row and column and is therefore no longer an r x 2 table, stops with:

Error: Cochran-Armitage test for trend must be used with rx2-table
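As a cross-check that needs no extra package, base R's prop.trend.test() runs a chi-squared test for trend in proportions, which is closely related to the Cochran-Armitage test. The sketch below uses the counts from the table above; the vector names and the equally spaced scores 1, 2, 3 are assumptions of this sketch.

```r
# Counts from the table above, ordered Unstable Angina < NSTEMI < STEMI
shock <- c(0, 1, 30)        # patients with cardiogenic shock
total <- c(295, 131, 251)   # patients per diagnosis

# Chi-squared test for trend in proportions with scores 1, 2, 3
trend <- prop.trend.test(x = shock, n = total, score = 1:3)
trend
```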

3. Mosaic plot
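This section has no code in the source. One possible base-graphics sketch, assuming the same Dx by cardiogenicShock counts as in the table above (the matrix tab, the plot title, and the colours are choices made for this sketch, not the author's):

```r
# Counts from the Dx_arr x cardiogenicShock table, without margins
tab <- matrix(c(295, 0,
                130, 1,
                221, 30),
              ncol = 2, byrow = TRUE,
              dimnames = list(Dx = c("Unstable Angina", "NSTEMI", "STEMI"),
                              cardiogenicShock = c("No", "Yes")))

# Tile area is proportional to the cell count
mosaicplot(tab, main = "Cardiogenic shock by diagnosis",
           color = c("#4682B4", "#F2B091"))
```

With the data frame at hand, mosaicplot(with(df, table(Dx_arr, cardiogenicShock))) would draw the same figure directly.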

Author And Source

Original post (R Code Notes): https://velog.io/@shlee-ns/R-코드-정리
