R Basics

1. Show the current working directory

getwd()

2. Change the working directory

setwd("~/Downloads")
# Note: if a file cannot be read, the working directory is probably not the folder where the file is stored. Typical error:
# cannot open file 'data': No such file or directory

3. Read a file

statesInfo
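
A minimal sketch of steps 1 to 3 together, assuming a small CSV file. The file name states_demo.csv and its columns are made up for illustration; the example writes the file to a temporary directory so it can be run anywhere.

```r
# Hypothetical data, written to a temp directory so the example is self-contained
demo_dir <- tempdir()
demo_path <- file.path(demo_dir, "states_demo.csv")
write.csv(data.frame(state = c("A", "B", "C"), state.region = c(1, 2, 1)),
          demo_path, row.names = FALSE)

old_wd <- getwd()                      # remember the starting directory
setwd(demo_dir)                        # relative paths now resolve inside demo_dir
demo <- read.csv("states_demo.csv")    # read the file by its relative name
setwd(old_wd)                          # restore the original working directory

dim(demo)                              # number of rows and columns
```

Restoring the old directory afterwards keeps later relative paths from silently pointing somewhere else.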

4. Look up data in a table

# General form: statesInfo[rows, columns]
statesInfo[statesInfo$state.region == 1, ]

5. Show the first two rows of a data set and print its dimensions

head(data, 2)
dim(data)

6. Inspect information about the data. Shortcut: Option+Cmd+I

?cars
str(cars)

7. Filter data

subset(data,mpg>=30|hp<60)
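
The same inspection and filtering steps can be tried on mtcars, which ships with base R. This is only a sketch; the names economical and economical2 are illustrative.

```r
# mtcars is a built-in data set, so no file is needed
head(mtcars, 2)     # first two rows
dim(mtcars)         # rows and columns

# Rows where mpg is at least 30 OR hp is below 60
economical <- subset(mtcars, mpg >= 30 | hp < 60)

# Equivalent bracket form: a logical vector picks the rows, all columns are kept
economical2 <- mtcars[mtcars$mpg >= 30 | mtcars$hp < 60, ]

nrow(economical)
```

subset() lets you write column names directly, while the bracket form needs the mtcars$ prefix.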

8. Tabulate a variable and count the observations in each group

table(data$employment.status)  # counts per value of employment.status, similar to a GROUP BY

9. View summary statistics

summary(reddit)

10. View the levels of a factor variable

levels(data$columns)

11. Draw a histogram. If the figure does not appear in the Plots pane, run dev.off().

library(ggplot2)
qplot(data = reddit, x = age.range)  # plot the distribution of age.range in the reddit data set

12. Order the factor levels shown in the histogram

reddit$age.range
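
A hedged sketch of reordering factor levels so that the bars appear in a meaningful order. The age labels below are made up for illustration and are not taken from the reddit data set.

```r
# Hypothetical age ranges; without explicit levels they come out in sorted order
age_range <- factor(c("18-24", "Under 18", "25-34", "18-24", "65 or Above"))
levels(age_range)

# Re-declare the factor with an explicit level order
age_range <- factor(age_range,
                    levels = c("Under 18", "18-24", "25-34", "65 or Above"),
                    ordered = TRUE)
levels(age_range)
table(age_range)    # counts now follow the declared order
```

Plots and tables follow the level order, so declaring the levels once fixes the ordering everywhere.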

13. Two ways to draw a histogram

# Option 1: qplot
qplot(x = dob_day, data = pf) +
  scale_x_continuous(breaks = 1:31)
# Option 2: ggplot
ggplot(aes(x = dob_day), data = pf) +
  geom_histogram(binwidth = 0.5) +
  scale_x_continuous(breaks = 1:31)
# Using bins instead of binwidth
ggplot(aes(price), data = diamonds) +
  geom_histogram(bins = 300) +
  scale_x_log10()

- binwidth sets the width of each bin, while bins sets the number of bins; if both are given, binwidth takes precedence.

14. Facet the histogram: draw one panel per value of dob_month, so that every level of a categorical variable gets the same kind of plot

ggplot(aes(x = dob_day), data = pf) +
  geom_histogram(binwidth = 0.5) +
  scale_x_continuous(breaks = 1:31) +
  facet_wrap(~dob_month, ncol = 3)  # ncol sets the number of columns of panels
# facet_wrap(formula): facet_wrap(~variable)
# facet_grid(formula): facet_grid(vertical ~ horizontal)

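15. Set the start and end of the histogram's x-axis

qplot(data = pf, x = friend_count, xlim = c(0, 1000))  # limit the x-axis with xlim
qplot(x = friend_count, data = pf) +
  scale_x_continuous(limits = c(0, 1000))  # same limits, set through a scale layer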

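16. Ignore NA values

qplot(x = friend_count, data = subset(pf, !is.na(gender)), binwidth = 25) +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 25)) +
  facet_wrap(~gender, ncol = 2)
1. subset(pf, !is.na(gender)) drops the rows where gender is NA;
2. binwidth sets the bin width;
3. scale_x_continuous(limits = c(0, 1000)) restricts the range of the x-axis;
4. breaks places the tick marks from 0 to 1000 in steps of 25;
5. facet_wrap(~gender, ncol = 2) draws one panel per gender.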

17. View counts

table(pf$gender)  # count how many rows in pf take each value of gender

18. View summary statistics by group

by(pf$friend_count, pf$gender, summary)  # summary statistics of friend_count for each gender
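
To see what table() and by() return without the pf data set, here is a small sketch on a made-up data frame. pf_demo and its values are hypothetical.

```r
# Hypothetical stand-in for pf
pf_demo <- data.frame(
  gender = c("female", "male", "male", "female", "male"),
  friend_count = c(10, 0, 25, 40, 5)
)

table(pf_demo$gender)                                        # rows per gender
stats <- by(pf_demo$friend_count, pf_demo$gender, summary)   # summary per gender
totals <- by(pf_demo$friend_count, pf_demo$gender, sum)      # sum per gender

stats
totals
```

by() applies the function once per group, so any function that accepts a vector (summary, sum, mean, median) can be used as the third argument.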

19. Set histogram colors

qplot(x=tenure,data=pf,binwidth=30,
      color=I('black'),fill=I('#099DD9'))

ggplot(aes(x=price,fill=cut),data=diamonds)+
  geom_histogram()+
  facet_wrap(~color)+
  scale_x_log10()+
  scale_fill_brewer(type="qual")
1. fill = cut colors the bars by the cut variable.

20. Set the x-axis breaks

qplot(x = age, data = pf, binwidth = 1,
      color = I('black'), fill = I('#099DD9')) +
  scale_x_continuous(breaks = seq(0, 113, 5))
# scale_x_continuous sets the tick positions on the x-axis

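21. Take the logarithm of a variable to bring it closer to a normal distribution

## summary(log10(pf$friend_count + 1))
# Log base 10; the +1 keeps zero counts from becoming -Inf
# With scale_x_log10(), binwidth is measured in log10 units, so a small value is used here
qplot(x = (price / carat + 1), data = diamonds, binwidth = 0.05) +
  facet_wrap(~cut) +
  scale_x_log10()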

22. Plot log and square-root transformations

p1
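
A small numeric sketch of what the log10 and square-root transformations do to a long-tailed variable. The friend_count values below are made up; they only mimic the shape of such data.

```r
# Hypothetical long-tailed counts
friend_count <- c(0, 1, 3, 8, 20, 55, 150, 400, 1100, 3000)

raw_gap  <- mean(friend_count) - median(friend_count)   # large gap: strong right skew
log_x    <- log10(friend_count + 1)                     # +1 keeps the zero finite
sqrt_x   <- sqrt(friend_count)                          # milder compression than log10

summary(friend_count)
summary(log_x)
summary(sqrt_x)
```

On the transformed scales the mean and median sit much closer together, which is why histograms of the transformed variable look more symmetric.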

23. Create a frequency polygon

ggplot(aes(x = friend_count, y = ..count../sum(..count..)), data = subset(pf, !is.na(gender))) +
  geom_freqpoly(aes(color = gender), binwidth = 10) +
  scale_x_continuous(limits = c(0, 1000), breaks = seq(0, 1000, 50)) +
  xlab('Friend count') +
  ylab('Percentage of users with that friend count')

24. Sum a variable within each category

by(pf$www_likes,pf$gender,sum)

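25. Box plots

# Option 1: ylim drops observations outside 0-1000 before the box statistics are computed
qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot', ylim = c(0, 1000))

# Option 2: scale_y_continuous(limits = ...) behaves the same way as ylim
qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  scale_y_continuous(limits = c(0, 1000))

# Option 3: coord_cartesian only zooms the y-axis to 0-1000; no data is dropped,
# so the medians and quartiles stay the same as for the full data
qplot(x = gender, y = friend_count,
      data = subset(pf, !is.na(gender)),
      geom = 'boxplot') +
  coord_cartesian(ylim = c(0, 1000))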
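
The difference between limiting a scale and zooming the coordinate system can be checked numerically. This sketch uses made-up values and plain medians rather than an actual box plot.

```r
# Hypothetical friend counts, two of them above 1000
friend_count <- c(5, 50, 200, 600, 900, 1500, 4000)

# coord_cartesian keeps every observation, so the box is built from all values
median_all <- median(friend_count)

# ylim / scale limits discard values outside the range first
median_limited <- median(friend_count[friend_count <= 1000])

c(all = median_all, limited = median_limited)
```

The two medians differ, so the boxes drawn with ylim or scale_y_continuous describe a truncated data set, not the original one.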

26. ifelse and conversion to a factor variable

mobile_check_in 0,1,0)
pf$mobile_check_in
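
A minimal sketch of the ifelse-then-factor pattern. The data frame pf_demo and the column mobile_likes are assumptions made for illustration.

```r
# Hypothetical data: number of likes made from a mobile device
pf_demo <- data.frame(mobile_likes = c(0, 3, 0, 12))

# 1 if the user has any mobile likes, otherwise 0
pf_demo$mobile_check_in <- ifelse(pf_demo$mobile_likes > 0, 1, 0)

# Convert the 0/1 numbers to a factor so they are treated as categories
pf_demo$mobile_check_in <- factor(pf_demo$mobile_check_in)

summary(pf_demo$mobile_check_in)                     # count per level
share_mobile <- mean(pf_demo$mobile_check_in == 1)   # share of users with a check-in
share_mobile
```

Converting to a factor matters for plotting and modelling, where numeric 0/1 would otherwise be treated as a continuous variable.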

27. Study the relationship between two continuous variables with a scatter plot

ggplot(aes(x=age,y=friend_count),data=pf)+
  geom_point(alpha=1/20)+
  xlim(13,90)+
  coord_trans(y='sqrt')
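# alpha = 1/20 means 20 overlapping points add up to one fully opaque point; coord_trans(y = 'sqrt') applies a square-root transformation to the y-axis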

ggplot(aes(x=age,y=friend_count),data=subset(pf,!is.na(gender)))+
  geom_jitter(alpha=1/20,aes(colour=gender),height=0)+
  xlim(13,90)+
  coord_trans(y='sqrt',limy=c(0,3000))
# geom_jitter adds random noise to spread out overlapping points; height = 0 leaves the y values unchanged

ggplot(aes(x=table,y=price),data=diamonds)+
  geom_point(alpha=1/5,aes(color=cut))+
  scale_x_continuous(breaks=seq(50,80,2))

28. Split the data into groups and take the mean and median of each group

age_groups
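
One possible way to build such a grouped summary with base R only, shown on made-up data. The names pf_demo and age_groups_demo are illustrative, and tapply is used here in place of whatever grouping function the original code used.

```r
# Hypothetical data: two users aged 13, three aged 14
pf_demo <- data.frame(
  age = c(13, 13, 14, 14, 14),
  friend_count = c(10, 30, 5, 15, 100)
)

# One row per age with the mean, the median and the group size
age_groups_demo <- data.frame(
  age = sort(unique(pf_demo$age)),
  friend_count_mean   = as.vector(tapply(pf_demo$friend_count, pf_demo$age, mean)),
  friend_count_median = as.vector(tapply(pf_demo$friend_count, pf_demo$age, median)),
  n                   = as.vector(tapply(pf_demo$friend_count, pf_demo$age, length))
)

age_groups_demo
```

The mean for age 14 is pulled up by the single large value while the median is not, which is the reason for reporting both.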

29. Add median and quantile layers to a scatter plot

ggplot(aes(x=age,y=friend_count),data=pf)+
  coord_cartesian(xlim=c(13,70),ylim=c(0,1000))+
  geom_point(alpha=0.05,
             position=position_jitter(h=0),
             color='orange')+
  #coord_trans(y='sqrt')+
  geom_line(stat='summary',fun.y=mean)+
  geom_line(stat="summary",fun.y=quantile,fun.args=list(probs= .9),
            linetype=2,color='blue')+
  geom_line(stat="summary",fun.y=quantile,fun.args=list(probs= .5),
            linetype=2,color='blue')+
  geom_line(stat="summary",fun.y=quantile,fun.args=list(probs= .1),
            linetype=2,color='blue')
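1. coord_cartesian sets the visible range of the x and y axes;
2. alpha = 0.05 (that is, 1/20) means 20 overlapping points render as one fully opaque point;
2.1. position = position_jitter(h = 0) adds jitter, with the vertical jitter set to 0;
3. coord_trans(y = 'sqrt') would apply a square-root transformation to the y-axis (commented out here);
4. the mean of y is drawn with geom_line(stat = 'summary', fun.y = mean);
5. a quantile of y (here the median) is drawn with geom_line(stat = "summary", fun.y = quantile, fun.args = list(probs = .5),
            linetype = 2, color = 'blue')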

30. View the correlation coefficient of two variables


with(pf,cor.test(age,friend_count))

with(subset(pf,age<=70),cor.test(age,friend_count,
                                 method="pearson"))
1. method = "pearson" is the default.
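
The object returned by cor.test can be inspected field by field. This sketch uses the built-in mtcars data instead of pf; the choice of wt and mpg is only for illustration.

```r
# Pearson correlation between car weight and fuel economy
res <- with(mtcars, cor.test(wt, mpg))   # method = "pearson" by default

res$estimate    # the correlation coefficient
res$p.value     # p-value of the test that the true correlation is 0
res$conf.int    # 95% confidence interval for the coefficient

# Same test on a subset, mirroring the age <= 70 example
res_sub <- with(subset(mtcars, hp <= 150), cor.test(wt, mpg, method = "pearson"))
res_sub$estimate
```

Storing the result in a variable makes it easy to pull out only the number you need, for example to annotate a plot.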

31. Scatter plot, fitted correlation line, and a subset of the data

ggplot(aes(x=www_likes_received,y=likes_received),data=pf)+
  geom_point()+
  xlim(0,quantile(pf$www_likes_received,0.95))+
  ylim(0,quantile(pf$likes_received,0.95))+
  geom_smooth(method='lm',color='red')
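1. xlim and ylim keep the x and y values from 0 up to the 95th percentile;
2. geom_smooth adds the fitted line, here a linear model (method = 'lm').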

32. Smooth the data

p1

Analyzing more variables

library(dplyr)
pf.fc_age_gender <- pf %>%
  filter(!is.na(gender))%>%
  group_by(age,gender)%>%
  summarise(mean_friend_count=mean(friend_count),
            median_friend_count=median(friend_count),
            n=n())%>%
  ungroup()%>%
  arrange(age)
1. Filter, group, and summarise;
names(pf.fc_age_gender)
ggplot(aes(x=age,y=mean_friend_count),
       data=pf.fc_age_gender)+
  geom_line(aes(color=gender))
2. Plot the mean friend count by age, with one line per gender.

2. Convert long format to wide format

pf.fc_age_gender <- pf %>%
  filter(!is.na(gender))%>%
  group_by(age,gender)%>%
  summarise(mean_friend_count=mean(friend_count),
            median_friend_count=median(friend_count),
            n=n())%>%
  ungroup()%>%
  arrange(age)

library(reshape2)
pf.fc_by_age_gender.wide
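
A sketch of the long-to-wide step on made-up data. It uses base R's reshape() rather than reshape2, purely so that the example runs without extra packages; the data frame long and its values are hypothetical.

```r
# Hypothetical long-format summary: one row per (age, gender) pair
long <- data.frame(
  age = c(13, 13, 14, 14),
  gender = c("female", "male", "female", "male"),
  median_friend_count = c(100, 50, 120, 80)
)

# Wide format: one row per age, one column per gender
wide <- reshape(long, idvar = "age", timevar = "gender", direction = "wide")
names(wide) <- c("age", "female", "male")   # shorter column names

# With both genders on the same row, the ratio is a simple column operation
wide$ratio <- wide$female / wide$male
wide
```

The wide layout is what makes an expression such as female/male usable directly inside aes().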

3. Ratio plot

ggplot(aes(x=age,y=female/male),
       data=pf.fc_by_age_gender.wide)+
  geom_line()+
  geom_hline(yintercept=1,alpha=0.3,linetype=2)

4. Bin a variable with cut

pf$year_joined.bucket
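
A hedged sketch of how cut() turns a numeric variable into interval buckets. The years and break points below are assumptions chosen for illustration.

```r
# Hypothetical join years
year_joined <- c(2005, 2009, 2010, 2011, 2012, 2013, 2014)

# Intervals are open on the left and closed on the right by default,
# so 2009 falls in the first bucket and 2010 in the second
bucket <- cut(year_joined, breaks = c(2004, 2009, 2011, 2012, 2014))

levels(bucket)
table(bucket, useNA = "ifany")   # count per bucket, NA shown if any value falls outside
```

Values outside the outermost breaks become NA, which is why later plots filter with !is.na(...) on the bucket column.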

5. Explore the relationship between join year and friend count

ggplot(aes(x=age,y=friend_count),
     data=subset(pf,!is.na(year_joined.bucket)))+
geom_line(aes(color=year_joined.bucket),stat="summary",fun.y=mean)+
geom_line(stat="summary",fun.y=mean,linetype=2)
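1. geom_line(aes(color = year_joined.bucket), stat = "summary", fun.y = mean) draws the mean friend count by age, with one line per year_joined.bucket;
2. geom_line(stat = "summary", fun.y = mean, linetype = 2) draws the overall mean as a dashed line.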

6. Line plot of the relationship between friendships initiated and tenure

The variables to use are age, tenure, friendships_initiated, and year_joined.bucket.

ggplot(aes(x=30*round(tenure/30),y=friendships_initiated/tenure),
       data=subset(pf,tenure>=1))+
  geom_line(aes(color=year_joined.bucket),stat="summary",fun.y=mean)
1. 30 * round(tenure / 30) rounds tenure to the nearest multiple of 30 days, which smooths the line.

ggplot(aes(x=30*round(tenure/30),y=friendships_initiated/tenure),
       data=subset(pf,tenure>=1))+
  geom_smooth(aes(color=year_joined.bucket))
2. geom_smooth fits a smoothed curve instead of plotting the binned means.
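
The effect of 30 * round(tenure / 30) can be seen without a plot. This sketch uses a made-up, deliberately noisy series; tenure and y are hypothetical.

```r
# Hypothetical daily values that jump between 1 and 3
tenure <- 1:120
y <- rep(c(1, 3), 60)

# Mean per exact tenure value: as noisy as the raw data
daily_mean <- tapply(y, tenure, mean)

# Mean per 30-day bin: tenure is rounded to the nearest multiple of 30 first
binned_tenure <- 30 * round(tenure / 30)
binned_mean <- tapply(y, binned_tenure, mean)

diff(range(daily_mean))    # spread of the unbinned means
diff(range(binned_mean))   # spread after binning
binned_mean
```

Wider bins average more observations per point, so the line is smoother at the cost of hiding short-term variation.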

7. Create a scatter plot matrix


library(GGally)
theme_set(theme_minimal(20))
set.seed(1836)
pf_subset

Price histograms with facets and color

ggplot(aes(x=price,fill=cut),data=diamonds)+
  geom_histogram()+
  facet_wrap(~color)+
  scale_x_log10()+
  scale_fill_brewer(type="qual")

8. Draw a scatter plot, keep 99% of the data, color by a discrete variable, and put the axis on a log10 scale

diamonds$volume

9. Scatter plot with a cube-root x-axis and a log10 y-axis

ggplot(aes(carat,price),data=diamonds)+
  geom_point()+
  scale_x_continuous(trans=cuberoot_trans(),limits=c(0.2,3),
                     breaks=c(0.2,0.5,1,2,3))+
  scale_y_continuous(trans=log10_trans(),limits=c(350,15000),
                     breaks=c(350,1000,5000,10000,15000))+
  ggtitle('Price(log10) by Cube-Root of Carat')
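
The plots in this section call cuberoot_trans(), which is not defined in these notes, and log10_trans(), which comes from the scales package. A minimal sketch of one way to define the cube-root transformation, assuming scales is installed; the helper names cuberoot and cuberoot_inverse are made up here.

```r
# Cube root and its inverse as plain functions
cuberoot <- function(x) x^(1/3)
cuberoot_inverse <- function(x) x^3

# Wrap the pair as a scale transformation; trans_new() is from the scales package
cuberoot_trans <- function() {
  scales::trans_new("cuberoot", transform = cuberoot, inverse = cuberoot_inverse)
}

cuberoot(c(0.2, 1, 3))                     # transformed carat values
cuberoot_inverse(cuberoot(c(0.2, 1, 3)))   # round trip back to the original values
```

Run library(scales) and this definition before the plotting code so that both cuberoot_trans() and log10_trans() are available.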

ggplot(aes(carat, price), data = diamonds) + 
  geom_point(alpha=0.5,position='jitter',size=0.75) + 
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat')        

library('RColorBrewer')
ggplot(aes(x = carat, y = price,colour=clarity), data = diamonds) + 
  geom_point(alpha = 0.5, size = 1, position = 'jitter') +
  scale_color_brewer(type = 'div',
                     guide = guide_legend(title = 'Clarity', reverse = T,
                                          override.aes = list(alpha = 1, size = 2))) +  
  scale_x_continuous(trans = cuberoot_trans(), limits = c(0.2, 3),
                     breaks = c(0.2, 0.5, 1, 2, 3)) + 
  scale_y_continuous(trans = log10_trans(), limits = c(350, 15000),
                     breaks = c(350, 1000, 5000, 10000, 15000)) +
  ggtitle('Price (log10) by Cube-Root of Carat and Clarity')
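1. colour = clarity colors the points by clarity;
2. guide_legend sets the legend title, reverses the legend order, and overrides alpha and size for the legend keys.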

Building a linear model and making predictions

Load the memisc package.

bigdiamonds$logprice=log(bigdiamonds$price)
m1
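
A minimal sketch of the fit-then-predict workflow on a log-transformed price, using made-up data. The data frame toy, its values, and the model formula are assumptions for illustration, not the models from these notes.

```r
# Hypothetical stones whose log price is exactly linear in the cube root of carat
toy <- data.frame(carat = c(0.3, 0.5, 0.7, 1.0, 1.5, 2.0))
toy$price <- exp(6 + 2 * toy$carat^(1/3))
toy$logprice <- log(toy$price)

# Fit log(price) against the cube root of carat
m <- lm(logprice ~ I(carat^(1/3)), data = toy)
coef(m)

# Predict on the log scale, then transform back to a price
new_stone <- data.frame(carat = 1.2)
pred_log <- predict(m, newdata = new_stone)
pred_price <- exp(pred_log)
pred_price
```

Because the model is fitted on log(price), predictions must be passed through exp() before they can be read as prices.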

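Adjust font size and horizontal position

# Add to an existing plot with +; hjust = 0.5 centers the title
theme(axis.title.x = element_text(size = 60),
      axis.title.y = element_text(size = 60)) +
  theme(plot.title = element_text(hjust = 0.5))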
