[Getting and Cleaning data] Quiz 2
For more detail, see the html file here.
Question 1
Register an application with the GitHub API (github application). Access the API to get information on your instructor's repositories (target url: https://api.github.com/users/jtleek/repos). Use this data to find the time that the datasharing repo was created. What time was it created? This tutorial may be useful (help tutorial). You may also need to run the code in the base R package and not RStudio.
library(httr)
library(httpuv)
# 1.OAuth settings for github:
Client_ID <- '66fba4580b9b23531d6e'
Client_Secret <- '7fd8a4f7d72ab12b6c01b5c4880bc6da7723eec2'
myapp <- oauth_app("First APP", key = Client_ID, secret = Client_Secret)
# 2. Get OAuth credentials
github_token <- oauth2.0_token(oauth_endpoints("github"), myapp)
# 3. Use API
gtoken <- config(token = github_token)
req <- GET("https://api.github.com/users/jtleek/repos", gtoken)
stop_for_status(req)
# 4. Extract out the content from the request
json1 = content(req)
# 5. round-trip the nested list through JSON so it comes back as a data frame
json2 = jsonlite::fromJSON(jsonlite::toJSON(json1))
# 6. Result
json2[json2$full_name == "jtleek/datasharing", ]$created_at
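Steps 4 to 6 can be tried offline without an OAuth token. The following is a minimal sketch, assuming a made-up two-record JSON payload: the repository names and timestamps are placeholders, not real API output. It shows how jsonlite::fromJSON turns an array of records into a data frame that can be filtered on full_name.

```r
library(jsonlite)

# Hypothetical payload standing in for the API response;
# names and timestamps are placeholders, not real GitHub data.
txt <- '[
  {"full_name": "example/alpha", "created_at": "2000-01-01T00:00:00Z"},
  {"full_name": "example/beta",  "created_at": "2001-02-03T04:05:06Z"}
]'

# An array of JSON objects is simplified into a data frame,
# one row per object and one column per key.
repos <- fromJSON(txt)

# Filter rows by repository name, then pull the creation time.
created <- repos[repos$full_name == "example/beta", "created_at"]
print(created)
```

The same row filter is what the last line of the solution applies to the real response.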
Question 2
The sqldf package allows for execution of SQL commands on R data frames. We will use the sqldf package to practice the queries we might send with the dbSendQuery command in RMySQL. Download the American Community Survey data and load it into an R object called acs (data website). Which of the following commands will select only the data for the probability weights pwgtp1 with ages less than 50?
sqldf("select * from acs where AGEP < 50")
sqldf("select * from acs")
sqldf("select pwgtp1 from acs")
sqldf("select pwgtp1 from acs where AGEP < 50")
# load package: sqldf is short for SQL select for data frame.
library(sqldf)
# 1. download data
download.file(url = "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv", destfile = "./data/acs.csv")
# 2. read data
acs <- read.csv("./data/acs.csv")
# 3. select using sqldf
#sqldf("select pwgtp1 from acs where AGEP<50", drv='SQLite')
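To see what the chosen query does, it can help to run it on a tiny data frame and compare it with base R subsetting. This is a sketch under assumptions: the data frame toy and its values are invented for illustration, and only the column names pwgtp1 and AGEP come from the question.

```r
library(sqldf)

# Toy stand-in for acs; the values are made up for illustration.
toy <- data.frame(pwgtp1 = c(10, 20, 30, 40),
                  AGEP   = c(25, 60, 49, 50))

# SQL: keep one column, and only rows where AGEP is below 50.
via_sql <- sqldf("select pwgtp1 from toy where AGEP < 50")

# Base R equivalent; drop = FALSE keeps the result a data frame.
via_base <- toy[toy$AGEP < 50, "pwgtp1", drop = FALSE]

print(via_sql$pwgtp1)
print(via_base$pwgtp1)
```

The row with AGEP equal to 50 is excluded in both versions because the comparison is strict.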
Question 3
Using the same data frame you created in the previous problem, what is the equivalent function to unique(acs$AGEP)?
sqldf("select unique AGEP from acs")
sqldf("select distinct pwgtp1 from acs")
sqldf("select AGEP where unique from acs")
sqldf("select distinct AGEP from acs")
result <- sqldf("select distinct AGEP from acs", drv = "SQLite")
nrow(result)
length(unique(acs$AGEP))
Question 4
How many characters are in the 10th, 20th, 30th and 100th lines of HTML from this page: target page.(Hint: the nchar() function in R may be helpful)
# 1. set url
url <- url("http://biostat.jhsph.edu/~jleek/contact.html")
# 2. read content from url
content <- readLines(url)
close(url)
# 3. result
nchar(content[c(10, 20, 30, 100)])
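The indexing pattern can be checked without network access. Here is a small sketch, assuming a few invented HTML lines fed through a text connection in place of the real page; nchar is vectorised, so it returns one count per selected line.

```r
# Hypothetical lines standing in for the downloaded page.
con <- textConnection(c("<html>",
                        "<head>",
                        "  <title>Demo</title>",
                        "</head>"))
lines <- readLines(con)
close(con)

# Select lines 1 and 3, then count characters in each
# (leading spaces are counted too).
counts <- nchar(lines[c(1, 3)])
print(counts)
```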
Question 5
Read this data set into R and report the sum of the numbers in the fourth column (data web). Original source of the data: original data web. (Hint: this is a fixed-width file format.)
# 1. read data
data <- read.fwf("https://d396qusza40orc.cloudfront.net/getdata%2Fwksst8110.for",
                 skip = 4,
                 widths = c(12, 7, 4, 9, 4, 9, 4, 9, 4))
# 2. result
sum(as.numeric(data[,4]))
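How read.fwf cuts each line by character width is easier to see on a tiny example. The sketch below assumes two invented records (a 9-character date, a 6-character value, and a 4-character value); neither the layout nor the numbers come from the real file.

```r
# Made-up fixed-width records: 9 + 6 + 4 characters per line.
raw <- c("03JAN1990  23.4 0.4",
         "10JAN1990  24.1 0.8")

# read.fwf accepts a connection, so no file on disk is needed.
con <- textConnection(raw)
toy <- read.fwf(con, widths = c(9, 6, 4))

# Columns are named V1, V2, V3; padded numeric fields are parsed as numbers.
total <- sum(toy$V3)
print(toy)
print(total)
```

In the quiz solution, skip = 4 plays the extra role of dropping the header lines before the widths are applied.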
You may freely share or copy this text, but please keep this document's URL as the reference URL.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.