[Getting and Cleaning Data] Quiz 1
For more detail, you can download the html file here.
Quiz 1
Question 1
The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here and load the data into R. The code book, describing the variable names, is here. How many housing units in this survey were worth more than $1,000,000?
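# Create the data directory on first run
if(!file.exists("data")) dir.create("data")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl, destfile = "./data/United States communities.csv")
data <- read.csv("./data/United States communities.csv")
# VAL is a coded property-value variable; code 24 is the category above $1,000,000 (see the code book)
# Drop rows where VAL is NA, then count the rows with code 24
sum(data[!is.na(data$VAL),]$VAL == 24)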
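The NA handling in that count can be written two ways. The sketch below uses a small made-up data frame (the `toy` object and its values are hypothetical, not the survey data) to show that filtering out NA rows first and passing `na.rm = TRUE` to `sum()` give the same count.

```r
# Hypothetical toy data standing in for the survey; values are invented
toy <- data.frame(SERIALNO = 1:7,
                  VAL = c(24, NA, 3, 24, 17, NA, 24))

# Form 1: drop rows with a missing VAL, then count matches
n_index <- sum(toy[!is.na(toy$VAL), ]$VAL == 24)

# Form 2: compare first (NA stays NA), then let sum() skip the NAs
n_narm <- sum(toy$VAL == 24, na.rm = TRUE)

print(c(n_index, n_narm))
```

The second form avoids building an intermediate data frame, which may matter on larger files.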
Question 2
Using the data from Question 1, consider the variable FES in the code book. Which of the “tidy data” principles does this variable violate?
Question 3
Download the Excel spreadsheet on the Natural Gas Acquisition Program here. Read rows 18-23 and columns 7-15 into R and assign the result to a variable called “dat”. What is the value of:
sum(dat$Zip*dat$Ext,na.rm=T)
(original data source is here)
if(!file.exists("data")) dir.create("data")
library(xlsx)
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"
# mode = "wb" downloads in binary mode so the .xlsx file is not corrupted
download.file(fileUrl, destfile = "./data/Natural Gas Aquisition Program.xlsx", mode = "wb")
dateDownloaded <- date()
# Read only rows 18-23 and columns 7-15 of the first sheet
dat <- read.xlsx("./data/Natural Gas Aquisition Program.xlsx", sheetIndex = 1, rowIndex = 18:23, colIndex = 7:15, header = TRUE)
sum(dat$Zip*dat$Ext,na.rm=T)
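As a small illustration of what `na.rm` does in that expression, the sketch below uses invented values (the `toy_dat` data frame is hypothetical and unrelated to the spreadsheet). A product is NA whenever either factor is NA, and `na.rm = TRUE` drops those products before summing.

```r
# Hypothetical values, not taken from the NGAP spreadsheet
toy_dat <- data.frame(Zip = c(10, 20, NA),
                      Ext = c(1, NA, 3))

# Element-wise products: 10, NA, NA
products <- toy_dat$Zip * toy_dat$Ext

# Only the non-missing product contributes to the total
total <- sum(products, na.rm = TRUE)
print(total)
```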
Question 4
Read the XML data on Baltimore restaurants from [here](https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml). How many restaurants have zipcode 21231?
if(!file.exists("data")) dir.create("data")
library(XML)
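fileUrl <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
# Parse the document into internal nodes so XPath queries can be run on it
doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)
# Extract every zipcode node's text and count the matches
sum(xpathSApply(rootNode, "//zipcode", xmlValue) == "21231")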
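The same XPath counting pattern can be tried offline on an inline XML string. This is only a sketch: the three-restaurant document below is invented, and its element layout is an assumption rather than the structure of the real restaurants.xml file. It requires the XML package.

```r
library(XML)

# Hypothetical miniature document; names and layout are made up
xml_text <- paste0(
  "<response>",
  "<row><name>A</name><zipcode>21231</zipcode></row>",
  "<row><name>B</name><zipcode>21202</zipcode></row>",
  "<row><name>C</name><zipcode>21231</zipcode></row>",
  "</response>")

# asText = TRUE tells the parser the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# "//zipcode" selects zipcode nodes at any depth; xmlValue returns their text
zips <- xpathSApply(doc, "//zipcode", xmlValue)
n_match <- sum(zips == "21231")
print(n_match)
```

Note that the comparison is against the string "21231", since `xmlValue` returns character data.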
Question 5
The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here. Using the fread() command, load the data into an R object DT. Which of the following is the fastest way to calculate the average value of the variable pwgtp15 broken down by sex using the data.table package?
tapply(DT$pwgtp15,DT$SEX,mean)
DT[,mean(pwgtp15),by=SEX]
mean(DT[DT$SEX==1,]$pwgtp15); mean(DT[DT$SEX==2,]$pwgtp15)
mean(DT$pwgtp15,by=DT$SEX)
sapply(split(DT$pwgtp15,DT$SEX),mean)
rowMeans(DT)[DT$SEX==1]; rowMeans(DT)[DT$SEX==2]
# From the slides, the second option is the solution, but here I also use the system.time() function to see how long each one takes.
if(!file.exists("data")) dir.create("data")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileUrl, destfile = "United States communities.csv")
library(data.table)
DT <- fread("United States communities.csv")
system.time(tapply(DT$pwgtp15, DT$SEX, mean))
system.time(DT[, mean(pwgtp15), by = SEX])
# Time both subgroup means and add the two timings together
system.time(mean(DT[DT$SEX == 1, ]$pwgtp15)) + system.time(mean(DT[DT$SEX == 2, ]$pwgtp15))
# mean() has no by argument, so this returns the overall mean, not one per group
system.time(mean(DT$pwgtp15, by = DT$SEX))
system.time(sapply(split(DT$pwgtp15,DT$SEX),mean))
#system.time(rowMeans(DT)[DT$SEX==1]) + system.time(rowMeans(DT)[DT$SEX==2]) this is a wrong answer
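Timing only matters among the candidates that return the right result, so it can help to check correctness first. The sketch below does that in base R on synthetic data (the vectors are randomly generated stand-ins, not the survey file, and the data.table option is left out to keep the example dependency-free).

```r
set.seed(1)
n <- 1e5
# Synthetic stand-ins for the survey columns; values are invented
pwgtp15 <- runif(n, min = 0, max = 500)
SEX <- sample(1:2, n, replace = TRUE)

# Three candidates that do compute one mean per group
by_tapply <- tapply(pwgtp15, SEX, mean)
by_split  <- sapply(split(pwgtp15, SEX), mean)
by_subset <- c(mean(pwgtp15[SEX == 1]), mean(pwgtp15[SEX == 2]))

# mean() silently ignores the unknown `by` argument: a single overall mean
ignored_by <- mean(pwgtp15, by = SEX)

print(all.equal(as.numeric(by_tapply), as.numeric(by_split)))
print(length(ignored_by))
```

Each correct candidate can then be wrapped in `system.time()` as in the code above; the measured times will vary by machine and run.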
You may freely share or copy this text, but please keep this document's URL as the reference URL.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.