[Getting and Cleaning Data] Quiz 4
More details can be found in the HTML file here.
Question 1
The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here, and load the data into R. The code book, which describes the variable names, is here. Apply strsplit() to split all the names of the data frame on the characters “wgtp”. What is the value of the 123rd element of the resulting list?
# download data
if (!file.exists("./data")) dir.create("./data")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl, destfile = "./data/ACS.csv")
# load data into R
acs <- read.csv("./data/ACS.csv")
# split the column names on "wgtp" and take the 123rd element
strsplit(names(acs), split = "wgtp")[[123]]
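To see what strsplit() returns in this situation, here is a minimal sketch on a hand-made vector. The names in example_names are illustrative assumptions, not values read from the downloaded file.

```r
# Hypothetical column names resembling those in the ACS housing file
example_names <- c("SERIALNO", "wgtp15", "wgtp1")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(example_names, split = "wgtp")

# No match: the name comes back unchanged as a length-1 vector
parts[[1]]

# Match at the start: an empty string precedes the remaining "15"
parts[[2]]
```

Because the result is a list, a single element is extracted with double brackets, as in the answer above.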
Question 2
Load the Gross Domestic Product data for the 190 ranked countries in this data set here. Remove the commas from the GDP numbers in millions of dollars and average them. What is the average? The original data source is here.
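# download data
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
download.file(fileUrl, destfile = "./data/GDP.csv")
# load the 190 ranked countries and keep the four columns of interest
gdp <- read.csv("./data/GDP.csv", skip = 4, nrows = 190, stringsAsFactors = FALSE)[, c(1, 2, 4, 5)]
colnames(gdp) <- c("CountryCode", "Ranking", "Economy", "GDP")
# remove the thousands separators, convert to numeric, and average
obj <- gsub(",", "", gdp$GDP)
obj <- as.numeric(obj)
mean(obj, na.rm = TRUE)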
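The comma-removal step can be checked on a few hand-written strings. This is a small sketch; the values in raw_gdp are made up for illustration, including a ".." entry standing in for a missing figure.

```r
# Hypothetical GDP strings with thousands separators and one missing value
raw_gdp <- c("16,244,600", "8,227,103", "..")

# gsub() replaces every comma; as.numeric() turns non-numeric text into NA
clean_gdp <- suppressWarnings(as.numeric(gsub(",", "", raw_gdp)))

# na.rm = TRUE drops the NA before averaging
avg_gdp <- mean(clean_gdp, na.rm = TRUE)
avg_gdp
```

Using gsub() rather than sub() matters here, since sub() would remove only the first comma in each string.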
Question 3
In the data set from Question 2 what is a regular expression that would allow you to count the number of countries whose name begins with “United”? Assume that the variable with the country names in it is named countryNames. How many countries begin with United?
grep("^United",countryNames), 3
grep("United$",countryNames), 3
grep("^United",countryNames), 4
grep("*United",countryNames), 2
grep("^United", gdp$Economy)
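The difference between the anchors in these options can be seen on a short made-up vector. The names in exampleNames are assumptions for illustration and are not the country list from the GDP file.

```r
# Hypothetical names chosen to exercise each pattern
exampleNames <- c("United States", "United Kingdom",
                  "Emirates United", "Tanzania, United Rep.")

# "^" anchors the match to the start of the string
starts_with <- grep("^United", exampleNames)

# "$" anchors the match to the end of the string
ends_with <- grep("United$", exampleNames)

# No anchor: matches "United" anywhere in the string
anywhere <- grep("United", exampleNames)

# grep() returns matching indices, so length() gives the count
length(starts_with)
```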
Question 4
Load the Gross Domestic Product data for the 190 ranked countries in this data set here.
Load the educational data from this data set here. Match the data based on the country shortcode. Of the countries for which the end of the fiscal year is available, how many end in June?
Original data sources are here and here.
# download data
fileUrl1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
fileUrl2 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
download.file(fileUrl1, destfile = "./data/GDP.csv")
download.file(fileUrl2, destfile = "./data/EDU.csv")
# load data into R
gdp <- read.csv("./data/GDP.csv", skip = 4, nrows = 190, stringsAsFactors = FALSE)[, c(1, 2, 4, 5)]
colnames(gdp) <- c("CountryCode", "Ranking", "Economy", "GDP")
edu <- read.csv("./data/EDU.csv")
# merge data on the country shortcode
mergeData <- merge(gdp, edu, by = "CountryCode")
# result: among notes that mention a fiscal year, count those that mention June
indexFiscal <- grep("fiscal", tolower(mergeData$Special.Notes))
sum(grepl("june", tolower(mergeData$Special.Notes[indexFiscal])))
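The merge-then-filter logic can be sketched on two tiny data frames. Everything in gdp_toy and edu_toy is invented for illustration; only the column names CountryCode and Special.Notes follow the code above.

```r
# Hypothetical stand-ins for the GDP and education tables
gdp_toy <- data.frame(CountryCode = c("AAA", "BBB", "CCC"),
                      Ranking = 1:3,
                      stringsAsFactors = FALSE)
edu_toy <- data.frame(CountryCode = c("AAA", "BBB", "DDD"),
                      Special.Notes = c("Fiscal year end: June 30",
                                        "Fiscal year end: March 31",
                                        ""),
                      stringsAsFactors = FALSE)

# merge() keeps only codes present in both tables by default
merged_toy <- merge(gdp_toy, edu_toy, by = "CountryCode")

# Lower-case once, then test both conditions on the same vector
notes <- tolower(merged_toy$Special.Notes)
has_fiscal <- grepl("fiscal", notes)
june_count <- sum(has_fiscal & grepl("june", notes))
june_count
```

Combining the two logical vectors with & gives the same count as subsetting by index first, and avoids a second tolower() call.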
Question 5
You can use the quantmod (http://www.quantmod.com/) package to get historical stock prices for publicly traded companies on the NASDAQ and NYSE. Use the following code to download data on Amazon’s stock price and get the times the data was sampled.
library(quantmod)
amzn = getSymbols("AMZN",auto.assign=FALSE)
sampleTimes = index(amzn)
How many values were collected in 2012? How many values were collected on Mondays in 2012?
# load data
library(lubridate)
library(quantmod)
amzn <- getSymbols("AMZN", auto.assign = FALSE)
sampleTimes <- index(amzn)
# result: observations in 2012, then those that fall on a Monday
index2012 <- grep("2012", sampleTimes)
length(index2012)
sum(grepl("Mon", wday(sampleTimes[index2012], label = TRUE)))
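As an alternative to matching weekday labels, which depend on the session locale, the same counts can be sketched with base R's format(). This is not the approach used above; exampleTimes is a made-up vector standing in for sampleTimes.

```r
# Hypothetical dates; the real vector would come from index(amzn)
exampleTimes <- as.Date(c("2011-12-30", "2012-01-02", "2012-01-03",
                          "2012-12-31", "2013-01-02"))

# Compare the four-digit year directly instead of pattern matching
in2012 <- format(exampleTimes, "%Y") == "2012"
n2012 <- sum(in2012)

# "%u" is the weekday as a number with Monday = 1, independent of locale
nMondays <- sum(in2012 & format(exampleTimes, "%u") == "1")

n2012
nMondays
```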
You may freely share or copy this text, but please keep the URL of this document as a reference.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.