Data Carpentry Workshop - Day 2 - Working with tabular data in R


Main Contents

  • Describe what a data frame is.
  • Load external data from a .csv file into a data frame.
  • Summarize the contents of a data frame.
  • Describe what a factor is.
  • Convert between strings and factors.
  • Reorder and rename factors.
  • Change how character strings are handled in a data frame.
  • Format dates.

    1. Preparing a data frame


    1.1 Download file
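    # Download the Portal survey data (joined CSV) into a local RawData folder;
    # the folder must already exist, because download.file() does not create it
    download.file("https://ndownloader.figshare.com/files/2292169",
       "C:/Users/home/Desktop/Rcourse/DataCarpentry33/RawData/portal_data_joined1.csv")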
    

    1.2 Load file
    survey 
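    A minimal sketch of the load step, assuming base R's read.csv() and the object name survey. So that it runs without the download, it first writes a tiny CSV with a few of the Portal column names; the rows are made up.

```r
# Write a tiny, made-up CSV so the example is self-contained
# (hypothetical rows; the real file has 34786 records).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "record_id,month,day,year,plot_id,species_id,sex,weight",
  "1,7,16,1977,2,NL,M,NA",
  "2,7,16,1977,3,DM,F,40",
  "3,7,18,1977,2,DM,M,48"
), csv_path)

# read.csv() reads a comma-separated file with a header row into a data.frame;
# the string "NA" is read as a missing value by default
survey <- read.csv(csv_path)
```

    With the real file, the destination path passed to download.file() would take the place of csv_path.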

    2. Inspecting data frame objects


    2.1 Contents
  • head(): shows the first 6 rows
  • tail(): shows the last 6 rows
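    A quick sketch of head() and tail(), using R's built-in iris data set (150 rows) as a stand-in for survey so it runs on its own. The optional n argument changes how many rows are returned.

```r
# iris ships with R and has 150 rows and 5 columns
first_rows  <- head(iris)         # rows 1 to 6
last_rows   <- tail(iris)         # rows 145 to 150, original row names kept
first_three <- head(iris, n = 3)  # n sets the number of rows
```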

    2.2 Size
  • dim(): returns the dimensions of the object as a vector, with the number of rows as the first element and the number of columns as the second
  • nrow(): returns the number of rows
  • ncol(): returns the number of columns
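    The three size functions agree with each other; a sketch with iris standing in for survey:

```r
d      <- dim(iris)   # c(150, 5): rows first, columns second
n_rows <- nrow(iris)  # 150, same as d[1]
n_cols <- ncol(iris)  # 5, same as d[2]
```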

    2.3 Names
  • names(): returns the column names (synonym of colnames() for data.frame objects)
  • colnames(): returns the column names
  • rownames(): returns the row names
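    A sketch showing that names() and colnames() give the same result on a data frame, and what automatic row names look like (iris again stands in for survey):

```r
col_names <- names(iris)                       # "Sepal.Length" ... "Species"
same      <- identical(col_names, colnames(iris))  # TRUE for a data frame
row_names <- rownames(iris)                    # automatic row names "1", "2", ..., "150"
```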

    2.4 Summary
  • str(): shows the structure of the object, including its class and the type, length and first few values of each column
  • summary(): summary statistics for each column
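    A sketch of both summary functions on iris (standing in for survey). summary() reports quartiles and the mean for numeric columns and counts per level for factor columns; applied to a single column it returns those numbers as a named vector.

```r
str(iris)      # 'data.frame': 150 obs. of 5 variables, then one line per column
summary(iris)  # per-column min/quartiles/mean/max, or counts for factors

# summary() on a single column returns the same statistics as a named vector
sl <- summary(iris$Sepal.Length)
sp <- summary(iris$Species)   # counts per level of the factor
```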

    3. Indexing and subsetting data frames


    3.1 Indexing data frames
    v1 
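    Indexing a data frame uses [row, column]; leaving one side empty selects everything along that dimension. A sketch on iris, standing in for survey:

```r
one_value <- iris[1, 1]      # row 1, column 1: a single value (5.1)
col_vec   <- iris[, 1]       # every row of column 1, returned as a vector
col_df    <- iris[1]         # column 1, kept as a data frame
row3      <- iris[3, ]       # every column of row 3, as a one-row data frame
block     <- iris[1:3, 2:4]  # rows 1-3 of columns 2-4
```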

    3.2 Exclude certain indices of a data frame using the “-” sign
    survey[, -1]             # all columns except the first
    survey[-c(7:34786), ]    # drop rows 7 to 34786, leaving the first 6 rows (same as head(survey))
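    The same exclusion can use nrow() instead of the hard-coded 34786, so the code keeps working if the number of rows changes. A sketch with iris standing in for survey:

```r
no_first_col <- iris[, -1]               # all columns except the first
first_six    <- iris[-(7:nrow(iris)), ]  # drop rows 7 to the last row
# The parentheses matter: -7:150 would mean (-7):150, not -(7:150)
```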
    

    3.3 Subsetting by index or column name
    a 
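    Columns can also be selected by name. Single brackets with one argument keep a data frame, while [, name], [[ ]] and $ return the column itself. A sketch on iris:

```r
sp_df  <- iris["Species"]        # one-column data frame
sp_vec <- iris[, "Species"]      # the column as a vector (a factor here)
sp_b   <- iris[["Species"]]      # same vector, via [[ ]]
sp_d   <- iris$Species           # same vector, via $
two    <- iris[1:3, c("Sepal.Length", "Species")]  # rows by index, columns by name
```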

    4. Factors


    4.1 Basic factors
    sex 
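    A small sketch of how a factor stores text as integer codes plus a set of levels. The vector is a made-up example, not the survey column.

```r
sex <- factor(c("male", "female", "female", "male"))
levels(sex)               # "female" "male": sorted alphabetically by default
nlevels(sex)              # 2
codes <- as.integer(sex)  # 2 1 1 2: each value's position in levels(sex)

# Setting the level order explicitly changes the codes, not the values
sex2 <- factor(sex, levels = c("male", "female"))
```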

    4.2 Converting factors
    sex_t 

    Three steps to convert a factor such as year_fct into numeric values:
  • We obtain all the factor levels using levels(year_fct).
  • We convert these levels to numeric values using as.numeric(levels(year_fct)).
  • We then access these numeric values using the underlying integers of the vector year_fct inside the square brackets.
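    The three steps can be sketched on a made-up year_fct. The comparison with a plain as.numeric() call shows the trap: on a factor it returns the integer codes, not the years.

```r
year_fct <- factor(c(1990, 1983, 1977, 1998, 1990))
levels(year_fct)               # "1977" "1983" "1990" "1998"

wrong <- as.numeric(year_fct)  # 3 2 1 4 3: the underlying codes
# Steps 1-3: take the levels, make them numeric, index them by the codes
right <- as.numeric(levels(year_fct))[year_fct]  # 1990 1983 1977 1998 1990
# Equivalent route through character (converts every element, not just the levels)
alt <- as.numeric(as.character(year_fct))
```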

    4.3 Renaming factors
    plot(survey$sex)   # bar plot of the number of records per level
    sex 
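    Renaming works by assigning to levels(): the labels change while the codes stay the same. A sketch on a made-up vector with an empty-string level, which is how blank cells in a text column look once converted to a factor:

```r
sex <- factor(c("M", "", "F", "F", "M"))
levels(sex)                              # ""  "F"  "M"
levels(sex)[1] <- "undetermined"         # rename the empty-string level
levels(sex)[2:3] <- c("female", "male")  # rename "F" and "M"

# Reorder the levels (for example, to control the bar order in plot(sex))
sex <- factor(sex, levels = c("female", "male", "undetermined"))
```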

    4.4 Using stringsAsFactors = FALSE
    Compare the difference between reading strings as "factors" vs. as "character":
    surveys1 
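    A self-contained comparison using read.csv()'s text argument instead of a file; the columns and values are hypothetical. Since R 4.0.0 the default is stringsAsFactors = FALSE, so code that relies on automatic conversion to factors needs to set the argument explicitly.

```r
csv_text <- "species_id,sex,weight\nNL,M,40\nDM,F,48\nDM,M,39"

as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)   # text columns become factors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)  # text columns stay character

str(as_fct)  # species_id: Factor w/ 2 levels "DM","NL"
str(as_chr)  # species_id: chr "NL" "DM" "DM"
```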

    Practice
    animal_data 
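    A minimal sketch of building a small data frame by hand with data.frame(); the column names and values are illustrative. Every column must have the same length, text values need quotes, and numbers are separated by commas.

```r
animal_data <- data.frame(
  animal = c("dog", "cat", "sea cucumber", "sea urchin"),
  feel   = c("furry", "furry", "squishy", "spiny"),
  weight = c(45, 8, 1.1, 0.8),
  stringsAsFactors = FALSE   # keep text as character on any R version
)
str(animal_data)
```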

    4.5 Formatting dates
    library(tidyverse)
    library(lubridate)
    my_date 
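    The section loads lubridate, whose ymd() parses year-month-day strings into dates. The sketch below does the same job with base R's as.Date() so it runs without extra packages: it pastes separate year, month and day columns (made-up values) into one string and parses it. With lubridate loaded, ymd(date_str) would take the place of the as.Date() call.

```r
dates_df <- data.frame(year  = c(1977, 1977, 2000),
                       month = c(7, 7, 4),
                       day   = c(16, 18, 30))

# Build "year-month-day" strings such as "1977-7-16"
date_str <- paste(dates_df$year, dates_df$month, dates_df$day, sep = "-")

# Parse them into Date objects (single-digit months and days are accepted)
dates_df$date <- as.Date(date_str, format = "%Y-%m-%d")
```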

    Coming up in the next post:


    Data Manipulation using dplyr and tidyr


    Data visualization with ggplot2
