2. Correlation

In this article, I check the correlation between sales and the number of visiting customers.
※ If you already have data, you can skip section 1.

Contents



1. Creation of simulation data
2. Visualization
3. Correlation

1. Creation of simulation data



I use the data created here. In addition, I add a second variable, the number of customers, so that the relationship between two variables can be examined. You can check the relationship among more than two variables with almost the same code.
> head(Data)
        time sales
1 2020-03-01     7
2 2020-03-02     4
3 2020-03-03    17
4 2020-03-04     2
5 2020-03-05     9
6 2020-03-06     9
> # negative binomial counts with mean mu = 7 and dispersion size = 0.8
> Data$number.of.customers <- rnbinom(nrow(Data), size = 0.8, mu = 7)
> head(Data)
        time sales number.of.customers
1 2020-03-01     7                   1
2 2020-03-02     4                  18
3 2020-03-03    17                   2
4 2020-03-04     2                   1
5 2020-03-05     9                   2
6 2020-03-06     9                  43
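If you don't have the data from the earlier article, a minimal sketch like the following builds a comparable `Data` frame. The 31-day date range, the seed, and the use of `rnbinom()` for `sales` are assumptions for illustration only, because the original generation code is not shown here. Your numbers will therefore differ from the output above.

```r
# Hypothetical stand-in for the article's Data (all parameters are assumptions)
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_days <- 31
Data <- data.frame(
  # daily dates starting 2020-03-01, matching the first rows shown above
  time  = seq(as.Date("2020-03-01"), by = "day", length.out = n_days),
  # assumed: sales drawn from a negative binomial with mean 7
  sales = rnbinom(n_days, size = 0.8, mu = 7)
)

# Same call as in the article, with the dispersion argument named explicitly
Data$number.of.customers <- rnbinom(nrow(Data), size = 0.8, mu = 7)

head(Data)
```

With this parameterization the variance is mu + mu^2 / size, so a small `size` such as 0.8 produces overdispersed counts, which is consistent with occasional large values like the 43 above.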

2. Visualization


> library(ggplot2)
> Data$time <- as.POSIXct(Data$time)
> # scale() standardizes each series so both fit on one axis;
> # linewidth replaces size for lines in ggplot2 >= 3.4.0
> ggplot(data=Data, aes(x=time))+
+   geom_line(aes(y=as.numeric(scale(sales)), colour="black"), linewidth=0.9, show.legend = TRUE)+
+   geom_line(aes(y=as.numeric(scale(number.of.customers)), colour="blue"), linewidth=0.9, show.legend = TRUE)+
+   labs(title="Comparison")+
+   ylab("sales/number.of.customers (standardized)")+
+   scale_x_datetime(date_labels="%m/%d")+
+   scale_colour_manual(name='Legend',guide='legend' ,
+                       values = c("black"="black", "blue"= "blue"),
+                       labels=c('sales', 'number.of.customers'))
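The plot passes each series through `scale()` because sales and customer counts need not share a range; `scale()` converts a vector to z-scores, (x - mean) / sd, so both lines sit on a common axis. A small sketch using the first six `sales` values shown in section 1 confirms the equivalence:

```r
# First six sales values from the head(Data) output above
x <- c(7, 4, 17, 2, 9, 9)

# scale() returns a one-column matrix with attributes; as.numeric() drops them
z_scale  <- as.numeric(scale(x))
# The same z-scores by hand; sd() uses the n - 1 denominator, as scale() does
z_manual <- (x - mean(x)) / sd(x)

all.equal(z_scale, z_manual)  # TRUE
mean(z_scale)                 # approximately 0
sd(z_scale)                   # 1
```

Standardizing changes only the location and spread of each line, not its shape, so the visual comparison of ups and downs is unaffected.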



3. Correlation


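> library(corrplot)
> # Pearson correlation matrix of the two numeric columns, drawn with
> # numbers in the lower triangle and ellipses in the upper triangle
> corrplot.mixed(corr=cor(Data[,c("sales","number.of.customers")]), upper="ellipse")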
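`cor()` computes the Pearson correlation coefficient by default. As a sketch of what that number is, the following recomputes it from its definition (the sum of co-deviations divided by the square root of the product of the summed squared deviations) and then uses `cor.test()` to attach a p-value and confidence interval. So that it runs on its own, it uses only the six rows shown in section 1; with the full data, substitute `Data$sales` and `Data$number.of.customers`. The Spearman line is an optional addition, not part of the original article, that may be more robust for skewed counts like these.

```r
# Six rows from the head(Data) output in section 1
x <- c(7, 4, 17, 2, 9, 9)    # sales
y <- c(1, 18, 2, 1, 2, 43)   # number.of.customers

# Pearson r from its definition
pearson_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Should agree with cor(), whose default method is "pearson"
all.equal(pearson_manual, cor(x, y))

# Tests H0: true correlation = 0; reports t statistic, p-value and a 95% CI
ct <- cor.test(x, y, method = "pearson")
ct

# Optional rank-based alternative (an assumption, not from the article)
cor(x, y, method = "spearman")
```

With only six observations the p-value and interval are not meaningful; the point is the mechanics, which apply unchanged to the full column.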
