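[Contest Winner Review] Building a service that forecasts performance ticket booking trends by scenario with Reactjs + Nodejs + Python + scikit-learn {PCA (principal component analysis), VAR (multivariate time series analysis)} - Data Analysis, Part 2
What I studied while doing the data analysis
Collecting data from a variety of sources
Preprocessing the collected data to fit the goal
Data modeling and cross-validation between models
Developing the final multivariate time series model
The data modeling and model cross-validation process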
# Only the Jupyter notebook for model m9, currently the best-performing model, remains in the cells below
# Load the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm  # needed for sm.OLS in the multicollinearity check
from statsmodels.stats.outliers_influence import variance_inflation_factor
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import matplotlib
# _rebuild() is a private font-cache helper that newer Matplotlib releases no longer provide, so call it only if it exists
if hasattr(fm, '_rebuild'):
    fm._rebuild()
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, Normalizer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller
sns.set(style='whitegrid')
pd.set_option('display.max_rows',500)
# '경로' is a placeholder for the local directory path
font_path = r'경로\\NanumFontSetup_TTF_GOTHIC.NanumGothic.ttf'
fontprop = fm.FontProperties(fname=font_path, size=18)
Loading the data and basic preprocessing
Basic feature descriptions
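Period: 2019.01.01 ~ 2021.08.31
ott_user_count: daily number of OTT app users
ott_usage_time: daily OTT app usage time
delivery_user_count: daily number of food delivery app users
delivery_usage_time: daily food delivery app usage time
used_user_count: daily number of secondhand marketplace app users
used_usage_time: daily secondhand marketplace app usage time
meeting_user_count: daily number of video conferencing app users
meeting_usage_time: daily video conferencing app usage time
corona_count: daily number of confirmed COVID-19 cases
subway_count: daily number of subway riders
KOSPI_index: daily KOSPI index
KOSPI_trading: daily KOSPI market trading volume
KOSDAQ_index: daily KOSDAQ index
KOSDAQ_trading: daily KOSDAQ market trading volume
coin_trading: daily average trading volume of cryptocurrencies (Bitcoin + Ethereum)
coin_variance: daily average rate of change versus the previous day for cryptocurrencies (Bitcoin + Ethereum)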
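# Load the data that was fully preprocessed in the earlier steps
# ('경로' in the path is a placeholder for the local directory)
df = pd.read_csv("경로\\201901_202108_종합통계_시계열분석용.csv")
df.drop('Unnamed: 0', axis=1, inplace=True)
# Days before the first confirmed case have no value, so treat them as 0
df['corona_count'] = df['corona_count'].fillna(0)
# The two volumes are summed rather than averaged; the scalers used below are
# scale-invariant, so the scaled values are the same either way
df['coin_trading'] = df['bitcoin_trading'] + df['ethereum_trading']
df['coin_variance'] = (df['bitcoin_variance'] + df['ethereum_variance']) / 2
df.drop(['bitcoin_trading', 'ethereum_trading',
         'bitcoin_variance', 'ethereum_variance'], axis=1, inplace=True)
# Use the date as the index and keep a copy for re-indexing later
df.index = df['date']
df_date = df['date']
df.drop(['date'], axis=1, inplace=True)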
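# Preprocess the negative values in the cryptocurrency data for the log-scaled model variants
# # Add 100 to coin_variance before log scaling (negative values cannot be log-transformed)
# # Enable only when log scaling is used
# df['coin_variance'] = df['coin_variance']+100
# for i in df.columns:
#     df[i] = np.log1p(df[i])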
X = df.iloc[:,1:]
y = df.iloc[:,0]
# Create the StandardScaler object
scaler = StandardScaler()
# Transform the dataset with StandardScaler by calling fit() and transform()
scaler.fit(X)
X_scaled = scaler.transform(X)
X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
X_scaled.index = df_date
X_scaled
# # Create the MinMaxScaler object
# scaler = MinMaxScaler()
# # Transform the dataset with MinMaxScaler by calling fit() and transform()
# scaler.fit(X)
# X_scaled = scaler.transform(X)
# X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
# X_scaled.index = df_date
# X_scaled
# # Create the RobustScaler object
# scaler = RobustScaler()
# # Transform the dataset with RobustScaler by calling fit() and transform()
# scaler.fit(X)
# X_scaled = scaler.transform(X)
# X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
# X_scaled.index = df_date
# X_scaled
X=X_scaled
# Join the target variable (number of performance bookings) with the features
df = pd.merge(y, X, left_index=True, right_index=True, how='inner')
df
date | ticketing_count | ott_user_count | ott_usage_time | delivery_user_count | delivery_usage_time | used_user_count | used_usage_time | meeting_user_count | meeting_usage_time | corona_count | subway_count | KOSPI_index | KOSPI_trading | KOSDAQ_index | KOSDAQ_trading | coin_trading | coin_variance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019/01/01 | 7401 | -1.301964 | -1.126337 | -1.138730 | -0.879145 | -1.332576 | -1.373703 | -1.053405 | -0.843484 | -0.607122 | -1.546825 | -0.857302 | -1.152997 | -0.801456 | -1.333287 | -0.760446 | 0.894595 |
2019/01/02 | 5069 | -1.411287 | -1.360638 | -1.542213 | -1.360824 | -1.294828 | -1.363849 | -0.816423 | -0.804730 | -0.607122 | 0.758960 | -0.857302 | -1.152997 | -0.801456 | -1.333287 | -0.473632 | 1.209898 |
2019/01/03 | 6498 | -1.512255 | -1.380048 | -1.536694 | -1.377820 | -1.179579 | -1.346215 | -0.813926 | -0.802878 | -0.607122 | 0.908731 | -0.892203 | -0.903009 | -0.886813 | -1.155502 | -0.653263 | -0.830298 |
2019/01/04 | 7088 | -1.343318 | -1.318085 | -1.434606 | -1.294248 | -1.309262 | -1.362754 | -0.819952 | -0.804077 | -0.607122 | 1.091537 | -0.856767 | -0.947702 | -0.835184 | -1.310455 | -0.547525 | 0.442506 |
2019/01/05 | 18755 | -1.010367 | -0.963209 | -1.174507 | -0.999621 | -1.319738 | -1.330514 | -1.046767 | -0.836480 | -0.607122 | -0.153104 | -0.856767 | -0.947702 | -0.835184 | -1.310455 | -0.557694 | -0.100001 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021/08/27 | 19582 | 1.656060 | 1.539082 | 2.271937 | 2.151560 | 1.174091 | 0.778091 | 1.484776 | 1.743173 | 3.594865 | 1.533985 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.055345 | 1.133390 |
2021/08/28 | 45456 | 1.780755 | 1.632663 | 2.442208 | 2.699594 | 1.484143 | 1.128198 | -0.075720 | 0.000089 | 3.187087 | -0.002058 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.100312 | -0.239105 |
2021/08/29 | 31871 | 1.692069 | 2.029458 | 2.325626 | 2.782625 | 1.308575 | 1.197393 | 0.060223 | 0.137999 | 2.877739 | -0.921063 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.079082 | -0.199692 |
2021/08/30 | 3652 | 1.312763 | 1.266002 | 1.502044 | 1.133586 | 1.316112 | 0.847673 | 1.702709 | 1.957982 | 2.608230 | 1.558606 | 1.571202 | -0.509248 | 1.703738 | -0.361816 | -1.041119 | -0.505722 |
2021/08/31 | 8582 | 1.434144 | 1.282830 | 1.685858 | 1.606643 | 1.279019 | 0.879251 | 1.712669 | 1.997506 | 4.138569 | 1.394636 | 1.689138 | -0.386843 | 1.748593 | -0.262413 | -0.977654 | 0.676665 |
974 rows × 17 columns
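As a rough illustration of the scaling step above, the sketch below standardizes a small feature table and keeps the date index and column names in one go. The toy values and the names `X_demo` and `X_demo_scaled` are made up for the example; only the `StandardScaler` usage mirrors the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the real feature table.
dates = pd.date_range("2019-01-01", periods=5, freq="D")
X_demo = pd.DataFrame(
    {
        "ott_user_count": [10.0, 12.0, 11.0, 15.0, 14.0],
        "subway_count": [500.0, 900.0, 950.0, 980.0, 600.0],
    },
    index=dates,
)

# fit_transform() returns a plain ndarray, so the index and the
# column names are passed back in when rebuilding the DataFrame.
scaler = StandardScaler()
X_demo_scaled = pd.DataFrame(
    scaler.fit_transform(X_demo),
    columns=X_demo.columns,
    index=X_demo.index,
)
print(X_demo_scaled.round(3))
```

Each column ends up with mean 0 and (population) standard deviation 1, which is why the feature values in the table above are small numbers around zero while `ticketing_count` keeps its original scale.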
Checking multicollinearity (VIF)
# X = df.iloc[:,1:]
# y = df.iloc[:,0]
vif = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
result = sm.OLS(y,X).fit()
print(result.summary())
OLS Regression Results
=======================================================================================
Dep. Variable: ticketing_count R-squared (uncentered): 0.268
Model: OLS Adj. R-squared (uncentered): 0.256
Method: Least Squares F-statistic: 21.90
Date: Thu, 09 Sep 2021 Prob (F-statistic): 1.79e-54
Time: 03:47:25 Log-Likelihood: -11046.
No. Observations: 974 AIC: 2.212e+04
Df Residuals: 958 BIC: 2.220e+04
Df Model: 16
Covariance Type: nonrobust
=======================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------------
ott_user_count 1.607e+04 5420.509 2.965 0.003 5437.047 2.67e+04
ott_usage_time -7883.4350 5969.037 -1.321 0.187 -1.96e+04 3830.461
delivery_user_count 9444.5231 9008.048 1.048 0.295 -8233.260 2.71e+04
delivery_usage_time 9982.7230 7790.915 1.281 0.200 -5306.505 2.53e+04
used_user_count 1636.4530 5825.886 0.281 0.779 -9796.519 1.31e+04
used_usage_time -1.474e+04 6177.052 -2.386 0.017 -2.69e+04 -2617.060
meeting_user_count -2.108e+04 5093.297 -4.139 0.000 -3.11e+04 -1.11e+04
meeting_usage_time 1.768e+04 4625.185 3.822 0.000 8603.069 2.68e+04
corona_count -9564.6525 1633.899 -5.854 0.000 -1.28e+04 -6358.219
subway_count 3435.3489 1604.084 2.142 0.032 287.426 6583.272
KOSPI_index 6993.8696 3293.135 2.124 0.034 531.279 1.35e+04
KOSPI_trading -2268.0617 1209.873 -1.875 0.061 -4642.370 106.246
KOSDAQ_index -1.295e+04 2879.781 -4.496 0.000 -1.86e+04 -7297.072
KOSDAQ_trading 1672.7817 1409.311 1.187 0.236 -1092.912 4438.476
coin_trading -872.6250 881.917 -0.989 0.323 -2603.338 858.088
coin_variance -173.5325 661.886 -0.262 0.793 -1472.446 1125.380
==============================================================================
Omnibus: 474.550 Durbin-Watson: 0.356
Prob(Omnibus): 0.000 Jarque-Bera (JB): 4993.538
Skew: 1.969 Prob(JB): 0.00
Kurtosis: 13.370 Cond. No. 58.3
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Dropping columns one at a time while rechecking multicollinearity
df.drop(['delivery_user_count'], axis=1, inplace=True)
df.drop(['coin_variance'], axis=1, inplace=True)
# ,'used_user_count'
df.drop(['ott_usage_time'], axis=1, inplace=True)
df.drop(['KOSPI_index'], axis=1, inplace=True)
df.drop(['KOSDAQ_trading'], axis=1, inplace=True)
df.drop(['KOSPI_trading'], axis=1, inplace=True)
df.drop(['meeting_user_count'], axis=1, inplace=True)
df.drop(['ott_user_count'], axis=1, inplace=True)
df.drop(['KOSDAQ_index'], axis=1, inplace=True)
df.drop(['meeting_usage_time'], axis=1, inplace=True)
df.drop(['delivery_usage_time'], axis=1, inplace=True)
# X = df.iloc[:,1:]
# y = df.iloc[:,0]
vif = pd.DataFrame()
vif['VIF Factor'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['features'] = X.columns
vif.round(1)
| | VIF Factor | features |
---|---|---|
0 | 1.0 | p1 |
1 | 1.0 | p2 |
2 | 1.0 | p3 |
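The drop-and-recheck loop above can also be automated. The sketch below is only one possible way to do it, not the notebook's method: it computes VIF as the diagonal of the inverse correlation matrix (equivalent to 1 / (1 - R²) for centered features) and removes the worst feature until everything is under a threshold. The helper names, the synthetic data, and the threshold of 10 (a common rule of thumb) are all assumptions.

```python
import numpy as np
import pandas as pd

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    """VIF per feature, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X.to_numpy(), rowvar=False)
    vif = np.diag(np.linalg.inv(corr))
    return pd.DataFrame({"VIF Factor": vif, "features": X.columns})

def drop_high_vif(X: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Repeatedly drop the feature with the largest VIF until all are <= threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        table = vif_table(X)
        worst = table["VIF Factor"].idxmax()
        if table.loc[worst, "VIF Factor"] <= threshold:
            break
        X = X.drop(columns=table.loc[worst, "features"])
    return X

# Synthetic example: the third column is almost the sum of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = rng.normal(size=300)
demo = pd.DataFrame(
    {"a": a, "b": b, "a_plus_b": a + b + rng.normal(scale=0.01, size=300)}
)

print(vif_table(demo).round(1))   # all three VIFs are very large
kept = drop_high_vif(demo)
print(list(kept.columns))         # two columns remain
```

Removing one feature at a time matters because VIFs are recomputed after every drop; a feature that looks redundant at first can become harmless once its correlated partner is gone.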
df
date | ticketing_count | ott_user_count | ott_usage_time | delivery_user_count | delivery_usage_time | used_user_count | used_usage_time | meeting_user_count | meeting_usage_time | corona_count | subway_count | KOSPI_index | KOSPI_trading | KOSDAQ_index | KOSDAQ_trading | coin_trading | coin_variance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019/01/01 | 7401 | -1.301964 | -1.126337 | -1.138730 | -0.879145 | -1.332576 | -1.373703 | -1.053405 | -0.843484 | -0.607122 | -1.546825 | -0.857302 | -1.152997 | -0.801456 | -1.333287 | -0.760446 | 0.894595 |
2019/01/02 | 5069 | -1.411287 | -1.360638 | -1.542213 | -1.360824 | -1.294828 | -1.363849 | -0.816423 | -0.804730 | -0.607122 | 0.758960 | -0.857302 | -1.152997 | -0.801456 | -1.333287 | -0.473632 | 1.209898 |
2019/01/03 | 6498 | -1.512255 | -1.380048 | -1.536694 | -1.377820 | -1.179579 | -1.346215 | -0.813926 | -0.802878 | -0.607122 | 0.908731 | -0.892203 | -0.903009 | -0.886813 | -1.155502 | -0.653263 | -0.830298 |
2019/01/04 | 7088 | -1.343318 | -1.318085 | -1.434606 | -1.294248 | -1.309262 | -1.362754 | -0.819952 | -0.804077 | -0.607122 | 1.091537 | -0.856767 | -0.947702 | -0.835184 | -1.310455 | -0.547525 | 0.442506 |
2019/01/05 | 18755 | -1.010367 | -0.963209 | -1.174507 | -0.999621 | -1.319738 | -1.330514 | -1.046767 | -0.836480 | -0.607122 | -0.153104 | -0.856767 | -0.947702 | -0.835184 | -1.310455 | -0.557694 | -0.100001 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021/08/27 | 19582 | 1.656060 | 1.539082 | 2.271937 | 2.151560 | 1.174091 | 0.778091 | 1.484776 | 1.743173 | 3.594865 | 1.533985 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.055345 | 1.133390 |
2021/08/28 | 45456 | 1.780755 | 1.632663 | 2.442208 | 2.699594 | 1.484143 | 1.128198 | -0.075720 | 0.000089 | 3.187087 | -0.002058 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.100312 | -0.239105 |
2021/08/29 | 31871 | 1.692069 | 2.029458 | 2.325626 | 2.782625 | 1.308575 | 1.197393 | 0.060223 | 0.137999 | 2.877739 | -0.921063 | 1.549169 | -0.705340 | 1.646165 | -0.178085 | -1.079082 | -0.199692 |
2021/08/30 | 3652 | 1.312763 | 1.266002 | 1.502044 | 1.133586 | 1.316112 | 0.847673 | 1.702709 | 1.957982 | 2.608230 | 1.558606 | 1.571202 | -0.509248 | 1.703738 | -0.361816 | -1.041119 | -0.505722 |
2021/08/31 | 8582 | 1.434144 | 1.282830 | 1.685858 | 1.606643 | 1.279019 | 0.879251 | 1.712669 | 1.997506 | 4.138569 | 1.394636 | 1.689138 | -0.386843 | 1.748593 | -0.262413 | -0.977654 | 0.676665 |
974 rows × 17 columns
Multivariate time series analysis
# Get a rough look at the overall shape of the data
df.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bb9c10f88>
X = df.iloc[:,1:]
y = df.iloc[:,0]
y
date
2019/01/01 7401
2019/01/02 5069
2019/01/03 6498
2019/01/04 7088
2019/01/05 18755
...
2021/08/27 19582
2021/08/28 45456
2021/08/29 31871
2021/08/30 3652
2021/08/31 8582
Name: ticketing_count, Length: 974, dtype: int64
Running PCA with N components (searching for the best value)
# Try different values of n_components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data=principalComponents, columns=['p1', 'p2'])
principalDf.head()
| | p1 | p2 |
---|---|---|
0 | -3.472938 | 0.264457 |
1 | -4.064765 | -1.172083 |
2 | -4.014694 | -1.128189 |
3 | -3.982567 | -1.288346 |
4 | -3.529418 | -0.322702 |
# Check the share of variance explained by each component
pca.explained_variance_ratio_
array([0.63059992, 0.10080491])
sum(pca.explained_variance_ratio_)
0.7314048320656545
principalDf.index = df_date
principalDf
date | p1 | p2 |
---|---|---|
2019/01/01 | -3.472938 | 0.264457 |
2019/01/02 | -4.064765 | -1.172083 |
2019/01/03 | -4.014694 | -1.128189 |
2019/01/04 | -3.982567 | -1.288346 |
2019/01/05 | -3.529418 | -0.322702 |
... | ... | ... |
2021/08/27 | 5.117077 | -3.194164 |
2021/08/28 | 4.866254 | -1.415963 |
2021/08/29 | 5.011647 | -0.716045 |
2021/08/30 | 4.346513 | -3.065882 |
2021/08/31 | 5.077023 | -3.389693 |
974 rows × 2 columns
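Instead of trying `n_components` values by hand, the cumulative explained variance can be read off a single full PCA fit. The following is a minimal sketch on synthetic data; the 95% target and every variable name are assumptions chosen for illustration, and the notebook itself settled on two components explaining about 73% of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: six observed features driven by two hidden factors plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Fit once with all components, then accumulate the explained variance ratios.
pca_full = PCA().fit(X_demo)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

target = 0.95  # assumed target share of variance
# searchsorted returns the first position whose cumulative value reaches the target.
n_components = int(np.searchsorted(cumulative, target) + 1)

print(cumulative.round(3))
print("components needed:", n_components)
```

The same `cumulative` array also makes the trade-off visible: each extra component adds less than the previous one, so the choice is usually about where the curve flattens rather than a fixed cutoff.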
Running PCA with 2 components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data=principalComponents, columns=['p1', 'p2'])
print(principalDf.head())
print(pca.explained_variance_ratio_)
print(sum(pca.explained_variance_ratio_))
p1 p2
0 -3.472938 0.264457
1 -4.064765 -1.172083
2 -4.014694 -1.128189
3 -3.982567 -1.288346
4 -3.529418 -0.322702
[0.63059992 0.10080491]
0.7314048320656547
principalDf.index = df_date
principalDf
date | p1 | p2 |
---|---|---|
2019/01/01 | -3.472938 | 0.264457 |
2019/01/02 | -4.064765 | -1.172083 |
2019/01/03 | -4.014694 | -1.128189 |
2019/01/04 | -3.982567 | -1.288346 |
2019/01/05 | -3.529418 | -0.322702 |
... | ... | ... |
2021/08/27 | 5.117077 | -3.194164 |
2021/08/28 | 4.866254 | -1.415963 |
2021/08/29 | 5.011647 | -0.716045 |
2021/08/30 | 4.346513 | -3.065882 |
2021/08/31 | 5.077023 | -3.389693 |
974 rows × 2 columns
df=principalDf
Running PCA with 1 component
# pca = PCA(n_components=1)
# principalComponents = pca.fit_transform(X)
# principalDf = pd.DataFrame(data=principalComponents, columns=['p1'])
# print(principalDf.head())
# print(pca.explained_variance_ratio_)
# print(sum(pca.explained_variance_ratio_))
p1
0 -4.204660
1 -4.982673
2 -4.683025
3 -4.707338
4 -4.228135
[0.69251059]
0.6925105934278076
# principalDf.index = df_date
# principalDf
date | p1 |
---|---|
2019/01/01 | -4.204660 |
2019/01/02 | -4.982673 |
2019/01/03 | -4.683025 |
2019/01/04 | -4.707338 |
2019/01/05 | -4.228135 |
... | ... |
2021/06/26 | 4.773862 |
2021/06/27 | 5.004279 |
2021/06/28 | 4.413136 |
2021/06/29 | 4.775690 |
2021/06/30 | 4.673145 |
912 rows × 1 columns
# Merge the principal components with the target data
principalDf.index = df_date
df = pd.merge(y, principalDf,left_index=True, right_index=True,how='inner')
df
date | ticketing_count | p1 | p2 |
---|---|---|---|
2019/01/01 | 7401 | -3.472938 | 0.264457 |
2019/01/02 | 5069 | -4.064765 | -1.172083 |
2019/01/03 | 6498 | -4.014694 | -1.128189 |
2019/01/04 | 7088 | -3.982567 | -1.288346 |
2019/01/05 | 18755 | -3.529418 | -0.322702 |
... | ... | ... | ... |
2021/08/27 | 19582 | 5.117077 | -3.194164 |
2021/08/28 | 45456 | 4.866254 | -1.415963 |
2021/08/29 | 31871 | 5.011647 | -0.716045 |
2021/08/30 | 3652 | 4.346513 | -3.065882 |
2021/08/31 | 8582 | 5.077023 | -3.389693 |
974 rows × 3 columns
y.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x22bb9c1fa48>
df.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bbbabc788>
Checking stationarity
# Run the ADF stationarity test on each column
for i in df.columns:
    adfuller_test = adfuller(df[i], autolag='AIC')
    print(i)
    print("ADF test statistic: {}".format(adfuller_test[0]))
    print("p-value: {}".format(adfuller_test[1]))
ticketing_count
ADF test statistic: -2.099553733500803
p-value: 0.24469408536639126
p1
ADF test statistic: -0.2126700871560426
p-value: 0.9369944290579174
p2
ADF test statistic: -0.9660338252729967
p-value: 0.76545027597895
# Take the first difference
df_diff = df.diff().dropna()
# Recheck stationarity after differencing
for i in df.columns:
    adfuller_test = adfuller(df_diff[i], autolag='AIC')
    print(i)
    print("ADF test statistic: {}".format(adfuller_test[0]))
    print("p-value: {}".format(adfuller_test[1]))
ticketing_count
ADF test statistic: -8.90366524755091
p-value: 1.1532103667357817e-14
p1
ADF test statistic: -7.316333641274674
p-value: 1.2265309592955346e-10
p2
ADF test statistic: -8.654863163034381
p-value: 5.0006129989254966e-14
# Plot the differenced booking count, p1, and p2
df_diff.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bb5836f48>
df_diff
date | ticketing_count | p1 | p2 |
---|---|---|---|
2019/01/02 | -2332.0 | -0.591827 | -1.436540 |
2019/01/03 | 1429.0 | 0.050071 | 0.043894 |
2019/01/04 | 590.0 | 0.032127 | -0.160157 |
2019/01/05 | 11667.0 | 0.453149 | 0.965645 |
2019/01/06 | -5564.0 | 0.215044 | 0.535168 |
... | ... | ... | ... |
2021/08/27 | 3970.0 | 0.335959 | 0.163582 |
2021/08/28 | 25874.0 | -0.250823 | 1.778201 |
2021/08/29 | -13585.0 | 0.145392 | 0.699918 |
2021/08/30 | -28219.0 | -0.665134 | -2.349837 |
2021/08/31 | 4930.0 | 0.730510 | -0.323811 |
973 rows × 3 columns
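Because the model further down is fit on these differenced values, its forecasts are day-over-day changes rather than booking counts. To read them as counts, the differences have to be accumulated back onto the last observed level. The sketch below shows that step using the first five `ticketing_count` values from the tables in this post; the helper name `undifference` is made up for the example.

```python
import numpy as np
import pandas as pd

def undifference(last_level: float, diff_forecast) -> np.ndarray:
    """Turn a sequence of first differences back into levels."""
    return last_level + np.cumsum(np.asarray(diff_forecast, dtype=float))

# First five ticketing_count values shown earlier in the post.
levels = pd.Series([7401.0, 5069.0, 6498.0, 7088.0, 18755.0])
diffs = levels.diff().dropna()  # -2332, 1429, 590, 11667

# Starting from the first level, the cumulative sum of the differences
# reproduces every later level exactly.
restored = undifference(levels.iloc[0], diffs)
print(restored)
```

For a real forecast, `last_level` would be the last observed value in the training period and `diff_forecast` the predicted differences for that column.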
# Hold out the most recent 30 days for forecasting and testing
train = df_diff.iloc[:-30,:]
test = df_diff.iloc[-30:,:]
train, test
( p1 p2
date
2019/01/02 -0.591827 -1.436540
2019/01/03 0.050071 0.043894
2019/01/04 0.032127 -0.160157
2019/01/05 0.453149 0.965645
2019/01/06 0.215044 0.535168
... ... ...
2021/07/28 0.271976 -0.397341
2021/07/29 0.121974 0.071939
2021/07/30 -0.323097 -0.009642
2021/07/31 0.806326 1.693124
2021/08/01 -0.083921 0.568040
[943 rows x 2 columns],
p1 p2
date
2021/08/02 -1.197897 -1.823121
2021/08/03 0.382762 -0.307690
2021/08/04 0.224371 -0.159779
2021/08/05 0.213700 0.045172
2021/08/06 0.210503 0.053557
2021/08/07 0.433313 1.324466
2021/08/08 0.227528 0.915187
2021/08/09 -1.476812 -2.419552
2021/08/10 0.864029 -0.313854
2021/08/11 -0.361891 0.011769
2021/08/12 -0.178059 -0.032996
2021/08/13 0.349272 0.207221
2021/08/14 0.313612 1.626783
2021/08/15 -0.030712 0.621387
2021/08/16 -0.492023 -0.229886
2021/08/17 -0.320392 -2.102970
2021/08/18 0.231022 -0.418104
2021/08/19 0.186924 0.148053
2021/08/20 0.140931 0.255090
2021/08/21 0.576043 2.264528
2021/08/22 -0.445505 0.305457
2021/08/23 -0.563361 -2.491265
2021/08/24 0.635171 -0.388304
2021/08/25 -0.326547 -0.162770
2021/08/26 0.101249 0.099862
2021/08/27 0.335959 0.163582
2021/08/28 -0.250823 1.778201
2021/08/29 0.145392 0.699918
2021/08/30 -0.665134 -2.349837
2021/08/31 0.730510 -0.323811)
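With time series the holdout has to be the most recent block, not a random sample, since shuffling would let the model see the future. A small sketch of that split, together with an RMSE helper for scoring a forecast against the held-out rows, is given below. The helper names and the naive "repeat the last value" baseline are assumptions added for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

def split_last_n(frame: pd.DataFrame, n_test: int):
    """Keep the time order: everything but the last n_test rows is training data."""
    return frame.iloc[:-n_test], frame.iloc[-n_test:]

def rmse(actual, predicted) -> float:
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy series 0, 1, ..., 99 standing in for a real column.
demo = pd.DataFrame({"value": np.arange(100, dtype=float)})
train_demo, test_demo = split_last_n(demo, 30)

# Naive baseline: repeat the last training value for the whole test window.
naive = np.full(len(test_demo), train_demo["value"].iloc[-1])
print(len(train_demo), len(test_demo), round(rmse(test_demo["value"], naive), 3))
```

A baseline like this gives a reference point: a fitted model is only worth keeping if its error on the last 30 days is clearly lower.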
# Declare the VAR model and record the AIC for each lag order to find the best one
forecasting_model = VAR(train)
results_aic = []
for p in range(1,30):
    results = forecasting_model.fit(p)
    results_aic.append(results.aic)
C:\Users\USER\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:162: ValueWarning: No frequency information was provided, so inferred frequency D will be used.
% freq, ValueWarning)
sns.set()
plt.plot(list(np.arange(1,30,1)), results_aic)
plt.xlabel("Order")
plt.ylabel("AIC")
plt.show()
results_aic
[15.874141457425885,
15.556327191300296,
15.426246152772501,
15.31875092655693,
14.48301589548955,
13.814739595729133,
13.673927482850303,
13.665937797275639,
13.663878430769731,
13.67665193875356,
13.692038993850531,
13.67756467694419,
13.524341978518088,
13.490367613323487,
13.499927177492392,
13.512585793645396,
13.529358364796668,
13.542355917289484,
13.53924779983039,
13.496007803988674,
13.470538806278913,
13.488222722280227,
13.49994838355099,
13.511730950112858,
13.521836125116637,
13.538180084745054,
13.52127881831538,
13.519623176155903,
13.531074508641476]
# Get the 0-based position of the smallest AIC value in results_aic
np.argsort(results_aic)[0]
20
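One detail worth spelling out: `results_aic[0]` holds the AIC for lag order 1, so a position in the list is always one less than the lag order it belongs to. The sketch below shows the mapping with made-up AIC values; keeping the candidate lags in their own list avoids having to remember the offset.

```python
import numpy as np

# Hypothetical AIC values for lag orders 1..6 (not the values from this post).
candidate_lags = list(range(1, 7))
aic_values = [5.2, 4.8, 4.1, 3.9, 4.0, 4.3]

best_position = int(np.argmin(aic_values))  # 0-based position in the list
best_lag = candidate_lags[best_position]    # the lag order that position stands for

print(best_position, best_lag)  # 3 4
```

statsmodels also offers `VAR.select_order(maxlags)`, which reports the lag order chosen by AIC, BIC, FPE, and HQIC directly and may be a convenient cross-check.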
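# Fit the model
# Note: the value 20 above is a 0-based list position, and results_aic starts at p=1,
# so the smallest AIC (13.4705) belongs to lag order 21. The call below fits lag order 20,
# which is the model shown in the summary (AIC 13.4960).
results = forecasting_model.fit(np.argsort(results_aic)[0])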
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Thu, 09, Sep, 2021
Time: 03:49:21
--------------------------------------------------------------------
No. of Equations: 3.00000 BIC: 14.4532
Nobs: 923.000 HQIC: 13.8612
Log likelihood: -9974.45 FPE: 726931.
AIC: 13.4960 Det(Omega_mle): 599947.
--------------------------------------------------------------------
Results for equation ticketing_count
======================================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------------------
const 145.510494 220.968015 0.659 0.510
L1.ticketing_count -0.512800 0.034341 -14.933 0.000
L1.p1 -1765.455677 822.840326 -2.146 0.032
L1.p2 -2821.848574 496.649003 -5.682 0.000
L2.ticketing_count -0.594128 0.038482 -15.439 0.000
L2.p1 -1293.122466 849.927486 -1.521 0.128
L2.p2 -608.933479 528.806515 -1.152 0.250
L3.ticketing_count -0.398470 0.043336 -9.195 0.000
L3.p1 -1339.687342 866.271487 -1.546 0.122
L3.p2 -1050.299489 561.929041 -1.869 0.062
L4.ticketing_count -0.261282 0.045265 -5.772 0.000
L4.p1 -377.335131 885.985159 -0.426 0.670
L4.p2 -1217.585151 583.640112 -2.086 0.037
L5.ticketing_count -0.338885 0.046258 -7.326 0.000
L5.p1 -1258.123212 891.089378 -1.412 0.158
L5.p2 -396.262710 606.095860 -0.654 0.513
L6.ticketing_count -0.104637 0.047671 -2.195 0.028
L6.p1 -713.921745 904.037539 -0.790 0.430
L6.p2 -946.466873 623.165218 -1.519 0.129
L7.ticketing_count 0.162614 0.047820 3.401 0.001
L7.p1 -443.007072 907.350566 -0.488 0.625
L7.p2 -931.608368 634.962091 -1.467 0.142
L8.ticketing_count 0.019806 0.047552 0.417 0.677
L8.p1 -421.580373 916.131695 -0.460 0.645
L8.p2 348.177749 632.506959 0.550 0.582
L9.ticketing_count -0.042382 0.047322 -0.896 0.370
L9.p1 -530.404782 916.362104 -0.579 0.563
L9.p2 -340.607229 630.058520 -0.541 0.589
L10.ticketing_count -0.102133 0.047123 -2.167 0.030
L10.p1 -307.854688 916.870516 -0.336 0.737
L10.p2 -395.121147 631.500453 -0.626 0.532
L11.ticketing_count -0.173351 0.047131 -3.678 0.000
L11.p1 -528.739883 916.463562 -0.577 0.564
L11.p2 -80.394641 632.223254 -0.127 0.899
L12.ticketing_count -0.160090 0.047484 -3.371 0.001
L12.p1 285.141307 914.960571 0.312 0.755
L12.p2 -915.651657 630.201786 -1.453 0.146
L13.ticketing_count -0.244466 0.047791 -5.115 0.000
L13.p1 -120.630017 915.508472 -0.132 0.895
L13.p2 -94.051125 633.295991 -0.149 0.882
L14.ticketing_count 0.058981 0.048166 1.225 0.221
L14.p1 -525.286961 911.156244 -0.577 0.564
L14.p2 -203.047011 634.849183 -0.320 0.749
L15.ticketing_count -0.028254 0.048063 -0.588 0.557
L15.p1 -1117.715539 909.297358 -1.229 0.219
L15.p2 381.515824 621.906509 0.613 0.540
L16.ticketing_count -0.021476 0.046575 -0.461 0.645
L16.p1 464.248680 902.623296 0.514 0.607
L16.p2 -826.054787 603.737121 -1.368 0.171
L17.ticketing_count -0.068757 0.045738 -1.503 0.133
L17.p1 -1841.725987 890.483300 -2.068 0.039
L17.p2 -324.034853 588.439473 -0.551 0.582
L18.ticketing_count -0.133988 0.043636 -3.071 0.002
L18.p1 -346.740276 873.988699 -0.397 0.692
L18.p2 -567.606480 566.136161 -1.003 0.316
L19.ticketing_count -0.129596 0.038708 -3.348 0.001
L19.p1 -1022.793529 860.793492 -1.188 0.235
L19.p2 -1064.689090 528.789370 -2.013 0.044
L20.ticketing_count -0.155668 0.033569 -4.637 0.000
L20.p1 -566.108373 830.916946 -0.681 0.496
L20.p2 -518.094139 508.043627 -1.020 0.308
======================================================================================
Results for equation p1
======================================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------------------
const 0.020579 0.009614 2.140 0.032
L1.ticketing_count 0.000000 0.000001 0.020 0.984
L1.p1 -0.294970 0.035801 -8.239 0.000
L1.p2 -0.121053 0.021609 -5.602 0.000
L2.ticketing_count -0.000005 0.000002 -2.874 0.004
L2.p1 -0.210368 0.036980 -5.689 0.000
L2.p2 -0.031579 0.023008 -1.372 0.170
L3.ticketing_count -0.000003 0.000002 -1.787 0.074
L3.p1 -0.234719 0.037691 -6.227 0.000
L3.p2 -0.052615 0.024449 -2.152 0.031
L4.ticketing_count -0.000005 0.000002 -2.658 0.008
L4.p1 -0.131175 0.038549 -3.403 0.001
L4.p2 -0.072328 0.025394 -2.848 0.004
L5.ticketing_count -0.000003 0.000002 -1.353 0.176
L5.p1 -0.204783 0.038771 -5.282 0.000
L5.p2 -0.023572 0.026371 -0.894 0.371
L6.ticketing_count -0.000004 0.000002 -2.155 0.031
L6.p1 -0.103657 0.039334 -2.635 0.008
L6.p2 -0.052575 0.027114 -1.939 0.052
L7.ticketing_count -0.000003 0.000002 -1.259 0.208
L7.p1 0.158566 0.039478 4.017 0.000
L7.p2 0.004891 0.027627 0.177 0.859
L8.ticketing_count -0.000002 0.000002 -1.202 0.230
L8.p1 0.003931 0.039860 0.099 0.921
L8.p2 -0.003271 0.027520 -0.119 0.905
L9.ticketing_count -0.000004 0.000002 -1.779 0.075
L9.p1 -0.049405 0.039870 -1.239 0.215
L9.p2 -0.019076 0.027414 -0.696 0.487
L10.ticketing_count -0.000000 0.000002 -0.081 0.935
L10.p1 0.020715 0.039893 0.519 0.604
L10.p2 -0.047023 0.027476 -1.711 0.087
L11.ticketing_count -0.000001 0.000002 -0.725 0.468
L11.p1 0.029460 0.039875 0.739 0.460
L11.p2 -0.025082 0.027508 -0.912 0.362
L12.ticketing_count -0.000001 0.000002 -0.618 0.537
L12.p1 0.040515 0.039810 1.018 0.309
L12.p2 -0.042454 0.027420 -1.548 0.122
L13.ticketing_count -0.000000 0.000002 -0.217 0.828
L13.p1 0.012603 0.039833 0.316 0.752
L13.p2 -0.047503 0.027554 -1.724 0.085
L14.ticketing_count -0.000001 0.000002 -0.319 0.749
L14.p1 0.109450 0.039644 2.761 0.006
L14.p2 0.017832 0.027622 0.646 0.519
L15.ticketing_count 0.000002 0.000002 0.819 0.413
L15.p1 -0.036218 0.039563 -0.915 0.360
L15.p2 -0.008406 0.027059 -0.311 0.756
L16.ticketing_count 0.000002 0.000002 0.942 0.346
L16.p1 -0.037590 0.039273 -0.957 0.338
L16.p2 -0.045645 0.026268 -1.738 0.082
L17.ticketing_count 0.000000 0.000002 0.243 0.808
L17.p1 -0.082901 0.038745 -2.140 0.032
L17.p2 -0.009364 0.025603 -0.366 0.715
L18.ticketing_count -0.000000 0.000002 -0.101 0.920
L18.p1 -0.110970 0.038027 -2.918 0.004
L18.p2 -0.031279 0.024632 -1.270 0.204
L19.ticketing_count -0.000001 0.000002 -0.520 0.603
L19.p1 -0.129667 0.037453 -3.462 0.001
L19.p2 -0.012663 0.023007 -0.550 0.582
L20.ticketing_count -0.000002 0.000001 -1.206 0.228
L20.p1 -0.132579 0.036153 -3.667 0.000
L20.p2 -0.018261 0.022105 -0.826 0.409
======================================================================================
Results for equation p2
======================================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------------------
const -0.000051 0.016017 -0.003 0.997
L1.ticketing_count 0.000002 0.000002 0.798 0.425
L1.p1 -0.170507 0.059646 -2.859 0.004
L1.p2 -0.391149 0.036001 -10.865 0.000
L2.ticketing_count -0.000003 0.000003 -1.250 0.211
L2.p1 0.061796 0.061609 1.003 0.316
L2.p2 -0.424547 0.038332 -11.076 0.000
L3.ticketing_count -0.000003 0.000003 -0.801 0.423
L3.p1 -0.049023 0.062794 -0.781 0.435
L3.p2 -0.361065 0.040733 -8.864 0.000
L4.ticketing_count -0.000001 0.000003 -0.257 0.797
L4.p1 0.012950 0.064223 0.202 0.840
L4.p2 -0.375514 0.042307 -8.876 0.000
L5.ticketing_count -0.000001 0.000003 -0.282 0.778
L5.p1 -0.111906 0.064593 -1.732 0.083
L5.p2 -0.304719 0.043934 -6.936 0.000
L6.ticketing_count -0.000001 0.000003 -0.301 0.763
L6.p1 -0.054229 0.065531 -0.828 0.408
L6.p2 -0.264942 0.045172 -5.865 0.000
L7.ticketing_count 0.000000 0.000003 0.061 0.952
L7.p1 0.146371 0.065772 2.225 0.026
L7.p2 0.002069 0.046027 0.045 0.964
L8.ticketing_count -0.000000 0.000003 -0.022 0.983
L8.p1 -0.032427 0.066408 -0.488 0.625
L8.p2 -0.075254 0.045849 -1.641 0.101
L9.ticketing_count 0.000002 0.000003 0.528 0.597
L9.p1 -0.006389 0.066425 -0.096 0.923
L9.p2 -0.144795 0.045671 -3.170 0.002
L10.ticketing_count 0.000002 0.000003 0.498 0.618
L10.p1 -0.020082 0.066462 -0.302 0.763
L10.p2 -0.085537 0.045776 -1.869 0.062
L11.ticketing_count -0.000001 0.000003 -0.393 0.694
L11.p1 0.103744 0.066432 1.562 0.118
L11.p2 -0.112537 0.045828 -2.456 0.014
L12.ticketing_count -0.000001 0.000003 -0.358 0.720
L12.p1 0.057965 0.066323 0.874 0.382
L12.p2 -0.162660 0.045682 -3.561 0.000
L13.ticketing_count -0.000001 0.000003 -0.376 0.707
L13.p1 -0.121140 0.066363 -1.825 0.068
L13.p2 -0.096656 0.045906 -2.106 0.035
L14.ticketing_count -0.000000 0.000003 -0.060 0.952
L14.p1 0.115956 0.066047 1.756 0.079
L14.p2 0.053468 0.046019 1.162 0.245
L15.ticketing_count -0.000001 0.000003 -0.350 0.726
L15.p1 -0.062024 0.065913 -0.941 0.347
L15.p2 -0.018219 0.045080 -0.404 0.686
L16.ticketing_count 0.000000 0.000003 0.007 0.995
L16.p1 0.001574 0.065429 0.024 0.981
L16.p2 -0.152048 0.043763 -3.474 0.001
L17.ticketing_count -0.000000 0.000003 -0.064 0.949
L17.p1 -0.131831 0.064549 -2.042 0.041
L17.p2 -0.096239 0.042655 -2.256 0.024
L18.ticketing_count -0.000001 0.000003 -0.302 0.763
L18.p1 -0.076352 0.063353 -1.205 0.228
L18.p2 -0.141783 0.041038 -3.455 0.001
L19.ticketing_count -0.000002 0.000003 -0.620 0.536
L19.p1 -0.083997 0.062397 -1.346 0.178
L19.p2 -0.153884 0.038331 -4.015 0.000
L20.ticketing_count 0.000000 0.000002 0.094 0.925
L20.p1 -0.153089 0.060231 -2.542 0.011
L20.p2 -0.098449 0.036827 -2.673 0.008
======================================================================================
Correlation matrix of residuals
ticketing_count p1 p2
ticketing_count 1.000000 0.143784 0.180547
p1 0.143784 1.000000 0.313019
p2 0.180547 0.313019 1.000000
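To read the summary above, it can help to recompute one row by hand. The sketch below assumes each t-stat is the coefficient divided by its standard error, and that `prob` is a two-sided p-value under a standard normal reference distribution. That assumption reproduces the printed `L1.p1` row of the p2 equation, but it is an illustration, not statsmodels' own code, and the helper name `t_and_p` is hypothetical.

```python
import math

def t_and_p(coefficient, std_error):
    """Return (t-stat, two-sided p-value), assuming a standard normal reference."""
    t_stat = coefficient / std_error
    # For a standard normal Z, P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value

# Row "L1.p1" from the p2 equation in the summary above
t_stat, p_value = t_and_p(-0.170507, 0.059646)
print(round(t_stat, 3), round(p_value, 3))
```

Rows with a small `prob` (for example most of the `p2` lags in the p2 equation) are the ones the model treats as informative. The `ticketing_count` lags in this equation all have large `prob` values.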
# Forecast the differenced series 30 steps ahead with the fitted VAR model
lagged_values = train.values
forecast = pd.DataFrame(results.forecast(y=lagged_values, steps=30),
                        index=test.index, columns=df.columns)
forecast
date | ticketing_count | p1 | p2 |
---|---|---|---|
2021/08/02 | -21642.828443 | -1.033055 | -1.920281 |
2021/08/03 | 4108.724583 | 0.139701 | -0.231206 |
2021/08/04 | 6880.360834 | 0.272476 | -0.069083 |
2021/08/05 | -2316.669338 | 0.105379 | -0.046469 |
2021/08/06 | 3700.776681 | 0.056802 | 0.183212 |
2021/08/07 | 15584.686434 | 0.523810 | 1.435338 |
2021/08/08 | -7120.885442 | 0.155960 | 0.662741 |
2021/08/09 | -21598.560374 | -1.056706 | -1.925830 |
2021/08/10 | 6374.681066 | 0.093848 | -0.235443 |
2021/08/11 | 4835.465686 | 0.167524 | 0.017505 |
2021/08/12 | -1061.610716 | 0.108480 | -0.038615 |
2021/08/13 | 3194.224751 | 0.057892 | 0.030914 |
2021/08/14 | 15751.719904 | 0.590367 | 1.493483 |
2021/08/15 | -8752.597219 | 0.148820 | 0.582362 |
2021/08/16 | -19415.434905 | -0.958010 | -1.771422 |
2021/08/17 | 4613.948348 | 0.059555 | -0.313527 |
2021/08/18 | 6870.373231 | 0.149024 | 0.107671 |
2021/08/19 | -2616.504109 | 0.089648 | -0.079262 |
2021/08/20 | 3316.261843 | 0.029082 | -0.012249 |
2021/08/21 | 15465.099089 | 0.534741 | 1.439276 |
2021/08/22 | -8338.881634 | 0.123938 | 0.542337 |
2021/08/23 | -19732.722397 | -0.867167 | -1.697932 |
2021/08/24 | 5293.242279 | -0.005365 | -0.337871 |
2021/08/25 | 5897.342281 | 0.144592 | 0.139256 |
2021/08/26 | -2098.080570 | 0.064798 | -0.095532 |
2021/08/27 | 3423.032635 | 0.060817 | 0.020882 |
2021/08/28 | 15996.419593 | 0.515684 | 1.370608 |
2021/08/29 | -8996.299133 | 0.147419 | 0.538153 |
2021/08/30 | -19438.048277 | -0.827387 | -1.619591 |
2021/08/31 | 5025.855072 | -0.024818 | -0.358857 |
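`results.forecast(y=..., steps=30)` produces multi-step forecasts. Conceptually, each step is computed from the previous p observations and then fed back in as input for the next step. The summary above lists lags L1 to L20, so p = 20 with three variables. Below is a minimal NumPy sketch of that recursion, using hypothetical toy coefficients rather than the fitted ones. The function `var_forecast` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def var_forecast(const, coefs, history, steps):
    """Recursive VAR(p) forecast.

    const:   shape (k,)       intercepts
    coefs:   shape (p, k, k)  coefs[i] multiplies the observation i+1 steps back
    history: shape (n, k)     with n >= p, oldest row first
    """
    const = np.asarray(const, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    p = coefs.shape[0]
    window = list(np.asarray(history, dtype=float)[-p:])
    forecasts = []
    for _ in range(steps):
        y_next = const.copy()
        for lag in range(p):
            y_next = y_next + coefs[lag] @ window[-1 - lag]
        forecasts.append(y_next)
        window.append(y_next)  # each forecast becomes input for later steps
    return np.array(forecasts)

# Toy one-variable VAR(1): y_t = 1 + 0.5 * y_{t-1}, starting from y = 4
toy = var_forecast(const=[1.0], coefs=[[[0.5]]], history=[[4.0]], steps=3)
print(toy.ravel())
```

Because later steps are built on earlier forecasts rather than on observed values, errors can accumulate over a 30-step horizon.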
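# Add the cumulative sum of the forecasted differences to the last value before the forecast window
# to recover forecasts on the original scale (ticketing_count_forecasted is the actual forecast)
for i in df.columns:
    forecast[f'{i}_forecasted'] = df[i].iloc[-30-1] + forecast[i].cumsum()
print(forecast)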
ticketing_count p1 p2 ticketing_count_forecasted \
date
2021/08/02 -21642.828443 -1.033055 -1.920281 6203.171557
2021/08/03 4108.724583 0.139701 -0.231206 10311.896141
2021/08/04 6880.360834 0.272476 -0.069083 17192.256975
2021/08/05 -2316.669338 0.105379 -0.046469 14875.587637
2021/08/06 3700.776681 0.056802 0.183212 18576.364319
2021/08/07 15584.686434 0.523810 1.435338 34161.050753
2021/08/08 -7120.885442 0.155960 0.662741 27040.165311
2021/08/09 -21598.560374 -1.056706 -1.925830 5441.604936
2021/08/10 6374.681066 0.093848 -0.235443 11816.286002
2021/08/11 4835.465686 0.167524 0.017505 16651.751688
2021/08/12 -1061.610716 0.108480 -0.038615 15590.140972
2021/08/13 3194.224751 0.057892 0.030914 18784.365723
2021/08/14 15751.719904 0.590367 1.493483 34536.085627
2021/08/15 -8752.597219 0.148820 0.582362 25783.488407
2021/08/16 -19415.434905 -0.958010 -1.771422 6368.053503
2021/08/17 4613.948348 0.059555 -0.313527 10982.001851
2021/08/18 6870.373231 0.149024 0.107671 17852.375082
2021/08/19 -2616.504109 0.089648 -0.079262 15235.870973
2021/08/20 3316.261843 0.029082 -0.012249 18552.132817
2021/08/21 15465.099089 0.534741 1.439276 34017.231906
2021/08/22 -8338.881634 0.123938 0.542337 25678.350272
2021/08/23 -19732.722397 -0.867167 -1.697932 5945.627875
2021/08/24 5293.242279 -0.005365 -0.337871 11238.870154
2021/08/25 5897.342281 0.144592 0.139256 17136.212435
2021/08/26 -2098.080570 0.064798 -0.095532 15038.131865
2021/08/27 3423.032635 0.060817 0.020882 18461.164500
2021/08/28 15996.419593 0.515684 1.370608 34457.584093
2021/08/29 -8996.299133 0.147419 0.538153 25461.284960
2021/08/30 -19438.048277 -0.827387 -1.619591 6023.236683
2021/08/31 5025.855072 -0.024818 -0.358857 11049.091755
p1_forecasted p2_forecasted
date
2021/08/02 4.050834 -2.306268
2021/08/03 4.190535 -2.537474
2021/08/04 4.463011 -2.606557
2021/08/05 4.568389 -2.653026
2021/08/06 4.625191 -2.469813
2021/08/07 5.149000 -1.034475
2021/08/08 5.304960 -0.371734
2021/08/09 4.248254 -2.297564
2021/08/10 4.342102 -2.533007
2021/08/11 4.509626 -2.515502
2021/08/12 4.618106 -2.554117
2021/08/13 4.675998 -2.523203
2021/08/14 5.266365 -1.029720
2021/08/15 5.415185 -0.447358
2021/08/16 4.457175 -2.218780
2021/08/17 4.516731 -2.532307
2021/08/18 4.665754 -2.424636
2021/08/19 4.755402 -2.503899
2021/08/20 4.784485 -2.516147
2021/08/21 5.319226 -1.076871
2021/08/22 5.443164 -0.534534
2021/08/23 4.575996 -2.232466
2021/08/24 4.570632 -2.570337
2021/08/25 4.715224 -2.431080
2021/08/26 4.780021 -2.526612
2021/08/27 4.840838 -2.505730
2021/08/28 5.356523 -1.135123
2021/08/29 5.503941 -0.596969
2021/08/30 4.676555 -2.216561
2021/08/31 4.651737 -2.575418
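The `*_forecasted` columns above are obtained by undoing the first differencing: the last observed level before the forecast window is the starting point, and the forecasted differences are accumulated on top of it. For instance, the second `ticketing_count_forecasted` value, 10311.896141, is the first one (6203.171557) plus the second forecasted difference (4108.724583). The same idea on a small hypothetical series (the numbers below are made up, not the post's data):

```python
import numpy as np

# Hypothetical daily counts, not the post's data
levels = np.array([100.0, 120.0, 90.0, 130.0, 125.0])
train_levels, test_levels = levels[:2], levels[2:]

# Differences over the test window; a perfect model would forecast exactly these
forecast_diffs = np.diff(levels)[1:]

# Last observed level before the window, plus the running sum of the differences
restored = train_levels[-1] + np.cumsum(forecast_diffs)
print(restored)
```

With perfect difference forecasts, `restored` equals `test_levels`. With real forecasts, any error in an early difference shifts every later level.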
df
date | ticketing_count | p1 | p2 |
---|---|---|---|
2019/01/01 | 7401 | -3.472938 | 0.264457 |
2019/01/02 | 5069 | -4.064765 | -1.172083 |
2019/01/03 | 6498 | -4.014694 | -1.128189 |
2019/01/04 | 7088 | -3.982567 | -1.288346 |
2019/01/05 | 18755 | -3.529418 | -0.322702 |
... | ... | ... | ... |
2021/08/27 | 19582 | 5.117077 | -3.194164 |
2021/08/28 | 45456 | 4.866254 | -1.415963 |
2021/08/29 | 31871 | 5.011647 | -0.716045 |
2021/08/30 | 3652 | 4.346513 | -3.065882 |
2021/08/31 | 8582 | 5.077023 | -3.389693 |
974 rows × 3 columns
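# Compare forecasted and actual values (2021/08/02 to 2021/08/31)
test = df.iloc[-30:, :1].copy()  # copy so the column assignment below does not trigger SettingWithCopyWarning
for i in test.columns:
    test[f'{i}_forecasted'] = forecast[f'{i}_forecasted']
test.plot(figsize=(20,20))
[Figure: line plot of ticketing_count and ticketing_count_forecasted, 2021/08/02 to 2021/08/31]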
# num01
# num02
# num04_1
# num04_2
# num04_3
# num05_1_Standard
# num05_1_MinMax
# num05_1_Robust
# num05_2_standard
# num05_2_MinMax
# num05_2_Robust
# num05_3_standard
# num05_3_MinMax
# num05_3_Robust
# num06_1_Standard
# num06_1_MinMax
# num06_1_Robust
# num06_2_standard
# num06_2_MinMax
# num06_2_Robust
# num06_3_standard
# num06_3_MinMax
# num06_3_Robust
# test = num05_2_standard
mse = mean_squared_error(test['ticketing_count'], test['ticketing_count_forecasted'])
rmse = np.sqrt(mse)
print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print('Variance score: {0:.3f}'.format(r2_score(test['ticketing_count'],
test['ticketing_count_forecasted'])))
MSE: 13358737.129938863
RMSE: 3654.960619478528
Variance score: 0.890
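Note that the value labeled "Variance score" above is the output of `r2_score`, i.e. the coefficient of determination R². As a sketch of what the three numbers mean, they can be recomputed with plain NumPy. The helper name `regression_scores` and the toy inputs are hypothetical.

```python
import numpy as np

def regression_scores(actual, predicted):
    """Return (MSE, RMSE, R^2) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = actual - predicted
    mse = float(np.mean(residuals ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(residuals ** 2))                 # unexplained variation
    ss_tot = float(np.sum((actual - actual.mean()) ** 2))  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

mse_toy, rmse_toy, r2_toy = regression_scores([10, 20, 30], [12, 18, 33])
print(mse_toy, rmse_toy, r2_toy)
```

RMSE is in the same unit as `ticketing_count`, so the 3654.96 above can be read as a typical error of a few thousand tickets per day. R² is unitless, which makes it easier to compare across candidates.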
Performance test of candidate models
# metric_list=[num01,
# num02,
# num04_1,
# num04_2,
# num04_3,
# num05_1_Standard,
# num05_1_MinMax,
# num05_1_Robust,
# num05_2_standard,
# num05_2_MinMax,
# num05_2_Robust,
# num05_3_standard,
# num05_3_MinMax,
# num05_3_Robust,
# num06_1_Standard,
# num06_1_MinMax,
# num06_1_Robust,
# num06_2_standard,
# num06_2_MinMax,
# num06_2_Robust,
# num06_3_standard,
# num06_3_MinMax,
# num06_3_Robust]
# mse_list = []
# rmse_list = []
# r2_score_list = []
# for index, i in enumerate(metric_list):
#     print(index+1, '번')  # prints the model number, e.g. "1 번" (No. 1)
#     if i['ticketing_count'].iloc[0] >= 10:
#         # first value is already on the original scale
#         test = i
#         print(test)
#     elif i['ticketing_count'].iloc[0] > 1:
#         # first value looks log-scaled, so invert with expm1
#         test = np.expm1(i)
#         print("inverse_log_scaled")
#         print(test)
#     mse = mean_squared_error(test['ticketing_count'],
#                              test['ticketing_count_forecasted'])
#     rmse = np.sqrt(mse)
#     r2score = r2_score(test['ticketing_count'], test['ticketing_count_forecasted'])
#     print(f'MSE: {mse}')
#     print(f'RMSE: {rmse}')
#     print('Variance score: {0:.3f}'.format(r2score))
#     mse_list.append(mse)
#     rmse_list.append(rmse)
#     r2_score_list.append(r2score)
# # for i in metric_list:
# #     print(i)
1 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 18464.274083
2021/06/17 17573 16945.966268
2021/06/18 21608 19220.695620
2021/06/19 51055 38581.320241
2021/06/20 35137 23195.149423
2021/06/21 3331 2031.774542
2021/06/22 13000 10046.775567
2021/06/23 17698 20392.380196
2021/06/24 18357 17531.651535
2021/06/25 21268 21064.888283
2021/06/26 51912 34809.554088
2021/06/27 37135 21983.634169
2021/06/28 3911 2672.880970
2021/06/29 10714 11837.351398
2021/06/30 19878 20764.617707
MSE: 57345736.70610123
RMSE: 7572.696792167321
Variance score: 0.725
2 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13017.112639
2021/06/17 17573.0 11004.720728
2021/06/18 21608.0 16379.812459
2021/06/19 51055.0 27871.782599
2021/06/20 35137.0 20758.582156
2021/06/21 3331.0 2308.044145
2021/06/22 13000.0 9587.316723
2021/06/23 17698.0 12807.508619
2021/06/24 18357.0 10798.466674
2021/06/25 21268.0 15176.342051
2021/06/26 51912.0 25149.102386
2021/06/27 37135.0 20274.764900
2021/06/28 3911.0 2545.187791
2021/06/29 10714.0 9960.375378
2021/06/30 19878.0 12697.393160
MSE: 133603497.49971189
RMSE: 11558.697915410363
Variance score: 0.359
3 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13292.238242
2021/06/17 17573.0 12494.135916
2021/06/18 21608.0 15631.354020
2021/06/19 51055.0 32684.420908
2021/06/20 35137.0 25629.448875
2021/06/21 3331.0 2554.030814
2021/06/22 13000.0 10086.269064
2021/06/23 17698.0 12932.492305
2021/06/24 18357.0 12731.343143
2021/06/25 21268.0 16621.914562
2021/06/26 51912.0 33444.484249
2021/06/27 37135.0 25577.851681
2021/06/28 3911.0 2532.232030
2021/06/29 10714.0 10279.283991
2021/06/30 19878.0 13411.301420
MSE: 73062313.6941993
RMSE: 8547.649600574376
Variance score: 0.649
4 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 16563.855678
2021/06/17 17573.0 11378.623517
2021/06/18 21608.0 18548.865687
2021/06/19 51055.0 35064.824591
2021/06/20 35137.0 23637.875640
2021/06/21 3331.0 2749.690738
2021/06/22 13000.0 10099.466368
2021/06/23 17698.0 16242.414752
2021/06/24 18357.0 12169.400684
2021/06/25 21268.0 19020.100038
2021/06/26 51912.0 36203.586826
2021/06/27 37135.0 24920.772487
2021/06/28 3911.0 2737.480839
2021/06/29 10714.0 9916.291704
2021/06/30 19878.0 16811.251546
MSE: 59973125.16755498
RMSE: 7744.231735140354
Variance score: 0.712
5 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 16213.055215
2021/06/17 17573.0 10423.908786
2021/06/18 21608.0 19011.957765
2021/06/19 51055.0 35218.478770
2021/06/20 35137.0 21990.772804
2021/06/21 3331.0 2677.276444
2021/06/22 13000.0 9614.251002
2021/06/23 17698.0 14525.099908
2021/06/24 18357.0 11303.266231
2021/06/25 21268.0 18049.368090
2021/06/26 51912.0 35067.767963
2021/06/27 37135.0 23876.203606
2021/06/28 3911.0 2581.887557
2021/06/29 10714.0 9316.814739
2021/06/30 19878.0 14699.764609
MSE: 70334689.04451495
RMSE: 8386.57791023937
Variance score: 0.663
6 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 15224.609205
2021/06/17 17573 13806.418769
2021/06/18 21608 18131.488195
2021/06/19 51055 38375.911545
2021/06/20 35137 26591.700747
2021/06/21 3331 4002.860504
2021/06/22 13000 10122.489030
2021/06/23 17698 15770.068258
2021/06/24 18357 13296.757206
2021/06/25 21268 18430.915817
2021/06/26 51912 37757.105260
2021/06/27 37135 26699.958404
2021/06/28 3911 4511.393517
2021/06/29 10714 10947.512550
2021/06/30 19878 16094.247197
MSE: 42012198.087339
RMSE: 6481.68173295627
Variance score: 0.798
7 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 14840.682078
2021/06/17 17573 14073.468938
2021/06/18 21608 18016.853737
2021/06/19 51055 37959.047571
2021/06/20 35137 26634.563312
2021/06/21 3331 4412.203000
2021/06/22 13000 10304.743301
2021/06/23 17698 15782.065770
2021/06/24 18357 13707.057674
2021/06/25 21268 18415.540609
2021/06/26 51912 37302.548955
2021/06/27 37135 26795.748701
2021/06/28 3911 5104.656012
2021/06/29 10714 11349.829869
2021/06/30 19878 16120.161754
MSE: 43141328.29019242
RMSE: 6568.2058653937165
Variance score: 0.793
8 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 15145.473807
2021/06/17 17573 13412.517589
2021/06/18 21608 17894.891688
2021/06/19 51055 38254.831850
2021/06/20 35137 27139.436164
2021/06/21 3331 3729.568660
2021/06/22 13000 9717.499092
2021/06/23 17698 15212.989246
2021/06/24 18357 12855.183032
2021/06/25 21268 18595.247588
2021/06/26 51912 37632.568025
2021/06/27 37135 27502.800811
2021/06/28 3911 4412.760699
2021/06/29 10714 10391.157028
2021/06/30 19878 15575.899080
MSE: 41932393.891585976
RMSE: 6475.522673235419
Variance score: 0.799
9 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 14070.808490
2021/06/17 17573 13047.439457
2021/06/18 21608 18018.433064
2021/06/19 51055 38981.207813
2021/06/20 35137 27247.462467
2021/06/21 3331 3536.860409
2021/06/22 13000 8927.900897
2021/06/23 17698 14604.147833
2021/06/24 18357 12448.377153
2021/06/25 21268 18623.693374
2021/06/26 51912 39121.181984
2021/06/27 37135 27431.680640
2021/06/28 3911 3848.546292
2021/06/29 10714 9721.664107
2021/06/30 19878 14756.151006
MSE: 39691319.08865397
RMSE: 6300.104688705892
Variance score: 0.810
10 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 14132.662477
2021/06/17 17573 13291.384048
2021/06/18 21608 17882.641611
2021/06/19 51055 38602.406088
2021/06/20 35137 27129.027589
2021/06/21 3331 3961.158028
2021/06/22 13000 9532.207351
2021/06/23 17698 14858.455654
2021/06/24 18357 12678.268413
2021/06/25 21268 18243.706646
2021/06/26 51912 38427.110563
2021/06/27 37135 27436.956769
2021/06/28 3911 4505.308736
2021/06/29 10714 10335.661768
2021/06/30 19878 14990.670449
MSE: 40956618.24725466
RMSE: 6399.735795113315
Variance score: 0.803
11 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 15156.937449
2021/06/17 17573 13129.151004
2021/06/18 21608 18524.487245
2021/06/19 51055 37951.810664
2021/06/20 35137 26012.328397
2021/06/21 3331 4006.885533
2021/06/22 13000 9566.416674
2021/06/23 17698 14122.646466
2021/06/24 18357 12217.204477
2021/06/25 21268 18745.004728
2021/06/26 51912 37374.470372
2021/06/27 37135 26362.823361
2021/06/28 3911 4892.938809
2021/06/29 10714 10321.786332
2021/06/30 19878 14557.458165
MSE: 47341708.009495184
RMSE: 6880.531084843319
Variance score: 0.773
12 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 13837.414435
2021/06/17 17573 13097.047142
2021/06/18 21608 18108.588813
2021/06/19 51055 38847.197236
2021/06/20 35137 27252.862353
2021/06/21 3331 3356.940128
2021/06/22 13000 8924.207741
2021/06/23 17698 14430.378944
2021/06/24 18357 12215.724432
2021/06/25 21268 18636.967015
2021/06/26 51912 38701.309126
2021/06/27 37135 27275.729405
2021/06/28 3911 3712.518305
2021/06/29 10714 9690.498727
2021/06/30 19878 14386.090379
MSE: 41318127.34276407
RMSE: 6427.917807716903
Variance score: 0.802
13 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 14341.971583
2021/06/17 17573 13730.493228
2021/06/18 21608 18228.750853
2021/06/19 51055 38647.857441
2021/06/20 35137 27125.907914
2021/06/21 3331 4048.029937
2021/06/22 13000 9616.272501
2021/06/23 17698 15306.952529
2021/06/24 18357 13239.510207
2021/06/25 21268 18887.276451
2021/06/26 51912 38058.852545
2021/06/27 37135 27312.959319
2021/06/28 3911 4659.627158
2021/06/29 10714 10404.226059
2021/06/30 19878 15207.876143
MSE: 40342981.57154153
RMSE: 6351.612517427487
Variance score: 0.806
14 번
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020 15113.939525
2021/06/17 17573 12923.164633
2021/06/18 21608 18441.080032
2021/06/19 51055 37866.669512
2021/06/20 35137 26188.709835
2021/06/21 3331 3891.896020
2021/06/22 13000 9579.287297
2021/06/23 17698 14369.289490
2021/06/24 18357 11945.508132
2021/06/25 21268 18440.935467
2021/06/26 51912 37610.225183
2021/06/27 37135 26282.371602
2021/06/28 3911 4773.395837
2021/06/29 10714 10345.017250
2021/06/30 19878 14861.752131
MSE: 47081451.49804732
RMSE: 6861.592489943375
Variance score: 0.774
15 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13292.238242
2021/06/17 17573.0 12494.135916
2021/06/18 21608.0 15631.354020
2021/06/19 51055.0 32684.420908
2021/06/20 35137.0 25629.448875
2021/06/21 3331.0 2554.030814
2021/06/22 13000.0 10086.269064
2021/06/23 17698.0 12932.492305
2021/06/24 18357.0 12731.343143
2021/06/25 21268.0 16621.914562
2021/06/26 51912.0 33444.484249
2021/06/27 37135.0 25577.851681
2021/06/28 3911.0 2532.232030
2021/06/29 10714.0 10279.283991
2021/06/30 19878.0 13411.301420
MSE: 73062313.69419986
RMSE: 8547.64960057441
Variance score: 0.649
16 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13923.371405
2021/06/17 17573.0 12426.725437
2021/06/18 21608.0 16660.553159
2021/06/19 51055.0 34706.691283
2021/06/20 35137.0 26340.550596
2021/06/21 3331.0 2711.352017
2021/06/22 13000.0 10005.255827
2021/06/23 17698.0 14088.134138
2021/06/24 18357.0 12846.068144
2021/06/25 21268.0 17364.449942
2021/06/26 51912.0 35629.915494
2021/06/27 37135.0 26312.935898
2021/06/28 3911.0 2646.077998
2021/06/29 10714.0 10247.510935
2021/06/30 19878.0 14330.568623
MSE: 58641021.94179193
RMSE: 7657.742613968684
Variance score: 0.719
17 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13295.727861
2021/06/17 17573.0 12380.622902
2021/06/18 21608.0 14980.833398
2021/06/19 51055.0 32703.665979
2021/06/20 35137.0 25243.435275
2021/06/21 3331.0 2439.995226
2021/06/22 13000.0 10300.857872
2021/06/23 17698.0 12212.274941
2021/06/24 18357.0 12770.963761
2021/06/25 21268.0 16389.417815
2021/06/26 51912.0 32757.100473
2021/06/27 37135.0 25643.822895
2021/06/28 3911.0 2432.327586
2021/06/29 10714.0 10226.380424
2021/06/30 19878.0 13203.012640
MSE: 76508076.35466559
RMSE: 8746.889524549031
Variance score: 0.633
18 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 12611.111640
2021/06/17 17573.0 12147.767887
2021/06/18 21608.0 15367.093878
2021/06/19 51055.0 32760.078836
2021/06/20 35137.0 24020.673126
2021/06/21 3331.0 2519.830383
2021/06/22 13000.0 9411.148568
2021/06/23 17698.0 11737.864976
2021/06/24 18357.0 12703.938195
2021/06/25 21268.0 15575.557371
2021/06/26 51912.0 32438.485622
2021/06/27 37135.0 24442.027805
2021/06/28 3911.0 2415.786131
2021/06/29 10714.0 9735.612207
2021/06/30 19878.0 12293.344823
MSE: 83128862.47926863
RMSE: 9117.50308358975
Variance score: 0.601
19 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 14022.225294
2021/06/17 17573.0 12239.907977
2021/06/18 21608.0 16660.636038
2021/06/19 51055.0 33922.605460
2021/06/20 35137.0 25846.782896
2021/06/21 3331.0 2635.220186
2021/06/22 13000.0 9890.133296
2021/06/23 17698.0 14195.277179
2021/06/24 18357.0 12577.053179
2021/06/25 21268.0 17076.179759
2021/06/26 51912.0 34479.595336
2021/06/27 37135.0 25674.610746
2021/06/28 3911.0 2565.021312
2021/06/29 10714.0 10172.953917
2021/06/30 19878.0 14420.751112
MSE: 64950647.36962161
RMSE: 8059.196446893549
Variance score: 0.688
20 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 12883.252338
2021/06/17 17573.0 11998.962440
2021/06/18 21608.0 15389.200371
2021/06/19 51055.0 32856.953343
2021/06/20 35137.0 24667.688732
2021/06/21 3331.0 2445.459782
2021/06/22 13000.0 11050.949144
2021/06/23 17698.0 11738.149050
2021/06/24 18357.0 12400.749436
2021/06/25 21268.0 16745.671546
2021/06/26 51912.0 34496.075774
2021/06/27 37135.0 25540.563466
2021/06/28 3911.0 2420.769299
2021/06/29 10714.0 10514.491048
2021/06/30 19878.0 12397.479305
MSE: 73805318.03967038
RMSE: 8591.00215572493
Variance score: 0.646
21 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 14293.280092
2021/06/17 17573.0 11649.234506
2021/06/18 21608.0 15273.401915
2021/06/19 51055.0 36204.247431
2021/06/20 35137.0 21446.364670
2021/06/21 3331.0 2694.504691
2021/06/22 13000.0 9542.466815
2021/06/23 17698.0 12333.689540
2021/06/24 18357.0 12966.972880
2021/06/25 21268.0 15556.824946
2021/06/26 51912.0 33925.172621
2021/06/27 37135.0 23043.865029
2021/06/28 3911.0 2456.393876
2021/06/29 10714.0 9639.149278
2021/06/30 19878.0 13342.604126
MSE: 76973241.84661
RMSE: 8773.439567615998
Variance score: 0.631
22 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 13903.201980
2021/06/17 17573.0 12327.409530
2021/06/18 21608.0 16295.831595
2021/06/19 51055.0 35402.026565
2021/06/20 35137.0 23407.719156
2021/06/21 3331.0 2823.028497
2021/06/22 13000.0 9555.845179
2021/06/23 17698.0 13320.597696
2021/06/24 18357.0 13482.095165
2021/06/25 21268.0 16453.489664
2021/06/26 51912.0 34508.576301
2021/06/27 37135.0 24257.608517
2021/06/28 3911.0 2631.018477
2021/06/29 10714.0 9782.005924
2021/06/30 19878.0 13735.221561
MSE: 68449823.50867456
RMSE: 8273.440850617992
Variance score: 0.672
23 번
inverse_log_scaled
ticketing_count ticketing_count_forecasted
date
2021/06/16 15020.0 12526.087432
2021/06/17 17573.0 10726.710788
2021/06/18 21608.0 15422.778112
2021/06/19 51055.0 34202.537897
2021/06/20 35137.0 20639.571463
2021/06/21 3331.0 2464.012863
2021/06/22 13000.0 10097.839720
2021/06/23 17698.0 10615.429118
2021/06/24 18357.0 11789.251687
2021/06/25 21268.0 15658.359920
2021/06/26 51912.0 33911.297993
2021/06/27 37135.0 23017.540695
2021/06/28 3911.0 2351.660718
2021/06/29 10714.0 9820.905112
2021/06/30 19878.0 11527.253991
MSE: 87717439.74389933
RMSE: 9365.758898450213
Variance score: 0.579
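The results marked `inverse_log_scaled` above come from candidates whose values were on a log scale. The commented loop converts them back with `np.expm1` before scoring, so that every candidate is compared on the original ticket-count scale. That the forward transform was `np.log1p` is an assumption here, inferred from the use of `np.expm1`. A minimal round-trip sketch using three actual values from the tables above:

```python
import numpy as np

# Actual ticketing_count values for 2021/06/16, 06/21 and 06/19 from the tables above
counts = np.array([15020.0, 3331.0, 51055.0])

log_scaled = np.log1p(counts)    # log(1 + x); assumed forward transform
restored = np.expm1(log_scaled)  # exp(x) - 1; the inverse used in the loop

print(log_scaled.round(3))
print(restored)
```

The loop decides whether to invert by looking only at the first row: a first value of 10 or more is treated as original scale, and a value between 1 and 10 as log scale. That works for this data because the first row (15020) maps to a log value below 10, but it is a heuristic. The largest value here maps to a log value above 10.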
dict_data = {'mse': mse_list,
'rmse': rmse_list,
'r2_score': r2_score_list}
scores_df = pd.DataFrame(dict_data)
scores_df['r2_score'] = scores_df['r2_score'].round(3)
scores_df.to_csv('F:\\drive\\WebWorkPlace2021\\jupyter\\code\\다변량시계열예측모델평가점수.csv')
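Once the scores are collected, the best candidate can be picked programmatically instead of by eye. A small sketch using a handful of the RMSE values printed above, rounded to two decimals. The dictionary and variable names are hypothetical.

```python
# RMSE values copied from the printed results above (rounded), keyed by model number
rmse_by_model = {1: 7572.70, 8: 6475.52, 9: 6300.10, 13: 6351.61, 23: 9365.76}

# Lower RMSE is better, so sort in ascending order
ranking = sorted(rmse_by_model.items(), key=lambda item: item[1])
best_model, best_rmse = ranking[0]
print(best_model, best_rmse)
```

On the full `scores_df`, the same selection could be done by sorting on the `rmse` column.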
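Conclusion:
Of the 23 candidates, the model with no log transform + PCA (2 components) + standard scaling (m9) performed best (RMSE of about 6300.10, R² of 0.810).
Next time, we'll look at how to build the multivariate time-series forecasting model (VAR) with the finalized model (m9).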
Author and Source
Original post: https://velog.io/@designc/공모전-수상작-리뷰-ReactjsNodejspythonscikit-learn-PCA주성분-분석-VAR다변량시계열분석으로-공연-예매-추이-시나리오-별-예측하는-서비스-만들어보기-데이터-분석-편2
Author attribution: the original author's information is included at the original URL, and copyright belongs to the original author.