[Time Series 📉][Forecasting: Principles and Practice] AR, MA, ARMA, ARIMA Concept Summary


This document is a summary based on Forecasting: Principles and Practice.

Forecasting: Principles and Practice, Rob J Hyndman and George Athanasopoulos

Table of Contents

1. Stationary and Non-Stationary
2. Autoregressive(AR) Model
3. Moving Average(MA) Model
4. Autoregressive and Moving Average Model(ARMA)
5. Autoregressive Integrated Moving Average Model(ARIMA)
6. ACF (Autocorrelation Function) and PACF (Partial ACF)

1. Stationary and Non-Stationary

(1) Stationary process: a time series whose mean and variance stay constant regardless of time.

(2) Non-stationary process: a time series whose mean and variance do not stay constant over time.

How to tell stationary and non-stationary series apart
Plot the ACF (autocorrelation function) on the Y axis against the lag (the time offset from the current observation) on the X axis. If no periodically recurring pattern appears and the ACF drops toward zero quickly, the series can be regarded as a stationary process.

What is autocorrelation?
Correlation is a measure that expresses the relationship between two variables as a value from -1 to 1. The closer the value is to -1, the stronger the negative correlation; the closer it is to +1, the stronger the positive correlation. Autocorrelation adds the idea of "auto" to correlation: from a time series point of view, it is the correlation of a series with a time-shifted copy of itself.

2. Autoregressive(AR) Models

A model in which the series itself is the dependent variable $y_t$ and its own past values $y_{t-1}, \dots, y_{t-p}$ are the explanatory variables.

An autoregressive model of order $p$ can be written as follows.

$$y_t = c + \Phi_{1} y_{t-1} + \Phi_{2} y_{t-2} + \dots + \Phi_{p} y_{t-p} + \epsilon_{t}$$

In the equation above, $\epsilon_{t}$ is white noise.
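As a rough illustration of the recursion in the AR equation, the sketch below simulates an AR($p$) series in plain Python. The function name `simulate_ar`, the Gaussian noise, and the zero start-up values are assumptions made for this example; they do not come from the book.

```python
import random

def simulate_ar(c, phis, n, sigma=1.0, seed=0):
    """Simulate n values of y_t = c + sum_i phis[i] * y_{t-1-i} + eps_t."""
    rng = random.Random(seed)
    history = [0.0] * len(phis)  # assumed start-up values: y_0 = y_{-1} = ... = 0
    series = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # white noise term (Gaussian by assumption)
        # history[-1] is y_{t-1}, history[-2] is y_{t-2}, and so on
        y = c + sum(phi * history[-1 - i] for i, phi in enumerate(phis)) + eps
        series.append(y)
        history.append(y)
    return series

# With the noise switched off, the recursion is easy to follow by hand.
print(simulate_ar(1.0, [0.5], 3, sigma=0.0))  # [1.0, 1.5, 1.75]
```

With `sigma=0.0` each value is simply `c` plus half of the previous value, so the series climbs toward 2; a positive `sigma` adds a random shock at every step.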

3. Moving Average(MA) Models

A model in which the series itself is the dependent variable $y_t$ and the current and past error terms $\epsilon_{t}, \epsilon_{t-1}, \dots, \epsilon_{t-q}$ are the explanatory variables.

A moving average model of order $q$ can be written as follows.

$$y_t = c + \epsilon_{t} + \theta_{1} \epsilon_{t-1} + \theta_{2} \epsilon_{t-2} + \dots + \theta_{q} \epsilon_{t-q}$$

Here, $\epsilon_{t}$ is white noise.
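To make the MA equation concrete, here is a minimal sketch that builds an MA($q$) series from a given list of error terms. The helper name `ma_from_noise` and the convention that errors before the start of the sample are zero are assumptions for illustration only.

```python
def ma_from_noise(c, thetas, eps):
    """Build y_t = c + eps_t + sum_j thetas[j] * eps_{t-1-j} from a noise list."""
    series = []
    for t in range(len(eps)):
        y = c + eps[t]
        for j, theta in enumerate(thetas):
            lag = t - 1 - j
            if lag >= 0:  # errors before the sample starts are assumed to be 0
                y += theta * eps[lag]
        series.append(y)
    return series

# MA(1) with theta_1 = 0.5: each value is the current error plus half the previous one.
print(ma_from_noise(0.0, [0.5], [1.0, 2.0, 3.0]))  # [1.0, 2.5, 4.0]
```

Unlike the AR case, past values of $y$ never enter the calculation; only past errors do.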

4. Autoregressive and Moving Average (ARMA)

A model in which the series itself is the dependent variable $y_t$ and both its own past values and the past error terms are the explanatory variables.

The equation of an ARMA model of orders $p$ and $q$ is as follows.

$$y_t = c + \Phi_{1} y_{t-1} + \Phi_{2} y_{t-2} + \dots + \Phi_{p} y_{t-p} + \epsilon_{t} + \theta_{1} \epsilon_{t-1} + \theta_{2} \epsilon_{t-2} + \dots + \theta_{q} \epsilon_{t-q}$$
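As a small worked case, taking $p = q = 1$ and writing the autoregressive coefficient as $\Phi_1$ and the moving average coefficient as $\theta_1$ (the notation of the AR and MA sections) leaves one lagged value and one lagged error:

$$y_t = c + \Phi_{1} y_{t-1} + \epsilon_{t} + \theta_{1} \epsilon_{t-1}$$

Setting $\theta_1 = 0$ reduces this to an AR(1) model, and setting $\Phi_1 = 0$ reduces it to an MA(1) model.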

5. Autoregressive Integrated Moving Average (ARIMA)

AR, MA, and ARMA models all require the data to be stationary, so non-stationary data must first be transformed into stationary data through differencing. ARIMA is an ARMA model applied to data that has been differenced $d$ times.

How do we make data stationary? Differencing
Differencing means subtracting the previous observation from the current one, $y_t' = y_t - y_{t-1}$; the $d$ in ARIMA is the number of times this operation is applied. It is one way to make a non-stationary time series stationary: computing the differences between consecutive observations transforms the data so that it behaves like a stationary series.

The figure above shows how differencing works. Differencing the original series once gives the "first difference", and differencing the first-differenced series once more gives the "second difference". When the first difference is still not stationary, we move on to the second difference; but because the second difference effectively models the "change in the changes" of the original data, in practice it is almost never necessary to go beyond second-order differences.

The figure above visualizes the results of a log transformation, first differencing, and second differencing. In general, first differencing is used when the series has a steady trend, and second differencing is used when the trend itself changes over time.
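The effect of first and second differencing can be checked on two toy series. The helper `difference` and the two example series are assumptions chosen for illustration: a linear series stands in for a steady trend and a quadratic series for a trend that changes over time.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out

linear = [2 * t + 1 for t in range(6)]   # steady trend
quadratic = [t * t for t in range(6)]    # trend whose slope changes over time

print(difference(linear, d=1))      # [2, 2, 2, 2, 2]
print(difference(quadratic, d=1))   # [1, 3, 5, 7, 9]
print(difference(quadratic, d=2))   # [2, 2, 2, 2]
```

One round of differencing flattens the linear series, while the quadratic series still trends after one round and only becomes flat after the second. Each round also shortens the series by one observation.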

ARIMA stands for Autoregressive Integrated Moving Average: a model that combines differencing with autoregression and a moving average model ("integrated" here refers to the reverse of differencing). The equation can be written as follows.

$$y_t' = c + \Phi_{1} y_{t-1}' + \Phi_{2} y_{t-2}' + \dots + \Phi_{p} y_{t-p}' + \theta_{1} \epsilon_{t-1} + \theta_{2} \epsilon_{t-2} + \dots + \theta_{q} \epsilon_{t-q} + \epsilon_{t}$$

In the equation above, $y_t'$ is the differenced series (it may have been differenced more than once). This is called an ARIMA($p$, $d$, $q$) model, where:

  • $p$ = order of the autoregressive part
  • $d$ = degree of first differencing involved
  • $q$ = order of the moving average part

The same stationarity and invertibility conditions that are used for autoregressive (AR) and moving average (MA) models also apply to ARIMA models. The models covered so far can also be expressed as ARIMA models.

  • White noise: ARIMA(0,0,0)
  • Random walk: ARIMA(0,1,0) with no constant
  • Random walk with drift: ARIMA(0,1,0) with a constant
  • AR: ARIMA(p,0,0)
  • MA: ARIMA(0,0,q)
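The ARIMA(0,1,0) entries in the list can be verified numerically: a random walk is the running sum of white noise, so its first difference is the noise again (plus the constant when there is drift). The function `random_walk`, the Gaussian noise, and the starting value of 0 are assumptions for this sketch.

```python
import random

def random_walk(n, drift=0.0, sigma=1.0, seed=0):
    """Return (walk, noise) with walk_t = walk_{t-1} + drift + noise_t and walk_0 = 0."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    walk = [0.0]
    for eps in noise:
        walk.append(walk[-1] + drift + eps)
    return walk, noise

walk, noise = random_walk(5, drift=0.5)
first_diff = [curr - prev for prev, curr in zip(walk, walk[1:])]
# Each first difference equals drift + noise_t, up to floating-point rounding.
print(all(abs(d - (0.5 + e)) < 1e-9 for d, e in zip(first_diff, noise)))  # True
```

With `drift=0.0` the differenced series is pure white noise, which corresponds to the ARIMA(0,1,0) case with no constant.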

6. ACF and PACF

ACF(AutoCorrelation Function)?

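The ACF (autocorrelation function) is the function of correlation coefficients between observations of a time series that are $k$ time units apart; for a stationary series, the ACF approaches 0 as $k$ grows. Just as correlation measures the extent of a linear relationship between two variables, autocorrelation measures the linear relationship between lagged values of a time series.

There are several autocorrelation coefficients, one for each panel in the lag plot. For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, $r_2$ measures the relationship between $y_t$ and $y_{t-2}$, and so on.

$r_k$ can be written as

$$r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$

where $T$ is the length of the time series.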
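A sample autocorrelation at lag $k$ is commonly computed by summing the products of deviations from the overall mean that are $k$ steps apart and dividing by the total sum of squared deviations. The sketch below follows that convention; the function name `acf` is an assumption, and the code does not guard against a constant series (where the denominator is zero).

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag around the overall mean of the series."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)  # total sum of squared deviations
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

print(acf([1, 2, 3, 4, 5], max_lag=2))  # [1.0, 0.4, -0.1]
```

The value at lag 0 is always 1, because a series is perfectly correlated with itself.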

PACF(Partial ACF)?

Partial correlation can be defined as the correlation between two random variables X and Y that still remains after the correlation explained by all other variables has been accounted for.

Accordingly, the partial autocorrelation function (PACF), like the autocorrelation function, is a function of the correlation between observations of a time series; at lag $k$ it is the pure correlation between all data points that are $k$ steps apart.

Simply put, it is the correlation between $y_t$ and $y_{t-k}$ that remains once the influence of the observations in between has been removed. Writing $e_t$ and $e_{t-k}$ for what is left of $y_t$ and $y_{t-k}$ after that removal:

$$PACF(k) = Corr(e_{t}, e_{t-k})$$

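How are the ACF and PACF used?

Usually it is not possible to tell which values of $p$ and $q$ suit the data just by looking at a time plot. However, ACF and PACF plots can sometimes be used to determine appropriate values of $p$ and $q$ for an ARIMA model.

For different values of $k$, the ACF plot shows the autocorrelation between $y_t$ and $y_{t-k}$. If $y_t$ and $y_{t-1}$ are correlated, then $y_{t-1}$ and $y_{t-2}$ must be correlated as well, so $y_t$ and $y_{t-2}$ may appear correlated simply because both are connected to $y_{t-1}$, not because $y_{t-2}$ carries any new information.

To overcome this problem, the PACF plot can be used. Its values measure the relationship between $y_t$ and $y_{t-k}$ after removing the effects of lags $1, 2, 3, \dots, k-1$.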
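One way to see the difference between the two functions is to compute partial autocorrelations from autocorrelations. The sketch below uses the Durbin-Levinson recursion, a standard technique swapped in here for illustration rather than a method taken from the book; the function name `pacf_from_acf` is an assumption. The input is the theoretical ACF of an AR(1) process with coefficient 0.5, which is $0.5^k$ at lag $k$.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations for lags 1..K from autocorrelations rho[0..K] (rho[0] = 1)."""
    pacf = []
    prev = []  # prev[j] holds phi_{k-1, j+1}, the coefficients of the previous order
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update step: phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

ar1_acf = [0.5 ** k for k in range(4)]  # [1.0, 0.5, 0.25, 0.125]
print(pacf_from_acf(ar1_acf))           # [0.5, 0.0, 0.0]
```

The ACF is non-zero at lags 2 and 3 only because those values are linked through lag 1; once that link is removed, the partial autocorrelations beyond lag 1 are zero.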

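The shapes of the ACF and PACF can be used to determine the ARIMA parameters $p$ and $q$ as follows.

  • ARIMA($p$, $d$, 0): the ACF of the differenced data decays exponentially or is sinusoidal, and the PACF has a significant spike at lag $p$ but none beyond it.
  • ARIMA(0, $d$, $q$): the PACF of the differenced data decays exponentially or is sinusoidal, and the ACF has a significant spike at lag $q$ but none beyond it.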

.
.
.
Thank you!
