最完整的時(shí)間序列分析和預(yù)測(cè)（含實(shí)例及代碼）

dxgung 2022-02-28

展開全文

時(shí)間序列

在生產(chǎn)和科學(xué)研究中，對(duì)某一個(gè)或者一組變量進(jìn)行觀察測(cè)量，將在一系列時(shí)刻所得到的離散數(shù)字組成的序列集合，稱之為時(shí)間序列。

pandas生成時(shí)間序列
過濾數(shù)據(jù)
重采樣
插值
滑窗
數(shù)據(jù)平穩(wěn)性與差分法

pandas生成時(shí)間序列

時(shí)間戳（timestamp）
固定周期（period）
時(shí)間間隔（interval）

import pandas as pd
import numpy as np

# TIMES的幾種書寫方式 #2016 Jul 1; 7/1/2016; 1/7/2016 ;2016-07-01; 2016/07/01
rng = pd.date_range('2016-07-01', periods = 10, freq = '3D')#不傳freq則默認(rèn)是D
rng

DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
               '2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
               '2016-07-25', '2016-07-28'],
              dtype='datetime64[ns]', freq='3D')

time=pd.Series(np.random.randn(20),index=pd.date_range('2016-01-01',periods=20))
print(time)

2016-01-01   -1.503070
2016-01-02    1.637771
2016-01-03   -1.527274
2016-01-04    1.202349
2016-01-05   -1.214471
2016-01-06    2.686539
2016-01-07   -0.665813
2016-01-08    1.210834
2016-01-09    0.973659
2016-01-10   -1.003532
2016-01-11   -0.138483
2016-01-12    0.718561
2016-01-13    1.380494
2016-01-14    0.368590
2016-01-15   -0.235975
2016-01-16   -0.847375
2016-01-17   -1.777034
2016-01-18    1.976097
2016-01-19   -0.631212
2016-01-20   -0.613633
Freq: D, dtype: float64

truncate過濾

time.truncate(before='2016-1-10')#1月10之前的都被過濾掉了
time.truncate(after='2016-1-10')#1月10之前的都被過濾掉了

2016-01-01   -1.503070
2016-01-02    1.637771
2016-01-03   -1.527274
2016-01-04    1.202349
2016-01-05   -1.214471
2016-01-06    2.686539
2016-01-07   -0.665813
2016-01-08    1.210834
2016-01-09    0.973659
2016-01-10   -1.003532
Freq: D, dtype: float64

數(shù)據(jù)重采樣

時(shí)間數(shù)據(jù)由一個(gè)頻率轉(zhuǎn)換到另一個(gè)頻率
降采樣
升采樣

import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=90, freq='D')#數(shù)據(jù)按天
ts = pd.Series(np.random.randn(len(rng)), index=rng)

ts.resample('M').sum()#數(shù)據(jù)降采樣，降為月，指標(biāo)是求和，也可以平均，自己指定

ts.resample('3D').sum()#數(shù)據(jù)降采樣，降為3天

day3Ts = ts.resample('3D').mean()
day3Ts

print(day3Ts.resample('D').asfreq())#升采樣，要進(jìn)行插值

插值方法：

ffill 空值取前面的值
bfill 空值取后面的值
interpolate 線性取值

day3Ts.resample('D').ffill(1)

2011-01-01    0.196793
2011-01-02    0.196793
2011-01-03         NaN
2011-01-04   -0.145891
2011-01-05   -0.145891
                ...   
2011-03-25         NaN
2011-03-26   -0.993341
2011-03-27   -0.993341
2011-03-28         NaN
2011-03-29   -0.022786
Freq: D, Length: 88, dtype: float64

day3Ts.resample('D').bfill(1)

2011-01-01    0.196793
2011-01-02         NaN
2011-01-03   -0.145891
2011-01-04   -0.145891
2011-01-05         NaN
                ...   
2011-03-25   -0.993341
2011-03-26   -0.993341
2011-03-27         NaN
2011-03-28   -0.022786
2011-03-29   -0.022786
Freq: D, Length: 88, dtype: float64

day3Ts.resample('D').interpolate('linear')#線性擬合填充

2011-01-01    0.196793
2011-01-02    0.082565
2011-01-03   -0.031663
2011-01-04   -0.145891
2011-01-05   -0.196231
                ...   
2011-03-25   -0.771202
2011-03-26   -0.993341
2011-03-27   -0.669823
2011-03-28   -0.346305
2011-03-29   -0.022786
Freq: D, Length: 88, dtype: float64

Pandas滑動(dòng)窗口：

滑動(dòng)窗口就是能夠根據(jù)指定的單位長(zhǎng)度來框住時(shí)間序列，從而計(jì)算框內(nèi)的統(tǒng)計(jì)指標(biāo)。
相當(dāng)于一個(gè)長(zhǎng)度指定的滑塊在刻度尺上面滑動(dòng)，每滑動(dòng)一個(gè)單位即可反饋滑塊內(nèi)的數(shù)據(jù)。
滑動(dòng)窗口可以使數(shù)據(jù)更加平穩(wěn)，浮動(dòng)范圍會(huì)比較小，具有代表性，單獨(dú)拿出一個(gè)數(shù)據(jù)可能或多或少會(huì)離群，有差異或者錯(cuò)誤，使用滑動(dòng)窗口會(huì)更規(guī)范一些。

%matplotlib inline
import matplotlib.pylab
import numpy as np
import pandas as pd
df = pd.Series(np.random.randn(600), index = pd.date_range('7/1/2016', freq = 'D', periods = 600))
df.head()

2016-07-01    0.391383
2016-07-02    1.529039
2016-07-03   -0.807703
2016-07-04    0.770088
2016-07-05    0.476651
Freq: D, dtype: float64

r = df.rolling(window = 10)
#r.max, r.median, r.std, r.skew傾斜度, r.sum, r.var
print(r.mean())

2016-07-01         NaN
2016-07-02         NaN
2016-07-03         NaN
2016-07-04         NaN
2016-07-05         NaN
                ...   
2018-02-16    0.262464
2018-02-17    0.114787
2018-02-18    0.088134
2018-02-19    0.011999
2018-02-20    0.190583
Freq: D, Length: 600, dtype: float64

import matplotlib.pyplot as plt
%matplotlib inline
 
plt.figure(figsize=(15, 5))
 
df[:].plot(style='r--')
df[:].rolling(window=10).mean().plot(style='b')

數(shù)據(jù)平穩(wěn)性與差分法：

基本模型：自回歸移動(dòng)平均模型(ARMA(p，q))是時(shí)間序列中最為重要的模型之一。
它主要由兩部分組成：AR代表p階自回歸過程，MA代表q階移動(dòng)平均過程。

平穩(wěn)性

要求經(jīng)由時(shí)間序列所得到的的擬合曲線在未來一段時(shí)間內(nèi)仍能順著現(xiàn)有形態(tài)'慣性’延續(xù)下去
即均值和方差不發(fā)生明顯變化
ARIMA 模型對(duì)時(shí)間序列的要求是平穩(wěn)型。
因此，當(dāng)你得到一個(gè)非平穩(wěn)的時(shí)間序列時(shí)，首先要做的即是做時(shí)間序列的差分，直到得到一個(gè)平穩(wěn)時(shí)間序列。
如果你對(duì)時(shí)間序列做d次差分才能得到一個(gè)平穩(wěn)序列，那么可以使用ARIMA(p,d,q)模型，其中d是差分次數(shù)

ARIMA(p,d,q)

當(dāng)數(shù)據(jù)差異特別大時(shí)，為了使數(shù)據(jù)變得平穩(wěn)些，可以使用差分法
即時(shí)間序列在t與t-1時(shí)刻的差值
二階差分是指在一階差分基礎(chǔ)上再做一階差分。

%matplotlib inline
import matplotlib.pylab
import numpy as np
import pandas as pd
df = pd.Series(np.random.randn(100), index = pd.date_range('7/1/2016', freq = 'D', periods = 100))
df.head()

2016-07-01   -0.451037
2016-07-02   -1.075953
2016-07-03    0.573926
2016-07-04   -1.643342
2016-07-05   -0.716047
Freq: D, dtype: float64

df.shift(-1) -df

2016-07-01   -0.624916
2016-07-02    1.649879
2016-07-03   -2.217268
2016-07-04    0.927295
2016-07-05    0.127485

df.diff(2)

2016-07-01         NaN
2016-07-02         NaN
2016-07-03    1.024963
2016-07-04   -0.567389
2016-07-05   -1.289973

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei'] #中文支持
plt.rcParams['axes.unicode_minus'] = False #正常顯示負(fù)號(hào)
x = df.index
y = df
plt.figure(figsize=(15,6))
plt.plot(x,y)
plt.title('原數(shù)據(jù)')

newx = df.index
y = df.diff(1)
plt.figure(figsize=(15,6))
plt.plot(x,y,label = '一階')
plt.title('一二階差分')
y = y.diff(1)
plt.plot(x,y,label = '二階')
plt.legend()

自回歸 AR

用自身變量的歷史時(shí)間對(duì)自己預(yù)測(cè)
自回歸模型必須滿足平穩(wěn)性（可以使用差分）
p階自回歸過程公式： y = u + 求和a*y(t-i) + e
y 是當(dāng)前值， u是常數(shù)項(xiàng)， e 是誤差項(xiàng)（服從獨(dú)立同分布） y(t-i)當(dāng)前預(yù)測(cè)的值與前P天相關(guān) ,a是自相關(guān)系數(shù)

自回歸模型限制

用自身來預(yù)測(cè)
平穩(wěn)性
自相關(guān)性判斷自相關(guān)系數(shù)??！
只適用于預(yù)測(cè)與自身前期相關(guān)的現(xiàn)象

移動(dòng)平均模型（MA）

關(guān)注自回歸模型中的誤差項(xiàng)的累加
q階自回歸過程的定義: y = u + e + b*e(t-i)
移動(dòng)平均能有效消除預(yù)測(cè)中的隨機(jī)波動(dòng)

ARIMA

I表示差分項(xiàng)，1是一階，0是不用做，一般做1階就夠了
原理:將非平穩(wěn)時(shí)間序列轉(zhuǎn)化為平穩(wěn)時(shí)間序列，然后將隱變量?jī)H對(duì)它的滯后值以及隨機(jī)誤差項(xiàng)的現(xiàn)值和滯后值進(jìn)行回歸所建立的模型。（滯后指階數(shù)）

自相關(guān)函數(shù)ACF

有序的隨機(jī)變量與其自身相比較
ACF反映了同一序列在不同時(shí)序的取值之間的相關(guān)性
ACF(k) = cov（y(t),y(t-k)）/var(y(t)) [-1,1]

如何確定 pq參數(shù)？

利用ACF 和 PCAF

實(shí)例操作

主要分為4部分

用pandas處理時(shí)序數(shù)據(jù)
檢驗(yàn)序數(shù)據(jù)的穩(wěn)定性
處理時(shí)序數(shù)據(jù)變成穩(wěn)定數(shù)據(jù)
時(shí)序數(shù)據(jù)的預(yù)測(cè)

1 用pandas導(dǎo)入和處理時(shí)序數(shù)據(jù)

數(shù)據(jù)集是：航空乘客數(shù)量預(yù)測(cè)例子數(shù)據(jù)集international-airline-passengers.csv

網(wǎng)上一大推：下載地址：https://github.com/sunlei-1997/ML-DL-datasets/blob/master/international-airline-passengers.csv

import numpy as np
import pandas as pd
from datetime import datetime
import matplotlib.pylab as plt
import tqdm
import statsmodels
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_model import ARIMA
import warnings
warnings.filterwarnings('ignore')
    
# 讀取數(shù)據(jù)，pd.read_csv默認(rèn)生成DataFrame對(duì)象，需將其轉(zhuǎn)換成Series對(duì)象
df = pd.read_csv('international-airline-passengers.csv', encoding='utf-8', index_col='Month')
df.index = pd.to_datetime(df.index)  # 將字符串索引轉(zhuǎn)換成時(shí)間索引
ts = df['Passengers']  # 生成pd.Series對(duì)象
ts = ts.astype('float')
ts.head()

Month
1949-01-01    112.0
1949-02-01    118.0
1949-03-01    132.0
1949-04-01    129.0
1949-05-01    121.0
Name: Passengers, dtype: float64

ts.index

DatetimeIndex(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01',
               '1949-05-01', '1949-06-01', '1949-07-01', '1949-08-01',
               '1949-09-01', '1949-10-01',
               ...
               '1960-03-01', '1960-04-01', '1960-05-01', '1960-06-01',
               '1960-07-01', '1960-08-01', '1960-09-01', '1960-10-01',
               '1960-11-01', '1960-12-01'],
              dtype='datetime64[ns]', name='Month', length=144, freq=None)

ts['1949-01-01']

112.0

ts[datetime(1949,1,1)]

112.0

ts['1949-1' : '1949-6']

Month
1949-01-01    112.0
1949-02-01    118.0
1949-03-01    132.0
1949-04-01    129.0
1949-05-01    121.0
1949-06-01    135.0
Name: Passengers, dtype: float64

2 檢驗(yàn)序數(shù)據(jù)的穩(wěn)定性

因?yàn)锳RIMA模型要求數(shù)據(jù)是穩(wěn)定的，所以這一步至關(guān)重要。

2.1 判斷數(shù)據(jù)是穩(wěn)定的常基于對(duì)于時(shí)間是常量的幾個(gè)統(tǒng)計(jì)量：

常量的均值
常量的方差
與時(shí)間獨(dú)立的自協(xié)方差

2.2 python判斷時(shí)序數(shù)據(jù)穩(wěn)定

平穩(wěn)性檢驗(yàn)一般采用觀察法和單位根檢驗(yàn)法。

觀察法：需計(jì)算每個(gè)時(shí)間段內(nèi)的平均的數(shù)據(jù)均值和標(biāo)準(zhǔn)差。

單位根檢驗(yàn)法：通過Dickey-Fuller Test 進(jìn)行判斷，大致意思就是在一定置信水平下，對(duì)于時(shí)序數(shù)據(jù)假設(shè) Null hypothesis: 非穩(wěn)定。這是一種常用的單位根檢驗(yàn)方法，它的原假設(shè)為序列具有單位根，即非平穩(wěn)，對(duì)于一個(gè)平穩(wěn)的時(shí)序數(shù)據(jù)，就需要在給定的置信水平上顯著，拒絕原假設(shè)。

# 移動(dòng)平均圖
def draw_trend(timeseries, size):
    f = plt.figure(facecolor='white')
    # 對(duì)size個(gè)數(shù)據(jù)進(jìn)行移動(dòng)平均
    rol_mean = timeseries.rolling(window=size).mean()
    # 對(duì)size個(gè)數(shù)據(jù)移動(dòng)平均的方差
    rol_std = timeseries.rolling(window=size).std()
 
    timeseries.plot(color='blue', label='Original')
    rol_mean.plot(color='red', label='Rolling Mean')
    rol_std.plot(color='black', label='Rolling standard deviation')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()

def draw_ts(timeseries):
    f = plt.figure(facecolor='white')
    timeseries.plot(color='blue')
    plt.show()

#Dickey-Fuller test:
def teststationarity(ts,max_lag = None):
    dftest = statsmodels.tsa.stattools.adfuller(ts,maxlag= max_lag)
    # 對(duì)上述函數(shù)求得的值進(jìn)行語義描述
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key,value in dftest[4].items():
        dfoutput['Critical Value (%s)'%key] = value
    return dfoutput

#查看原始數(shù)據(jù)的均值和方差
draw_trend(ts,12)

通過上圖，我們可以發(fā)現(xiàn)數(shù)據(jù)的移動(dòng)平均值/標(biāo)準(zhǔn)差有越來越大的趨勢(shì)，是不穩(wěn)定的。接下來我們?cè)倏碊ickey-Fuller的結(jié)果

teststationarity(ts)

Test Statistic                   0.815369
p-value                          0.991880
#Lags Used                      13.000000
Number of Observations Used    130.000000
Critical Value (1%)             -3.481682
Critical Value (5%)             -2.884042
Critical Value (10%)            -2.578770
dtype: float64

此時(shí)p值為0.991880，說明并不能拒絕原假設(shè)。通過DF的數(shù)據(jù)可以明確的看出，在任何置信度下，數(shù)據(jù)都不是穩(wěn)定的。

3 處理時(shí)序數(shù)據(jù)變成穩(wěn)定數(shù)據(jù)

數(shù)據(jù)不穩(wěn)定的原因主要有以下兩點(diǎn)：

趨勢(shì)（trend）-數(shù)據(jù)隨著時(shí)間變化。比如說升高或者降低。
季節(jié)性(seasonality)-數(shù)據(jù)在特定的時(shí)間段內(nèi)變動(dòng)。比如說節(jié)假日，或者活動(dòng)導(dǎo)致數(shù)據(jù)的異常。

3.1 對(duì)數(shù)變換

對(duì)數(shù)變換主要是為了減小數(shù)據(jù)的振動(dòng)幅度，使其線性規(guī)律更加明顯，同時(shí)保留其他信息。這里強(qiáng)調(diào)一下，變換的序列需要滿足大于0，小于0的數(shù)據(jù)不存在對(duì)數(shù)變換。

ts_log = np.log(ts)
draw_trend(ts_log,12)

可以看出經(jīng)過對(duì)數(shù)變換后，數(shù)據(jù)值域范圍縮小了，振幅也沒那么大了。

3.2 平滑法

根據(jù)平滑技術(shù)的不同，平滑法具體分為移動(dòng)平均法和指數(shù)平均法。

移動(dòng)平均即利用一定時(shí)間間隔內(nèi)的平均值作為某一期的估計(jì)值，而指數(shù)平均則是用變權(quán)的方法來計(jì)算均值。

移動(dòng)平均：

def draw_moving(timeSeries, size):
    f = plt.figure(facecolor='white')
    # 對(duì)size個(gè)數(shù)據(jù)進(jìn)行移動(dòng)平均
    rol_mean = timeSeries.rolling(window=size).mean()
    # 對(duì)size個(gè)數(shù)據(jù)進(jìn)行加權(quán)移動(dòng)平均
    rol_weighted_mean = pd.Series.ewm(timeSeries, span=size)
    rol_weighted_mean=timeSeries.ewm(halflife=size,min_periods=0,adjust=True,ignore_na=False).mean()
 
    timeSeries.plot(color='blue', label='Original')
    rol_mean.plot(color='red', label='Rolling Mean')
    rol_weighted_mean.plot(color='black', label='Weighted Rolling Mean')
    plt.legend(loc='best')
    plt.title('Rolling Mean')
    plt.show()
draw_moving(ts_log,12)

從上圖可以發(fā)現(xiàn)窗口為12的移動(dòng)平均能較好的剔除年周期性因素，

而指數(shù)平均法是對(duì)周期內(nèi)的數(shù)據(jù)進(jìn)行了加權(quán)，能在一定程度上減小年周期因素，但并不能完全剔除，如要完全剔除可以進(jìn)一步進(jìn)行差分操作。

3.3 差分

時(shí)間序列最常用來剔除周期性因素的方法當(dāng)屬差分了，它主要是對(duì)等周期間隔的數(shù)據(jù)進(jìn)行線性求減。
ARIMA模型相對(duì)ARMA模型，僅多了差分操作，ARIMA模型幾乎是所有時(shí)間序列軟件都支持的，差分的實(shí)現(xiàn)與還原都非常方便。

diff_12 = ts_log.diff(12)
diff_12.dropna(inplace=True)
diff_12_1 = diff_12.diff(1)
diff_12_1.dropna(inplace=True)
teststationarity(diff_12_1)

Test Statistic                  -4.443325
p-value                          0.000249
#Lags Used                      12.000000
Number of Observations Used    118.000000
Critical Value (1%)             -3.487022
Critical Value (5%)             -2.886363
Critical Value (10%)            -2.580009
dtype: float64

從上面的統(tǒng)計(jì)檢驗(yàn)結(jié)果可以看出，經(jīng)過12階滑動(dòng)平均和1階差分后，該序列滿足平穩(wěn)性的要求了。

3.4 分解

所謂分解就是將時(shí)序數(shù)據(jù)分離成不同的成分。
statsmodels使用的X-11分解過程，它主要將時(shí)序數(shù)據(jù)分離成長(zhǎng)期趨勢(shì)、季節(jié)趨勢(shì)和隨機(jī)成分。
與其它統(tǒng)計(jì)軟件一樣，statsmodels也支持兩類分解模型，加法模型和乘法模型，這里我只實(shí)現(xiàn)加法，乘法只需將model的參數(shù)設(shè)置為'multiplicative'即可。

from statsmodels.tsa.seasonal import seasonal_decompose
def decompose(timeseries):
    
    # 返回包含三個(gè)部分 trend（趨勢(shì)部分） ， seasonal（季節(jié)性部分） 和residual (殘留部分)
    decomposition = seasonal_decompose(timeseries)
    
    trend = decomposition.trend
    seasonal = decomposition.seasonal
    residual = decomposition.resid
    
    plt.subplot(411)
    plt.plot(ts_log, label='Original')
    plt.legend(loc='best')
    plt.subplot(412)
    plt.plot(trend, label='Trend')
    plt.legend(loc='best')
    plt.subplot(413)
    plt.plot(seasonal,label='Seasonality')
    plt.legend(loc='best')
    plt.subplot(414)
    plt.plot(residual, label='Residuals')
    plt.legend(loc='best')
    plt.tight_layout()
    plt.show()
    return trend , seasonal, residual
trend , seasonal, residual = decompose(ts_log)
residual.dropna(inplace=True)
draw_trend(residual,12)
teststationarity(residual)

Test Statistic                -6.332387e+00
p-value                        2.885059e-08
#Lags Used                     9.000000e+00
Number of Observations Used    1.220000e+02
Critical Value (1%)           -3.485122e+00
Critical Value (5%)           -2.885538e+00
Critical Value (10%)          -2.579569e+00
dtype: float64

如圖所示，數(shù)據(jù)的均值和方差趨于常數(shù)，幾乎無波動(dòng)(看上去比之前的陡峭，但是要注意他的值域只有[-0.05,0.05]之間)
所以直觀上可以認(rèn)為是穩(wěn)定的數(shù)據(jù)。另外DFtest的結(jié)果顯示，Statistic值原小于1%時(shí)的Critical value，所以在99%的置信度下，數(shù)據(jù)是穩(wěn)定的。

4 時(shí)序數(shù)據(jù)的預(yù)測(cè)

在前面的分析可知，該序列具有明顯的年周期與長(zhǎng)期成分。
對(duì)于年周期成分我們使用窗口為12的移動(dòng)平進(jìn)行處理，對(duì)于長(zhǎng)期趨勢(shì)成分我們采用1階差分來進(jìn)行處理。

rol_mean = ts_log.rolling(window=12).mean()
rol_mean.dropna(inplace=True)
ts_diff_1 = rol_mean.diff(1)
ts_diff_1.dropna(inplace=True)
teststationarity(ts_diff_1)

Test Statistic                  -2.709577
p-value                          0.072396
#Lags Used                      12.000000
Number of Observations Used    119.000000
Critical Value (1%)             -3.486535
Critical Value (5%)             -2.886151
Critical Value (10%)            -2.579896
dtype: float64

觀察其統(tǒng)計(jì)量發(fā)現(xiàn)該序列在置信水平為95%的區(qū)間下并不顯著，我們對(duì)其進(jìn)行再次一階差分。
再次差分后的序列其自相關(guān)具有快速衰減的特點(diǎn)，t統(tǒng)計(jì)量在99%的置信水平下是顯著的，這里我不再做詳細(xì)說明。

ts_diff_2 = ts_diff_1.diff(1)
ts_diff_2.dropna(inplace=True)
teststationarity(ts_diff_2)

Test Statistic                  -4.443325
p-value                          0.000249
#Lags Used                      12.000000
Number of Observations Used    118.000000
Critical Value (1%)             -3.487022
Critical Value (5%)             -2.886363
Critical Value (10%)            -2.580009
dtype: float64

數(shù)據(jù)平穩(wěn)后，需要對(duì)模型定階，即確定p、q的階數(shù)。先畫出ACF,PACF的圖像,代碼如下：

def draw_acf_pacf(ts,lags):
    f = plt.figure(facecolor='white')
    ax1 = f.add_subplot(211)
    plot_acf(ts,ax=ax1,lags=lags)
    ax2 = f.add_subplot(212)
    plot_pacf(ts,ax=ax2,lags=lags)
    plt.subplots_adjust(hspace=0.5)
    plt.show()
draw_acf_pacf(ts_diff_2,30)

觀察上圖，發(fā)現(xiàn)自相關(guān)和偏相系數(shù)都存在拖尾的特點(diǎn)，并且他們都具有明顯的一階相關(guān)性，所以我們?cè)O(shè)定p=1, q=1。
下面就可以使用ARMA模型進(jìn)行數(shù)據(jù)擬合了。（Ps.PACF是判定AR模型階數(shù)的，也就是p。ACF是判斷MA階數(shù)的，也就是q）

model = ARIMA(ts_diff_1, order=(1,1,1)) 
result_arima = model.fit( disp=-1, method='css')

predict_data=result_arima.predict(15,150)                # 擬合+預(yù)測(cè)0~150數(shù)據(jù)
forecast=result_arima.forecast(21)                      # 預(yù)測(cè)未來21天數(shù)據(jù)

模型擬合完后，我們就可以對(duì)其進(jìn)行預(yù)測(cè)了。由于ARMA擬合的是經(jīng)過相關(guān)預(yù)處理后的數(shù)據(jù)，故其預(yù)測(cè)值需要通過相關(guān)逆變換進(jìn)行還原。

predict_ts = result_arima.predict()
# 一階差分還原
diff_shift_ts = ts_diff_1.shift(1)
diff_recover_1 = predict_ts.add(diff_shift_ts)

# 再次一階差分還原
rol_shift_ts = rol_mean.shift(1)
diff_recover = diff_recover_1.add(rol_shift_ts)

# 移動(dòng)平均還原
rol_sum = ts_log.rolling(window=11).sum()
rol_recover = diff_recover*12 - rol_sum.shift(1)

# 對(duì)數(shù)還原
log_recover = np.exp(rol_recover)
log_recover.dropna(inplace=True)

我們使用均方根誤差（RMSE）來評(píng)估模型樣本內(nèi)擬合的好壞。利用該準(zhǔn)則進(jìn)行判別時(shí)，需要剔除“非預(yù)測(cè)”數(shù)據(jù)的影響。

ts_ = ts[log_recover.index]  # 過濾沒有預(yù)測(cè)的記錄plt.figure(facecolor='white')
log_recover.plot(color='blue', label='Predict')
ts_.plot(color='red', label='Original')
plt.legend(loc='best')
plt.title('RMSE: %.4f'% np.sqrt(sum((log_recover-ts[14:])**2)/ts.size))
plt.show()

END

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購買等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來自： dxgung > 《待分類》

舉報(bào)/認(rèn)領(lǐng)