A Common Mistake in LSTM Time Series Forecasting and How to Fix It

deephub 2024-05-06 10:23:17

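When using an LSTM for time series forecasting, it is easy to fall into a common trap. To explain the problem, we first need to review how regressors and forecasters work. A forecasting algorithm handles a time series like this: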

A regression problem looks like this:

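Because an LSTM is a regressor, we need to turn the time series into a regression problem. There are many ways to do this; the window method and the multi-step method are the usual choices, but a common mistake creeps in when they are used.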

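In the window method, each time step of the series is paired with its preceding values, which act as virtual features called a window. Here we have a window of size 3: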
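To make the layout concrete, here is a minimal sketch on a toy series of six values. The helper name `make_windows` and the toy numbers are assumptions made for this illustration only.

```python
import numpy as np

def make_windows(series, look_back):
    """Pair each value with the `look_back` values that precede it."""
    series = np.asarray(series)
    n_samples = len(series) - look_back
    X = np.array([series[i:i + look_back] for i in range(n_samples)])
    y = series[look_back:]
    return X, y

toy = np.array([10, 20, 30, 40, 50, 60])
X_toy, y_toy = make_windows(toy, look_back=3)
print(X_toy)  # each row is one window of three consecutive values
print(y_toy)  # the value that immediately follows each window
```

Consecutive rows share two of their three values, shifted by one column, which is the diagonal repetition that a windowed dataset shows.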

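The function below builds a window-method dataset from a single time series. The resulting dataset has diagonal repetition, and the number of samples depends on the look-back value: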

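import numpy as np

def window(sequences, look_back):
    """Build (window, next value) pairs from a single time series."""
    X, y = [], []
    # One sample for every position that has look_back predecessors
    for i in range(len(sequences) - look_back):
        X.append(sequences[i:i + look_back])
        y.append(sequences[i + look_back])
    return np.array(X), np.array(y)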

Let's check the results. Once the model is trained, it is evaluated on the test set. Here is what the code and the results look like:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from keras.layers import LSTM, Dense
from keras.models import Sequential
from sklearn.metrics import mean_absolute_percentage_error, r2_score

# ts_data is assumed to be a 1-D NumPy array of monthly passenger counts
look_back = 3
X, y = window(ts_data, look_back)
X = X.reshape(X.shape[0], look_back, 1)  # LSTM expects (samples, time steps, features)

# Train-test split
train_ratio = 0.8
train_size = int(train_ratio * len(ts_data))
X_train, X_test = X[:train_size-look_back], X[train_size-look_back:]
y_train, y_test = y[:train_size-look_back], y[train_size-look_back:]

# Create and train LSTM model
model = Sequential()
model.add(LSTM(units=72, activation='tanh', input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])
model.fit(x=X_train, y=y_train, epochs=500, batch_size=18, verbose=2)

# Make predictions
forecasts = model.predict(X_test)
lstm_fits = model.predict(X_train)

# Calculate metrics
mape = mean_absolute_percentage_error(y_test, forecasts)
r2 = r2_score(y_train, lstm_fits)

# Initialize dates (recent pandas versions prefer freq='ME' for month end)
date_range = pd.date_range(start='1990-01-01', end='2023-09-30', freq='M')

# Pad the fits with NaN so they align with the original time series
fits = np.full(train_size, np.nan)
fits[look_back:] = lstm_fits[:, 0]

# Plot actual, fits, and forecasts
plt.figure(figsize=(10, 6))
plt.plot(date_range, ts_data, label='Actual', color='blue')
plt.plot(date_range[:train_size], fits, label='Fitted', color='green')
plt.plot(date_range[train_size:], forecasts, label='Forecast', color='red')
plt.title('FSC - Short - Passengers\nOne Step Forward Forecast')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.text(0.05, 0.05, f'R2 = {r2*100:.2f}%\nMAPE = {mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)
plt.grid(True)
plt.show()

The results look great. But a look at the samples in the test set reveals something odd:

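When y9 is generated, y8 is fed to the model as an input. In practice we would not know the value of y8: we are forecasting future time steps, yet future values have been included among the inputs.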
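One way to see the leak is to list, for each test target, which input positions fall inside the test period. The sketch below uses toy sizes and a hypothetical helper `leaked_inputs`; neither comes from the original code.

```python
import numpy as np

def leaked_inputs(target_idx, train_size, look_back):
    """Return the input positions that lie in the test period for one target."""
    window_idx = np.arange(target_idx - look_back, target_idx)
    return window_idx[window_idx >= train_size].tolist()

n_obs, train_size, look_back = 20, 16, 3  # toy sizes, assumed for illustration
for t in range(train_size, n_obs):
    unknown = leaked_inputs(t, train_size, look_back)
    print(f"target {t}: inputs unknown at forecast time = {unknown}")
```

Only the first test target is predicted purely from training-period values. Every later target depends on at least one value that would not be available when the forecast is actually made.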

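So an iterative test set, in which each unknown input is replaced by the prediction from the previous instance, fixes the problem. In that case the model builds on its own predictions, just as traditional forecasting algorithms do: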

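# Iterative prediction and substitution
X_test = X_test.copy()  # avoid overwriting X through a shared view
forecasts = np.empty(len(X_test))
for i in range(len(X_test)):
    forecasts[i] = model.predict(X_test[i].reshape(1, look_back, 1), verbose=0)[0, 0]
    if i != len(X_test) - 1:
        # Next window: shift the current window left by one step ...
        X_test[i+1, :-1] = X_test[i, 1:]
        # ... and put the prediction where the unknown true value would go
        X_test[i+1, -1] = forecasts[i]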

The result then looks like this:

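A major reason for this result is error amplification. y8 is itself a prediction and so carries an error; predicting y9 from it adds further error on top, and the error grows step by step.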
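The effect can be reproduced without any neural network. The sketch below is a toy model, not the article's LSTM: the series grows by 5% per step while a hypothetical "model" assumes 4% growth. Both numbers are assumptions chosen for illustration.

```python
import numpy as np

growth_true, growth_model = 1.05, 1.04  # assumed toy values
horizon = 12
actual = growth_true ** np.arange(horizon + 1)  # actual[0] is the last known value

# One-step-ahead: every prediction starts from the true previous value
one_step = growth_model * actual[:-1]

# Recursive: every prediction starts from the previous prediction
recursive = np.empty(horizon)
prev = actual[0]
for h in range(horizon):
    prev = growth_model * prev
    recursive[h] = prev

one_step_err = np.abs(one_step - actual[1:]) / actual[1:]
recursive_err = np.abs(recursive - actual[1:]) / actual[1:]
print("one-step relative error per step: ", np.round(one_step_err, 4))
print("recursive relative error per step:", np.round(recursive_err, 4))
```

The one-step error stays constant because each prediction is corrected by a true value, while the recursive error compounds with every step. This is the same gap as between the flattering one-step plot and the iterative result.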

The multi-step method is similar to the window method but has more than one target step. Here is an example with two steps:
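As a rough illustration, the sketch below builds a dataset with three input steps and two output steps from a toy series. The helper name `make_multistep` and the toy values are assumptions for this example.

```python
import numpy as np

def make_multistep(series, n_in, n_out):
    """Pair each block of n_in values with the n_out values that follow it."""
    series = np.asarray(series)
    n_samples = len(series) - n_in - n_out + 1
    X = np.array([series[i:i + n_in] for i in range(n_samples)])
    y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, y

toy = np.arange(1, 8)  # the values 1 through 7
X_toy, y_toy = make_multistep(toy, n_in=3, n_out=2)
print(X_toy)  # inputs: three consecutive values per row
print(y_toy)  # targets: the next two values per row
```

Here the targets overlap as well: the second target of one row is the first target of the next.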

This method requires choosing n_steps_in and n_steps_out. The code below converts a simple time series into a dataset ready for multi-step LSTM training:

# split a univariate sequence into samples with multi-steps
def split_sequences(sequences, n_steps_in, n_steps_out):
    X, y = list(), list()
    for i in range(len(sequences)):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # check if we are beyond the sequence
        if out_end_ix > len(sequences):
            break
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix], sequences[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y)

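Now both the features and the targets have diagonal repetition, which means that to compare against the time series we must either average the predictions or pick one of them. The code below produces results for the first, the last, and the averaged prediction. Note that the first prediction here is a one-month-ahead forecast and the last is a 12-month-ahead forecast.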
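The averaging relies on the fact that all predictions for one time step sit on an anti-diagonal of the prediction matrix. The sketch below shows that alignment on a toy matrix whose entries are filled with the time step they refer to; the sizes and the starting time step are assumptions for illustration.

```python
import numpy as np

n_out = 3
first_target_time = 10  # assumed time step of the first sample's first output
n_samples = 4

# preds[k, h] stands for the forecast of time first_target_time + k + h.
# Each entry holds that time step itself, so the alignment is easy to check.
preds = np.array(
    [[first_target_time + k + h for h in range(n_out)] for k in range(n_samples)],
    dtype=float,
)

# After a left-right flip, anti-diagonals become ordinary diagonals
flipped = np.fliplr(preds)
# Offsets run from n_out - 1 (only preds[0, 0]) down to -(n_samples - 1)
offsets = range(n_out - 1, -n_samples, -1)
averaged = [np.diag(flipped, off).mean() for off in offsets]
print(averaged)  # one averaged value per time step, in chronological order
```

Every diagonal collects the forecasts made for a single time step, so its mean is the averaged forecast for that step. Time steps near the edges are covered by fewer forecasts than those in the middle.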

n_steps_in = 12
n_steps_out = 12
X, y = split_sequences(ts_data, n_steps_in, n_steps_out)
X = X.reshape(X.shape[0], X.shape[1], 1)
# y stays 2-D, (samples, n_steps_out), to match the Dense output

# Train-test split
train_ratio = 0.8
train_size = int(train_ratio * len(ts_data))
n_train = train_size - n_steps_in - n_steps_out + 1
X_train, X_test = X[:n_train], X[n_train:]
y_train = y[:n_train]
y_test = ts_data[train_size:]

# Create and train LSTM model
model = Sequential()
model.add(LSTM(units=72, activation='tanh', input_shape=(n_steps_in, 1)))
model.add(Dense(units=n_steps_out))
model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])
model.fit(x=X_train, y=y_train, epochs=500, batch_size=18, verbose=2)

# Make predictions
lstm_predictions = model.predict(X_test)
lstm_fitted = model.predict(X_train)

# Average all predictions made for the same time step (anti-diagonals);
# offsets that contain no predictions yield NaN
forecasts = [np.diag(np.fliplr(lstm_predictions), i).mean() for i in range(0, -lstm_predictions.shape[0], -1)]
fits = [np.diag(np.fliplr(lstm_fitted), i).mean() for i in range(lstm_fitted.shape[1]+n_steps_in - 1, -lstm_fitted.shape[0], -1)]
# One-month-ahead predictions (first output step)
forecasts1 = lstm_predictions[n_steps_out-1:, 0]
fits1 = model.predict(X)[:train_size-n_steps_in, 0]
# 12-month-ahead predictions (last output step)
forecasts12 = lstm_predictions[:, n_steps_out-1]
fits12 = lstm_fitted[:, n_steps_out-1]

# Metrics
av_mape = mean_absolute_percentage_error(y_test, forecasts)
av_r2 = r2_score(ts_data[n_steps_in:train_size], fits[n_steps_in:])
one_mape = mean_absolute_percentage_error(y_test[:-n_steps_out+1], forecasts1)
one_r2 = r2_score(ts_data[n_steps_in:train_size], fits1)
twelve_mape = mean_absolute_percentage_error(y_test, forecasts12)
twelve_r2 = r2_score(ts_data[n_steps_in+n_steps_out-1:train_size], fits12)

date_range = pd.date_range(start='1990-01-01', end='2023-09-30', freq='M')

def plot_result(fit_dates, fit_values, fc_dates, fc_values, subtitle, r2, mape):
    """Plot actual, fits, and forecasts with R2 and MAPE in the corner."""
    plt.figure(figsize=(10, 6))
    plt.plot(date_range, ts_data, label='Actual', color='blue')
    plt.plot(fit_dates, fit_values, label='Fitted', color='green')
    plt.plot(fc_dates, fc_values, label='Forecast', color='red')
    plt.title(f'FSC - Short - Passengers\n{subtitle}')
    plt.xlabel('Date')
    plt.ylabel('Passengers')
    plt.legend()
    plt.text(0.05, 0.05, f'R2 = {r2*100:.2f}%\nMAPE = {mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)
    plt.grid(True)
    plt.show()

plot_result(date_range[:train_size], fits, date_range[train_size:], forecasts,
            'LSTM 12 Month Average Forecast', av_r2, av_mape)
plot_result(date_range[n_steps_in:train_size], fits1, date_range[train_size:-n_steps_out+1], forecasts1,
            'LSTM 1 Month in advance Forecast', one_r2, one_mape)
plot_result(date_range[n_steps_in+n_steps_out-1:train_size], fits12, date_range[train_size:], forecasts12,
            'LSTM 12 Months in advance Forecast', twelve_r2, twelve_mape)

The same problem persists here:

So how can the problem above be solved?

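We can take an approach similar to the one used for the window method, but in the other direction: set n_steps_out equal to the size of the test set. That way the test set shrinks to a single sample: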

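The function below does exactly that. It takes the time series, the training size, and the number of samples. I call it comparable because this version can actually be compared with other forecasting algorithms: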

def split_sequences_comparable(sequences, n_samples, train_size):
    # Steps
    n_steps_out = len(sequences) - train_size
    n_steps_in = train_size - n_steps_out - n_samples + 1
    # End sets
    X_test = sequences[n_samples + n_steps_out - 1:train_size]
    X_forecast = sequences[-n_steps_in:]
    X, y = list(), list()
    for i in range(n_samples):
        # find the end of this pattern
        end_ix = i + n_steps_in
        out_end_ix = end_ix + n_steps_out
        # gather input and output parts of the pattern
        seq_x, seq_y = sequences[i:end_ix], sequences[end_ix:out_end_ix]
        X.append(seq_x)
        y.append(seq_y)
    return np.array(X), np.array(y), np.array(X_test), np.array(X_forecast), n_steps_in, n_steps_out

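In this function n_steps_out is fixed, so the number of samples and the training size are chosen through the arguments, and the function computes the largest possible n_steps_in. Here is the code that runs it, followed by the results: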
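The step arithmetic can be checked by hand. The sketch below assumes a series of 405 monthly points, which is what a monthly date range from January 1990 to September 2023 implies, together with a training size of 321 and 12 samples; `comparable_steps` is a hypothetical helper written only for this check.

```python
def comparable_steps(n_obs, train_size, n_samples):
    """Reproduce the step arithmetic of a comparable split."""
    n_steps_out = n_obs - train_size
    n_steps_in = train_size - n_steps_out - n_samples + 1
    # Exclusive end index of the targets of the last training sample
    last_target_end = (n_samples - 1) + n_steps_in + n_steps_out
    return n_steps_in, n_steps_out, last_target_end

n_steps_in, n_steps_out, last_target_end = comparable_steps(
    n_obs=405, train_size=321, n_samples=12
)
print(n_steps_in, n_steps_out, last_target_end)
```

The last training sample ends exactly at the training boundary, so no training target reaches into the test period, and the single test input consists only of training-period values.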

n_samples = 12
train_size = 321
X, y, X_test, X_forecast, n_steps_in, n_steps_out = split_sequences_comparable(ts_data, n_samples, train_size)
y_test = ts_data[train_size:]

# Reshaping: the LSTM expects (samples, time steps, features)
X = X.reshape(X.shape[0], X.shape[1], 1)
X_test = X_test.reshape(1, n_steps_in, 1)  # a single test sample
y = y.reshape(y.shape[0], y.shape[1])

# Create and train LSTM model
model = Sequential()
model.add(LSTM(units=154, activation='tanh', input_shape=(n_steps_in, 1)))
model.add(Dense(units=n_steps_out))
model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])
model.fit(x=X, y=y, epochs=500, batch_size=18, verbose=2)

# Make predictions
lstm_predictions = model.predict(X_test)
predictions = lstm_predictions.reshape(lstm_predictions.shape[1])
lstm_fitted = model.predict(X)
# Average the fitted values per time step; offsets without values yield NaN
fits = [np.diag(np.fliplr(lstm_fitted), i).mean() for i in range(lstm_fitted.shape[1]+n_steps_in - 1, -lstm_fitted.shape[0], -1)]

# Metrics
mape = mean_absolute_percentage_error(y_test, predictions)
r2 = r2_score(ts_data[n_steps_in:train_size], fits[n_steps_in:])

# Plot actual, fits, and forecasts
plt.figure(figsize=(10, 6))
plt.plot(date_range, ts_data, label='Actual', color='blue')
plt.plot(date_range[:train_size], fits, label='Fitted', color='green')
plt.plot(date_range[train_size:], predictions, label='Forecast', color='red')
plt.title('FSC - Short - Passengers\n12 Sample Comparable LSTM Forecast')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.text(0.05, 0.05, f'R2 = {r2*100:.2f}%\nMAPE = {mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)
plt.grid(True)
plt.show()

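The result is not very satisfying, but the model has at least picked up some of the upward trend, which is better than the flat line from before. However, the LSTM here aggregates all time steps into features, and all of these methods lose temporal information. A later article will introduce the encoder/decoder approach, which preserves the temporal structure of the input and addresses this problem.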

https://avoid.overfit.cn/post/77d4c12d7c8a480b95fcf9392b772946

Author: Seyed Mousavi
