LSTM时间序列预测中的一个常见错误以及如何修正

当使用LSTM进行时间序列预测时，人们容易陷入一个常见的陷阱。为了解释这个问题，我们需要先回顾一下回归器和预测器是如何工作的。预测算法是这样处理时间序列的:

一个回归问题是这样的:

因为LSTM是一个回归量，我们需要把时间序列转换成一个回归问题。有许多方法可以做到这一点，一般使用窗口和多步的方法，但是在使用过程中会一个常见错误。

在窗口方法中，时间序列与每个时间步长的先前值相耦合，作为称为窗口的虚拟特征。这里我们有一个大小为3的窗口:

下面的函数从单个时间序列创建一个Window方法数据集。结果数据集将具有对角线重复，并且根据回看值，样本数量将发生变化:

def window(sequences, look_back): X, y = [], [] for i in range(len(sequences)-look_back-1): x = sequences[i:(i+look_back)] X.append(x) y.append(sequences[i + look_back]) return np.array(X), np.array(y)

让我们来检查一下结果。模型训练完成后，在测试集上进行测试。让我们看看代码和结果是什么样子的:

look_back = 3X, y = window(ts_data, look_back)# Train-test splittrain_ratio = 0.8train_size = int(train_ratio * len(ts_data))X_train, X_test = X[:train_size-look_back], X[train_size-look_back:]y_train, y_test = y[:train_size-look_back], y[train_size-look_back:]# Create and train LSTM modelmodel = Sequential()model.add(LSTM(units=72, activation='tanh', input_shape=(look_back, 1)))model.add(Dense(1))model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])model.fit(x=X_train, y=y_train, epochs=500, batch_size=18, verbose=2)# Make predictionsforecasts = model.predict(X_test)lstm_fits = model.predict(X_train)# Calculate metricsmape = mean_absolute_percentage_error(y_test, forecasts)r2 = r2_score(y_train, lstm_fits)# Initialize datesdate_range = pd.date_range(start='1990-01-01', end='2023-09-30', freq='M')# Add empty values in fits to match the original time seriesfits = np.full(train_size, np.nan)for i in range(train_size-look_back): fits[i+look_back] = lstm_fits[i]# Plot actual, fits, and forecastsplt.figure(figsize=(10, 6))plt.plot(date_range, ts_data, label='Actual', color='blue')plt.plot(date_range[:train_size], fits, label='Fitted', color='green')plt.plot(date_range[train_size:], forecasts, label='Forecast', color='red')plt.title('FSC - Short - Passengers\nOne Step Forward Forecast')plt.xlabel('Date')plt.ylabel('Passengers')plt.legend()plt.text(0.05, 0.05, f'R2 = {r2*100:.2f}%\nMAPE = {mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)plt.grid(True)plt.show()

结果看起来很棒。但是看一下样本测试集，我们发现了一个奇怪的问题:

在生成y9时，y8在模型中被用作输入。但是实际上我们是不知道y8的值的，我们正在预测未来的时间步骤，将未来的值也纳入其中了。

所以用前一个实例的预测值替换输入值的迭代测试集将解决问题。但是在这种情况下，模型建立在自己的预测之上，就像传统的预测算法一样:

# Iterative prediction and substitutionfor i in range(len(X_test)): forecasts[i] = model.predict(X_test[i].reshape(1, look_back, 1)) if i != len(X_test)-1: X_test[i+1,look_back-1] = forecasts[i] for j in range(look_back-1): X_test[i+1,j] = X_test[i,j+1]

结果就变成了这样：

出现这种结果的一个主要原因是误差的放大，y8是预测的结果，本身就会产生误差，在误差的基础上预测y9就又会产生更大的误差，这样所得到的误差就会被一步一步的放大。

多步骤方法类似于窗口方法，但有更多的目标步骤。以下是两个步骤的示例:

对于这个方法，必须选择n_steps_in和n_steps_out。下面的代码将一个简单的时间序列转换成一个准备进行多步LSTM训练的数据集:

# split a univariate sequence into samples with multi-stepsdef split_sequences(sequences, n_steps_in, n_steps_out): X, y = list(), list() for i in range(len(sequences)): # find the end of this pattern end_ix = i + n_steps_in out_end_ix = end_ix + n_steps_out # check if we are beyond the sequence if out_end_ix > len(sequences): break # gather input and output parts of the pattern seq_x, seq_y = sequences[i:end_ix], sequences[end_ix:out_end_ix] X.append(seq_x) y.append(seq_y) return np.array(X), np.array(y)

不仅特征和目标都有对角线重复，这意味着要与时间序列进行比较，我们要么取平均值，要么选择一个预测。在下面的代码中，生成了第一、最后和平均预测的结果，需要注意的是，这里的第一次预测是提前一个月预测，最后一次预测是提前12个月预测。

n_steps_in = 12n_steps_out = 12X, y = split_sequences(ts_data, n_steps_in, n_steps_out)X = X.reshape(X.shape[0], X.shape[1], 1)y = y.reshape(y.shape[0], y.shape[1], 1)# Train-test splittrain_ratio = 0.8train_size = int(train_ratio * len(ts_data))X_train, X_test = X[:train_size-n_steps_in-n_steps_out+1], X[train_size-n_steps_in-n_steps_out+1:]y_train = y[:train_size-n_steps_in-n_steps_out+1]y_test = ts_data[train_size:]# Create and train LSTM modelmodel = Sequential()model.add(LSTM(units=72, activation='tanh', input_shape=(n_steps_in, 1)))model.add(Dense(units=n_steps_out))model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])model.fit(x=X_train, y=y_train, epochs=500, batch_size=18, verbose=2)# Make predictionslstm_predictions = model.predict(X_test)lstm_fitted = model.predict(X_train)forecasts = [np.diag(np.fliplr(lstm_predictions), i).mean() for i in range(0, -lstm_predictions.shape[0], -1)]fits = [np.diag(np.fliplr(lstm_fitted), i).mean() for i in range(lstm_fitted.shape[1]+n_steps_in - 1, -lstm_fitted.shape[0], -1)]forecasts1 = lstm_predictions[n_steps_out-1:,0]fits1 = model.predict(X)[:train_size-n_steps_in,0]forecasts12 = lstm_predictions[:,n_steps_out-1]fits12 = lstm_fitted[:,n_steps_out-1]# Metricsav_mape = mean_absolute_percentage_error(y_test, forecasts)av_r2 = r2_score(ts_data[n_steps_in:train_size], fits[n_steps_in:])one_mape = mean_absolute_percentage_error(y_test[:-n_steps_out+1], forecasts1)one_r2 = r2_score(ts_data[n_steps_in:train_size], fits1)twelve_mape = mean_absolute_percentage_error(y_test, forecasts12)twelve_r2 = r2_score(ts_data[n_steps_in+n_steps_out-1:train_size], fits12) date_range = pd.date_range(start='1990-01-01', end='2023-09-30', freq='M')# Plot actual, fits, and forecastsplt.figure(figsize=(10, 6))plt.plot(date_range, ts_data, label='Actual', color='blue')plt.plot(date_range[:train_size], fits, label='Fitted', color='green')plt.plot(date_range[train_size:], forecasts, label='Forecast', color='red')plt.title('FSC - Short - Passengers\n. LSTM 12 Month Average Forecast')plt.xlabel('Date')plt.ylabel('Passengers')plt.legend()plt.text(0.05, 0.05, f'R2 = {av_r2*100:.2f}%\nMAPE = {av_mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)plt.grid(True)plt.show()plt.figure(figsize=(10, 6))plt.plot(date_range, ts_data, label='Actual', color='blue')plt.plot(date_range[n_steps_in:train_size], fits1, label='Fitted', color='green')plt.plot(date_range[train_size:-n_steps_out+1], forecasts1, label='Forecast', color='red')plt.title('FSC - Short - Passengers\n LSTM 1 Month in advance Forecast')plt.xlabel('Date')plt.ylabel('Passengers')plt.legend()plt.text(0.05, 0.05, f'R2 = {one_r2*100:.2f}%\nMAPE = {one_mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)plt.grid(True)plt.show()plt.figure(figsize=(10, 6))plt.plot(date_range, ts_data, label='Actual', color='blue')plt.plot(date_range[n_steps_in+n_steps_out-1:train_size], fits12, label='Fitted', color='green')plt.plot(date_range[train_size:], forecasts12, label='Forecast', color='red')plt.title('FSC - Short - Passengers\n LSTM 12 Months in advance Forecast')plt.xlabel('Date')plt.ylabel('Passengers')plt.legend()plt.text(0.05, 0.05, f'R2 = {twelve_r2*100:.2f}%\nMAPE = {twelve_mape*100:.2f}%', transform=plt.gca().transAxes, fontsize=12)plt.grid(True)plt.show()

同样的问题仍然存在这里:

那么上面的问题如何解决呢？

我们可以采用与在Window Method中所做的类似的方法。但是选择另一个方向，选择n_step_out与test_size相同。通过这种方式，测试集缩小到只有一个:

下面的函数就是这样做的。它需要时间序列、训练大小和样本数量。我把它称作可比性，因为这个版本实际上可以与其他预测算法进行比较:

def split_sequences_comparable(sequences, n_samples, train_size): # Steps n_steps_out = len(sequences) - train_size n_steps_in = train_size - n_steps_out - n_samples + 1 # End sets X_test = sequences[n_samples + n_steps_out - 1:train_size] X_forecast = sequences[-n_steps_in:] X, y = list(), list() for i in range(n_samples): # find the end of this pattern end_ix = i + n_steps_in out_end_ix = end_ix + n_steps_out # gather input and output parts of the pattern seq_x, seq_y = sequences[i:end_ix], sequences[end_ix:out_end_ix] X.append(seq_x) y.append(seq_y) return np.array(X), np.array(y), np.array(X_test), np.array(X_forecast), n_steps_in, n_steps_out

上面的这个函数，n_steps_out是固定的，所以可以由参数来选择样本的数量和训练的大小，它会计算最大可能的n_steps_in。下面是执行的代码和结果:

n_samples = 12train_size = 321X, y, X_test, X_forecast, n_steps_in, n_steps_out = split_sequences_comparable(ts_data, n_samples, train_size)y_test = ts_data[train_size:]# ReshapingX = X.reshape(X.shape[0], X.shape[1], 1)X_test = X_test.reshape(X_test.shape[1], X_test.shape[0], 1)y = y.reshape(y.shape[0], y.shape[1])y_test = y_test.reshape(y_test.shape[1], y_test.shape[0], 1)# Create and train LSTM modelmodel = Sequential()model.add(LSTM(units=154, activation='tanh', input_shape=(n_steps_in, 1)))model.add(Dense(units=n_steps_out))model.compile(loss='mean_squared_error', optimizer='Adam', metrics=['mape'])model.fit(x=X, y=y, epochs=500, batch_size=18, verbose=2)# Make predictionslstm_predictions = model.predict(X_test)predictions = lstm_predictions.reshape(lstm_predictions.shape[1])lstm_fitted = model.predict(X)fits = [np.diag(np.fliplr(lstm_fitted), i).mean() for i in range(lstm_fitted.shape[1]+n_steps_in - 1, -lstm_fitted.shape[0], -1)]# Metricsmape = mean_absolute_percentage_error(y_test, predictions)r2 = r2_score(ts_data[n_steps_in:train_size], fits[n_steps_in:])# Plot actual, fits, and forecastsplt.figure(figsize=(10, 6))plt.plot(date_range, ts_data, label='Actual', color='blue')plt.plot(date_range[:train_size], fits, label='Fitted', color='green')plt.plot(date_range[train_size:], predictions, label='Forecast', color='red')plt.title('FSC - Short - Passengers\n12 Sample Comparable LSTM Forecast')plt.xlabel('Date')plt.ylabel('Passengers')plt.legend()plt.text(0.05, 0.05, f'R2 = {r2*100:.2f}%\nMAPE = {mape*100:.2f}%\', transform=plt.gca().transAxes, fontsize=12)plt.grid(True)plt.show()

结果虽然不是很满意，但是我们看到了代码已经预测了一些上升的趋势，要比前面的一条直线好一些，但是这里LSTM将所有时间步长聚合到特征中，所有这些方法都会丢失时间数据，所以在后面将介绍(编码器/解码器方法)来维护输入的时间结构，解决这一问题。

https://avoid.overfit.cn/post/77d4c12d7c8a480b95fcf9392b772946

作者：Seyed Mousavi

玩酷网

LSTM时间序列预测中的一个常见错误以及如何修正

deephub