How to use Keras TimeseriesGenerator for time series data


You may have worked on a predictive model whose task is to predict a future value from historical data. Preparing the input/output pairs from raw time-series data is tedious. I recently came across TimeseriesGenerator, a built-in Keras utility that does precisely what I want.

In the following demo, you will learn how to apply it to your dataset.

The source code is available on my GitHub repository.

Imagine you are a fund manager with a keen data science awareness who wants to predict today's Dow Jones Index from publicly available stock prices.


Instead of using the absolute DJI value, which has increased by 60% during the past few years, we will use the day-to-day change as the time series. With dataset_DJI holding the absolute DJI values, the day change values can be computed by

dataset = dataset_DJI[1:] - dataset_DJI[:-1]
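
As a quick sanity check, here is a minimal sketch of that difference on a made-up array of index values (the numbers are illustrative, not real DJI data). It assumes the series is stored as a column vector of shape (T, 1), which the scaling step that follows expects; np.diff along the time axis should give the same result.

```python
import numpy as np

# Made-up index levels, shape (5, 1); not real DJI data.
dataset_DJI = np.array([[100.0], [102.0], [101.0], [105.0], [104.0]])

# Day change: each day's value minus the previous day's value.
dataset = dataset_DJI[1:] - dataset_DJI[:-1]

# np.diff along the time axis computes the same differences.
same = np.diff(dataset_DJI, axis=0)

# The result has one row fewer than the input: changes of 2, -1, 4, -1.
print(dataset.ravel())
```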

We can further normalize all values and split them into the train/test datasets.

from sklearn.preprocessing import MinMaxScaler

# normalize the dataset to the [0, 1] range
# (dataset must be 2-D, e.g. shape (T, 1), for the scaler)
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
# split into train and test sets, keeping chronological order
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[:train_size, :], dataset[train_size:, :]

Single time-series prediction

As you may know, RNNs, and more precisely LSTM networks, can capture time-series patterns. We can build such a model whose input is the past three days' change values and whose output is the current day's change value. The number three is the look-back length, which can be tuned for different datasets and tasks. Put simply, day T's value is predicted from the values of days T-3, T-2, and T-1. But how can we construct the training and testing input/output pairs for the model? Keras' TimeseriesGenerator makes our life easier by eliminating the boilerplate code we used to write for this step.

Let's build two time-series generators, one for training and one for testing. We use a sampling rate of one because we don't want to skip any samples in the datasets.

from keras.preprocessing.sequence import TimeseriesGenerator

look_back = 3  # number of past days used as input

train_data_gen = TimeseriesGenerator(train, train,
    length=look_back, sampling_rate=1, stride=1,
    batch_size=3)
test_data_gen = TimeseriesGenerator(test, test,
    length=look_back, sampling_rate=1, stride=1,
    batch_size=1)
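
To make the pairing concrete, here is a rough NumPy sketch of the sliding windows a generator with these settings is expected to yield. Note that make_windows is a hypothetical helper written for illustration, not part of Keras, and it ignores batching and shuffling.

```python
import numpy as np

def make_windows(data, targets, length, sampling_rate=1, stride=1):
    """Hypothetical helper: build (input window, target) pairs by hand."""
    xs, ys = [], []
    # The first position with a full window of history is index `length`.
    for t in range(length, len(data), stride):
        # Input: the `length` steps before t, keeping every sampling_rate-th one.
        xs.append(data[t - length:t:sampling_rate])
        # Target: the value at step t itself.
        ys.append(targets[t])
    return np.array(xs), np.array(ys)

# Toy series 0, 1, ..., 9 as a column vector of shape (10, 1).
series = np.arange(10, dtype=float).reshape(-1, 1)
x, y = make_windows(series, series, length=3)

print(x.shape, y.shape)     # (7, 3, 1) (7, 1)
print(x[0].ravel(), y[0])   # window [0, 1, 2] is paired with target 3
```

A series of 10 values with a look-back of 3 gives 7 pairs, because the first three values have no complete history to predict from.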

Once a simple Keras model is in place, we can fire up the training process.

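from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
# 4 LSTM units; each sample is look_back time steps of 1 feature
model.add(LSTM(4, input_shape=(look_back, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history = model.fit_generator(train_data_gen, epochs=100).history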

After training, evaluation and the final prediction can be done with the same generators.

model.evaluate_generator(test_data_gen)
trainPredict = model.predict_generator(train_data_gen)
testPredict = model.predict_generator(test_data_gen)

From there, we can reconstruct the predicted absolute DJI values from the predicted day change values by first reversing the min-max normalization and then adding each predicted day change value to the previous day's absolute value.
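
The reconstruction can be sketched roughly as follows. This is only an illustration under stated assumptions: reconstruct_levels and its parameters are hypothetical names, the data is made up, and the min-max inversion is written out by hand for a (0, 1) feature range. With the scaler fitted earlier, scaler.inverse_transform would take the place of the manual step.

```python
import numpy as np

def reconstruct_levels(levels, scaled_pred, delta_min, delta_max, first_target):
    """Hypothetical helper: turn scaled day-change predictions into index levels.

    levels        -- actual absolute index values, shape (T,)
    scaled_pred   -- predicted day changes in the [0, 1] range, shape (n,)
    delta_min/max -- min and max of the raw day changes used for scaling
    first_target  -- position in `levels` of the first predicted day
    """
    # Undo min-max scaling: x = x_scaled * (max - min) + min.
    pred_delta = scaled_pred * (delta_max - delta_min) + delta_min
    # Previous day's actual level for each predicted day.
    prev = levels[first_target - 1:first_target - 1 + len(pred_delta)]
    # Predicted level = previous day's level + predicted change.
    return prev + pred_delta

# Made-up index levels and their scaled day changes.
levels = np.array([100.0, 102.0, 101.0, 105.0, 104.0, 106.0])
deltas = np.diff(levels)
scaled = (deltas - deltas.min()) / (deltas.max() - deltas.min())

# Pretend the model predicted the last two scaled changes perfectly.
rebuilt = reconstruct_levels(levels, scaled[-2:], deltas.min(), deltas.max(),
                             first_target=4)
print(rebuilt)  # should match the last two actual levels
```

Because each prediction is added to the previous day's actual value, errors do not accumulate from one day to the next in this scheme.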


Multiple time-series as input

You might have noticed that in all the previous TimeseriesGenerator calls the "data" and "targets" arguments are the same, which means the inputs and outputs come from the same time series. But what if, in real life, today's DJI close price is affected by the previous stock prices of some big firms like Apple and Amazon? We then want to incorporate those stocks' day change values into the model input as well. To accomplish this, first concatenate all three time series into a (T, 3) shaped numpy array, then pass the preprocessed result to the "data" argument of TimeseriesGenerator.

import numpy as np

# each delta array has shape (T, 1); the result has shape (T, 3)
dataset_x = np.concatenate((dataset_delta_DJI,
    dataset_delta_APPL,
    dataset_delta_AMAZN), axis=1)
dataset_y = dataset_delta_DJI
# Dataset normalization and train test split similar to previous example

train_data_gen = TimeseriesGenerator(train_x, train_y,
    length=look_back, sampling_rate=1, stride=1,
    batch_size=3)
test_data_gen = TimeseriesGenerator(test_x, test_y,
    length=look_back, sampling_rate=1, stride=1,
    batch_size=1)

Finally, be sure to change the model's input_shape to (look_back, 3) so that it matches the generator's batches, which have shape (None, look_back, 3).
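
To see where the trailing 3 comes from, here is a small NumPy sketch that builds the windows by hand on made-up data. The names delta_a, delta_b, and delta_c are placeholders for the three day-change series, and the hand-rolled windowing only approximates what the generator does (no batching).

```python
import numpy as np

look_back = 3
T = 8

# Three made-up day-change series, each of shape (T, 1).
delta_a = np.arange(T, dtype=float).reshape(-1, 1)
delta_b = delta_a * 10
delta_c = delta_a * 100

data_x = np.concatenate((delta_a, delta_b, delta_c), axis=1)  # shape (T, 3)
data_y = delta_a                                              # shape (T, 1)

# Sliding windows over all three features at once.
windows = np.stack([data_x[t - look_back:t] for t in range(look_back, T)])
targets = data_y[look_back:]

print(windows.shape)  # (5, 3, 3): (samples, look_back, features)
print(targets.shape)  # (5, 1)
```

Each sample now carries look_back time steps of 3 features, which is why the LSTM layer would be declared with input_shape=(look_back, 3) while the Dense(1) output stays unchanged.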

Conclusion

This quick tutorial shows how to use Keras' TimeseriesGenerator to reduce the work involved in time-series prediction tasks. It lets you use the same or different time series as the input and output when training a model. The source code is available on my GitHub repository.
