The multivariate Long Short-Term Memory (LSTM) model is a machine learning model, specifically a type of recurrent neural network. Indicio offers a selection of machine learning models, the most basic being the univariate neural model (see Advanced: Neural). The multivariate generalization of that model is the Artificial Neural Network (ANN) model (Advanced: ANN). The LSTM model, being a recurrent neural network, is more temporally aware, as it allows information to flow between nodes of the same layer in the direction from higher to lower lags.
To model a set of k time series Y_1, ..., Y_k using a neural network, the p·k lagged values are used as inputs and the neural network is trained to explain the current values of the included time series. Just like in the univariate case, a forecast can then be created by using y_t, ..., y_{t-p+1} as inputs to predict y_{t+1}. Note that we now write y_t to denote the vector of k values at time t, meaning that the model creates forecasts for all included variables. This can then be repeated recursively, using the just-forecast values as inputs, to create a forecast of the desired length. The LSTM model differs from a general neural network in that it is recurrent, which means it is better tailored towards handling sequential data such as time series. The other recurrent neural network available in Indicio is the Gated Recurrent Unit (GRU) model (Advanced: GRU), which is a less heavily parametrized version of the LSTM. The GRU was developed as a simplification of the LSTM model and has been shown to have similar performance.
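To make the recursion concrete, the following is a minimal sketch in Python of recursive multi-step forecasting; it is not Indicio's implementation. The fitted model `model_predict`, the lag order p = 4, and the k = 3 series are assumptions used purely for illustration.

```python
import numpy as np

# Illustrative sketch: recursive forecasting with a multivariate model.
# `model_predict` is a hypothetical fitted model mapping the p most recent
# k-dimensional observations to the next k-dimensional value.
def recursive_forecast(model_predict, history, horizon, p=4):
    """history: array of shape (T, k); returns array of shape (horizon, k)."""
    window = list(history[-p:])            # y_t, ..., y_{t-p+1} as the initial input
    forecasts = []
    for _ in range(horizon):
        x = np.stack(window[-p:])          # current p x k input block
        y_next = model_predict(x)          # one-step-ahead forecast for all k series
        forecasts.append(y_next)
        window.append(y_next)              # feed the forecast back in recursively
    return np.array(forecasts)

# Dummy "model" that just averages the window, for illustration only
rng = np.random.default_rng(0)
hist = rng.normal(size=(50, 3))            # k = 3 series, 50 observations
print(recursive_forecast(lambda x: x.mean(axis=0), hist, horizon=6).shape)  # (6, 3)
```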
As the number of inputs and outputs of a model increases, so does the required size of the hidden layers, and with it the complexity of the model. This poses a challenge, as a complex model always runs the risk of being overfitted to the data. To remedy this, the data is split into a training set and a validation set.
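As an illustration, a chronological split might look like the sketch below; the 80/20 proportion is an assumed value for the example, not Indicio's setting.

```python
import numpy as np

# Minimal sketch of a chronological train/validation split for time series.
def train_validation_split(series, train_fraction=0.8):
    """series: array of shape (T, k). The validation set is the final stretch,
    so validation forecasts are genuinely out of sample."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

y = np.random.default_rng(1).normal(size=(200, 3))
train, valid = train_validation_split(y)
print(train.shape, valid.shape)   # (160, 3) (40, 3)
```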
The model is trained on the training data using Stochastic Gradient Descent (SGD). Only a few of the observations are used at each iteration, meaning that after a set number of iterations the SGD algorithm will have gone through all the data. Each such set of iterations is referred to as an epoch. After each epoch, the model is used to create a forecast into the validation set, and the out-of-sample forecast error is calculated. As part of the training process, the model also produces in-sample forecasts, referred to as fitted values, from which the in-sample forecast error can be calculated.
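The sketch below illustrates this kind of training loop using a small PyTorch LSTM; the layer sizes, batch size, learning rate, and random data are assumptions made for illustration and do not reflect Indicio's actual configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
k, p, hidden = 3, 4, 16                       # series, lag order, hidden units (assumed)
X, Y = torch.randn(120, p, k), torch.randn(120, k)          # lagged inputs and targets
X_val, Y_val = torch.randn(30, p, k), torch.randn(30, k)    # held-out validation stretch

class LSTMForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=k, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, k)      # map last hidden state to k forecasts
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])       # use the final time step's hidden state

model = LSTMForecaster()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()
batch_size, n_epochs = 20, 5

for epoch in range(n_epochs):
    perm = torch.randperm(len(X))
    for start in range(0, len(X), batch_size):               # a few observations per step
        idx = perm[start:start + batch_size]
        optimiser.zero_grad()
        loss = loss_fn(model(X[idx]), Y[idx])
        loss.backward()
        optimiser.step()
    with torch.no_grad():
        in_sample = loss_fn(model(X), Y).item()              # error of the fitted values
        out_of_sample = loss_fn(model(X_val), Y_val).item()  # validation forecast error
    print(f"epoch {epoch}: in-sample {in_sample:.3f}, out-of-sample {out_of_sample:.3f}")
```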
This creates two series of forecast errors, one in-sample and one out-of-sample, with one value of each per epoch. Indicio applies early stopping, which means that when the out-of-sample accuracy starts to deteriorate over multiple epochs, the training process is halted and the model is considered finished.
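One common way to implement such a stopping rule is a patience criterion, sketched below; the patience of three epochs is an assumed value for illustration, not Indicio's exact threshold.

```python
# Minimal sketch of patience-based early stopping: training halts once the
# out-of-sample error has failed to improve for `patience` consecutive epochs.
def should_stop(val_errors, patience=3):
    """val_errors: list of per-epoch out-of-sample errors, oldest first."""
    if len(val_errors) <= patience:
        return False
    best_so_far = min(val_errors[:-patience])
    return all(e >= best_so_far for e in val_errors[-patience:])

print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73]))  # True: no improvement for 3 epochs
print(should_stop([1.0, 0.8, 0.7, 0.71, 0.65]))        # False: still improving
```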