The multivariate Gated Recurrent Unit (GRU) model is a machine learning model, specifically a type of recurrent neural network. Indicio offers a selection of machine learning models, the most basic being the univariate neural model. The multivariate generalization of that model is the Artificial Neural Network (ANN) model. The GRU model, being a recurrent neural network, is more temporally aware, as it allows information to flow between nodes of the same layer, in the direction from higher to lower lags.
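To make the structure concrete, here is a minimal sketch of such a model, assuming PyTorch; the class name MultivariateGRU, the hidden size, and the single-layer setup are illustrative assumptions rather than Indicio's actual configuration.

```python
# A minimal sketch of a multivariate GRU forecaster, assuming PyTorch.
# Layer sizes and names are illustrative, not Indicio's actual configuration.
import torch
import torch.nn as nn

class MultivariateGRU(nn.Module):
    def __init__(self, k: int, hidden_size: int = 32):
        super().__init__()
        # The GRU layer carries a hidden state along the time dimension,
        # letting information flow from earlier lags towards the current time step.
        self.gru = nn.GRU(input_size=k, hidden_size=hidden_size, batch_first=True)
        # A linear head maps the last hidden state to one value per series.
        self.head = nn.Linear(hidden_size, k)

    def forward(self, x):                        # x: (batch, p, k) lagged values
        output, _ = self.gru(x)                  # output: (batch, p, hidden_size)
        return self.head(output[:, -1, :])       # (batch, k) one-step-ahead prediction
```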
To model a set of k time series Y_1, ..., Y_k using a neural network, the p·k lagged values are used as inputs and the neural network is trained to explain the current values of the k included time series. Just as in the univariate case, a forecast can then be created by using y_t, ..., y_{t-p+1} as inputs to predict y_{t+1}. Note that we now write y_t to denote the vector of k values at time t, meaning that the model creates forecasts for all included variables. This can then be repeated recursively, using the just-forecast values as inputs, to create a forecast of the desired length. The GRU model differs from a general neural network in that it is recurrent, which means it is better tailored towards handling sequential data such as time series. The other recurrent neural network available in Indicio is the Long Short-Term Memory (LSTM) model, which is more complex but at the same time more prone to over-fitting. The GRU was developed as a simplification of the LSTM model and has been shown to have similar performance.
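The recursive forecasting step can be sketched as follows, again assuming PyTorch; the function name recursive_forecast and the shape conventions of the history argument are illustrative assumptions.

```python
# A hedged sketch of recursive multi-step forecasting with a model like the one above.
# `history` is assumed to be a (T, k) array of observed values, most recent last.
import torch

def recursive_forecast(model, history, p: int, horizon: int):
    model.eval()
    window = torch.as_tensor(history[-p:], dtype=torch.float32)  # last p observations, shape (p, k)
    forecasts = []
    with torch.no_grad():
        for _ in range(horizon):
            y_next = model(window.unsqueeze(0)).squeeze(0)       # predict y_{t+1} for all k series
            forecasts.append(y_next)
            # Slide the window: drop the oldest lag and append the new forecast,
            # which becomes an input for the next step.
            window = torch.cat([window[1:], y_next.unsqueeze(0)], dim=0)
    return torch.stack(forecasts)                                # shape (horizon, k)
```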
As the number of inputs and outputs of a model increases, so does the required size of the hidden layers, and with them the complexity of the model. This poses a challenge, as a complex model always runs the risk of being overfitted to the data. To remedy this, the data is split into a training set and a validation set.
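A minimal sketch of such a split, assuming the data is held in a NumPy array ordered in time; the 80/20 ratio is an illustrative assumption, not Indicio's setting.

```python
# A minimal sketch of a chronological train/validation split.
import numpy as np

def split_series(data: np.ndarray, train_fraction: float = 0.8):
    # Time series are split in time order, never shuffled, so that the
    # validation set lies strictly after the training set.
    cutoff = int(len(data) * train_fraction)
    return data[:cutoff], data[cutoff:]
```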
The model is trained on the training data using Stochastic Gradient Descent (SGD). Only a few of the observations are used at each iteration, meaning that after a set number of iterations the SGD algorithm will have gone through all of the data. Each such set of iterations is referred to as an epoch. After each epoch, the model is used to create a forecast into the validation set, and the out-of-sample forecast error is calculated. As part of the training process, the model also produces in-sample forecasts, referred to as fitted values, from which the in-sample forecast error can be calculated.
This results in two series of forecast errors per epoch: one in-sample and one out-of-sample. Indicio applies early stopping, which means that when the out-of-sample accuracy starts to worsen over multiple epochs, the training process is halted and the model is considered finished.
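The training loop and the early-stopping rule described above can be sketched as follows, assuming PyTorch and the helpers above; the patience value, learning rate, and mean squared error loss are illustrative assumptions, and the one-step evaluation on validation pairs here stands in for a full forecast into the validation set.

```python
# A hedged sketch of epoch-based SGD training with early stopping, assuming PyTorch.
import torch
import torch.nn as nn

def train_with_early_stopping(model, train_loader, x_val, y_val, patience=5, max_epochs=200):
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    best_val, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x_batch, y_batch in train_loader:    # each iteration uses only a few observations
            optimiser.zero_grad()
            loss_fn(model(x_batch), y_batch).backward()
            optimiser.step()
        # After each epoch: evaluate on the validation set and record the out-of-sample error.
        model.eval()
        with torch.no_grad():
            val_error = loss_fn(model(x_val), y_val).item()
        if val_error < best_val:
            best_val, epochs_without_improvement = val_error, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                                # out-of-sample error stopped improving: halt training
    return model
```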