Discover how advanced forecasting models help you predict trends, optimize planning, and make data-driven decisions with accuracy and confidence.
Our forecasting models
Our extensive library of models improves your chances of achieving higher forecast accuracy
Univariate models (time series models)
ARIMA An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model; the "integrated" part refers to differencing the data to remove non-stationarity before the ARMA structure is applied.
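For orientation, here is a minimal ARIMA sketch using the open-source statsmodels library (illustrative only; the placeholder series and the (1, 1, 1) order are assumptions, not our production settings):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder monthly series; substitute your own data.
y = pd.Series(np.random.randn(120).cumsum(),
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

fit = ARIMA(y, order=(1, 1, 1)).fit()   # (p, d, q): AR order, differencing, MA order
forecast = fit.forecast(steps=12)       # 12 periods ahead
```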
ETS Exponential smoothing forecasts by weighting past observations with exponentially decreasing weights, so recent observations count more than older ones. The ETS framework covers the different combinations of error, trend, and seasonal components.
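A comparable sketch with statsmodels' Holt-Winters implementation (the additive trend/seasonal settings and the monthly placeholder data are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

y = pd.Series(np.random.randn(120).cumsum() + 50,
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

# Additive trend and additive yearly seasonality on monthly data (assumed settings)
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = fit.forecast(12)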
Theta The Theta model is a simple method for forecasting that involves fitting two theta-lines, forecasting the lines using simple exponential smoothing, and then combining the forecasts from the two lines to produce the final forecast.
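statsmodels also ships a ThetaModel; a minimal sketch (period=12 assumes monthly data, purely for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.forecasting.theta import ThetaModel

y = pd.Series(np.random.randn(120).cumsum() + 100,
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

fit = ThetaModel(y, period=12).fit()   # deseasonalize, fit the theta-lines, recombine
forecast = fit.forecast(12)
```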
STL STL is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess”, while Loess is a method for estimating nonlinear relationships.
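A minimal decomposition sketch with statsmodels (period=12 assumes monthly data with yearly seasonality; forecasting on top of the decomposition is a separate step):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

months = pd.date_range("2015-01-01", periods=120, freq="MS")
y = pd.Series(np.random.randn(120).cumsum() + 10 * np.sin(np.arange(120) * 2 * np.pi / 12),
              index=months)

res = STL(y, period=12).fit()
trend, seasonal, resid = res.trend, res.seasonal, res.resid   # the three components
```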
TBATS TBATS is a time series model that is useful for handling data with multiple seasonal patterns. TBATS is an acronym for the key features of the model: Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components.
ANN An artificial neural network (ANN) is a model which is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
Prophet Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
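A minimal sketch with the open-source prophet package (the DataFrame contents are placeholders; Prophet expects columns named 'ds' and 'y'):

```python
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=120, freq="MS"),  # dates
    "y": range(120),                                            # placeholder values
})

m = Prophet()                                          # additive trend + seasonalities by default
m.fit(df)
future = m.make_future_dataframe(periods=12, freq="MS")
forecast = m.predict(future)                           # includes yhat, yhat_lower, yhat_upper
```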
Multivariate, econometric models
VAR Vector Auto Regression is a model that captures the linear relations among multiple time series. VAR models generalize the univariate autoregressive model (AR model) by allowing for multiple variables. All variables in a VAR enter the model in the same way: each variable has an equation explaining its evolution based on its own lagged values, the lagged values of the other model variables, and an error term. The calculations find the best common lag length for all variables in all equations (vectors).
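A minimal sketch with statsmodels (random placeholder data; the fixed lag length of 2 is illustrative, and ic="aic" can instead choose a common lag length automatically):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Placeholder DataFrame: one column per (stationary) series
df = pd.DataFrame(np.random.randn(200, 3), columns=["gdp", "cpi", "rate"])

results = VAR(df).fit(2)   # or VAR(df).fit(maxlags=8, ic="aic") to select the lag length
forecast = results.forecast(df.values[-results.k_ar:], steps=4)
```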
VECM Vector Error Correction Models are especially useful for data sets with long-run relationships (also called cointegration), and they estimate both the short-term and long-term effects of one time series on another. The term error correction refers to the fact that the last period's deviation from the long-run equilibrium, the error, influences the short-run dynamics. Besides the long-run relationships between variables, these models also directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
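A minimal sketch with statsmodels (the simulated cointegrated pair, the lag setting, and coint_rank=1 are illustrative; in practice the rank can be chosen with a Johansen test, e.g. select_coint_rank):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM

# Two placeholder series that share a common stochastic trend (i.e. are cointegrated)
common = np.random.randn(200).cumsum()
df = pd.DataFrame({"x1": common + np.random.randn(200),
                   "x2": 0.5 * common + np.random.randn(200)})

res = VECM(df, k_ar_diff=1, coint_rank=1, deterministic="ci").fit()
forecast = res.predict(steps=4)   # short-run dynamics plus error correction toward equilibrium
```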
VARMA In the statistical analysis of time series, autoregressive moving-average (ARMA) models describe a series in terms of two components: autoregression (AR) and moving average (MA). The AR part involves regressing the variable on its own lagged (i.e. past) values. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. VARMA is the multivariate (VAR) version of the ARMA model.
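statsmodels exposes this family as VARMAX; a minimal sketch (placeholder data, illustrative (p, q) order):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

df = pd.DataFrame(np.random.randn(200, 2), columns=["y1", "y2"])

res = VARMAX(df, order=(1, 1)).fit(disp=False)   # order=(p, q): VAR lags and MA lags
forecast = res.forecast(steps=4)
```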
ARDL Auto-Regressive Distributed Lag was the standard model before the VAR model was invented. Compared to the VAR, it is a less complex model in which the variables are not treated as interrelated: the main variable being forecast depends on the indicators, but the indicators do not depend on each other or on the main variable.
Multivariate, penalized models
Ridge Regression This is a way of using Bayesian models in a VAR framework. Prior to Lasso, the most widely used method for choosing which variables to include was stepwise selection, and ridge regression was the most popular alternative technique for improving prediction accuracy. Ridge regression reduces prediction error by shrinking large regression coefficients in order to reduce overfitting, but it does not perform variable selection and therefore does not make the model more interpretable.
Lasso Lasso (Least Absolute Shrinkage and Selection Operator) is the most successful application of AI within econometrics. Lasso was introduced to improve the prediction accuracy and interpretability of regression models by altering the model-fitting process to select only a subset of the provided independent variables for the final model rather than using all of them. Lasso forces certain coefficients to zero, effectively choosing a simpler model that does not include those coefficients.
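A minimal direct-forecast sketch with scikit-learn (the make_lags helper, the placeholder data, and the alpha value are illustrative assumptions, not our implementation):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Placeholder data: the target series plus two candidate indicators
df = pd.DataFrame(np.random.randn(200, 3), columns=["target", "ind1", "ind2"])

def make_lags(data, lags):
    """Stack lags 1..lags of every column into one design matrix."""
    cols = {f"{c}_lag{k}": data[c].shift(k) for c in data.columns for k in range(1, lags + 1)}
    return pd.concat(cols, axis=1).dropna()

X = make_lags(df, lags=4)
y = df["target"].loc[X.index]

model = Lasso(alpha=0.1).fit(X, y)        # alpha sets how aggressively coefficients are shrunk
selected = X.columns[model.coef_ != 0]    # the variables/lags the penalty kept in the model
```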
Elastic Net The elastic net is a regression method that linearly combines the lasso and ridge (see above) penalties. Basically, the elastic net method finds the ridge regression coefficients and then performs a lasso-type shrinkage of the coefficients.
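Reusing X and y from the Lasso sketch above, the elastic net variant only swaps the estimator (the alpha and l1_ratio values are illustrative):

```python
from sklearn.linear_model import ElasticNet

# l1_ratio mixes the two penalties: 1.0 is pure lasso, 0.0 is pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```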
VECM Lasso This model combines the Vector Error Correction Model described above, which captures both the long-run (cointegrating) relationships and the short-run dynamics that correct deviations from equilibrium, with the Lasso penalty (Least Absolute Shrinkage and Selection Operator), which forces certain coefficients to zero and thereby selects a simpler model that does not include those coefficients.
Group Lasso In 2006, Yuan and Lin introduced the group lasso in order to allow predefined groups of covariates to be selected into or out of a model together, so that all the members of a particular group are either included or not included.
Lag Group Lasso Groups the series based on the lags of the explanatory variables. The model selects the variables and their lags based on lag grouping, meaning that the 1st lags, 2nd lags, etc. of all variables are put into groups. Entire groups that do not contribute are then penalized out of the model.
Lag weighted Lasso Consists of a Lasso penalty that increases geometrically with lag. This means that shorter lags are prioritized in these models, compared to the setup in other VAR models.
Endogenous-First VARX Endogenous-First utilizes a penalty to prioritize endogenous series: at a given lag, an exogenous series can enter the model only if its endogenous counterpart is nonzero.
Own/Other Group Penalty In this model the grouping distinguishes between a series’ own lags and those of other series. This structure is similar to Componentwise (see below) but prioritizes “own” lags over “other” lags for a specific lag. This is based on the hypothesis that own lags are more informative than other lags.
Own/Other Sparse Group Penalty Sparse refers to allowing sparsity within a group instead of penalizing the whole group as one unit. In certain scenarios a pure group penalty can be too restrictive; on the other hand, having many groups substantially increases computation time and generally does not improve forecasting performance.
Hierarchical vector autoregression (HVAR) Hierarchical Vector Auto Regression (HVAR) models address the degradation in forecast performance that occurs when every added variable and lag is treated equally, even though more distant data generally carries less forecasting information. Instead of imposing a single, universal lag order, the lag order can vary across equations in HVAR models. There are no exogenous variables in the HVAR framework.
Componentwise Lasso In componentwise models, all series share the same maximum lag within each equation (marginal model), though that maximum lag may differ across equations.
Own/Other Lasso Imposes an additional layer of hierarchy: prioritizing “own” lags over “other” lags in the HVAR framework.
Elementwise Lasso The most general structure: in each marginal model, each series may have its own maximum lag.
Mixed-frequency models (multivariate)
MIDAS Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By fitting a lag distribution function the number of parameters is kept low, reducing the risk of over-fitting.
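A minimal sketch of the core idea, assuming an exponential Almon lag polynomial (a common choice for the lag distribution function; the parameter values and arrays are illustrative):

```python
import numpy as np

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag polynomial: a smooth weight profile over high-frequency lags."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k ** 2)
    return w / w.sum()

# Aggregate 12 monthly lags of an indicator into a single quarterly regressor
weights = exp_almon_weights(0.05, -0.01, 12)    # two parameters, regardless of the number of lags
monthly_lags = np.random.randn(40, 12)          # 40 quarters x 12 monthly lags (placeholder)
quarterly_regressor = monthly_lags @ weights    # enters the low-frequency regression
```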
Unrestricted MIDAS Unrestricted Mixed Data Sampling (U-MIDAS) models use high frequency indicators to predict a low frequency variable. Unlike restricted MIDAS, no lag distribution function is imposed: each high-frequency lag receives its own freely estimated coefficient.
MIDAS Sparse Group Penalty Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By applying a sparse group penalty function the parameters are shrunk towards zero, reducing the risk of over-fitting.
MIDAS Lasso Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By applying a lasso penalty function the parameters are shrunk towards zero, reducing the risk of over-fitting.
Machine learning models (multivariate)
ANN The artificial neural network (ANN) is a model inspired by biological neural networks such as the human brain. The model is an example of a more sparse machine learning model compared to LSTM and GRU. This lessens the risk of overfitting while still offering more flexibility than a linear model. ANN is trained on data using variants of gradient descent, such as AdaGrad and ADAM.
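A minimal sketch of a feed-forward network forecasting one step ahead from lagged values, using scikit-learn's MLPRegressor (placeholder data; the layer sizes and lag window are illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

data = np.random.randn(204, 2)     # two placeholder series
lags = 4

# Design matrix: the previous `lags` observations of both series, flattened per time step
X = np.array([data[t - lags:t].ravel() for t in range(lags, len(data))])
y = data[lags:, 0]                 # target: the next value of the first series

model = MLPRegressor(hidden_layer_sizes=(32, 16), solver="adam", max_iter=2000)
model.fit(X, y)
next_step = model.predict(data[-lags:].ravel().reshape(1, -1))
```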
LSTM The long short-term memory (LSTM) model is an artificial recurrent neural network. It is especially suited for processing sequences of data, owing to its feedback connections. LSTM models are used for many different tasks such as speech and video analysis, as well as time series analysis. One of the main strengths of an LSTM model is its flexibility: it can identify complex structures in data thanks to its non-linear activation functions and heavy parametrization. LSTM is trained on data using variants of gradient descent, such as AdaGrad and ADAM; a minimal sketch follows the GRU entry below.
GRU The gated recurrent unit (GRU) model is a type of recurrent neural network. As such it is well suited for sequential data such as time series. Its main strength is high flexibility compared to linear models: a GRU model can identify non-linear patterns in data, allowing it to describe the data more accurately. It is similar to LSTM but has fewer parameters, which lessens the risk of overfitting on smaller data sets. GRU is trained on data using variants of gradient descent, such as AdaGrad and ADAM.
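A minimal Keras sketch of a recurrent forecaster (the window length, layer size, and training settings are illustrative; swapping the LSTM layer for keras.layers.GRU gives the GRU variant described here):

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 500 windows of 24 time steps with 3 input series each,
# each labelled with the next value of the target series
X = np.random.randn(500, 24, 3).astype("float32")
y = np.random.randn(500, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(24, 3)),
    keras.layers.LSTM(32),          # swap for keras.layers.GRU(32) to get the GRU model
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")   # Adam: a variant of gradient descent
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
forecast = model.predict(X[-1:])
```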