Discover how advanced forecasting models help you predict trends, optimize planning, and make data-driven decisions with accuracy and confidence.
Our forecasting models
Our extensive library of models improves your chances of achieving higher forecast accuracy
Univariate models (time series models)
ARIMA An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model; the "integrated" part refers to differencing the data to remove non-stationarity before the ARMA structure is applied.
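For orientation, here is a minimal ARIMA sketch using the open-source statsmodels library (illustrative only; the placeholder series and the (1, 1, 1) order are assumptions, not our production settings):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder monthly series; substitute your own data.
y = pd.Series(np.random.randn(120).cumsum(),
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

fit = ARIMA(y, order=(1, 1, 1)).fit()   # (p, d, q): AR order, differencing, MA order
forecast = fit.forecast(steps=12)       # 12 periods ahead
```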
ETS Exponential smoothing forecasts by weighting past observations with exponentially decreasing weights, so recent observations count more than older ones. The ETS framework covers the different combinations of error, trend, and seasonal components.
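A comparable sketch with statsmodels' Holt-Winters implementation (the additive trend/seasonal settings and the monthly placeholder data are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

y = pd.Series(np.random.randn(120).cumsum() + 50,
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

# Additive trend and additive yearly seasonality on monthly data (assumed settings)
fit = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
forecast = fit.forecast(12)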
Theta The Theta model is a simple method for forecasting that involves fitting two theta-lines, forecasting the lines using simple exponential smoothing, and then combining the forecasts from the two lines to produce the final forecast.
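statsmodels also ships a ThetaModel; a minimal sketch (period=12 assumes monthly data, purely for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.forecasting.theta import ThetaModel

y = pd.Series(np.random.randn(120).cumsum() + 100,
              index=pd.date_range("2015-01-01", periods=120, freq="MS"))

fit = ThetaModel(y, period=12).fit()   # deseasonalize, fit the theta-lines, recombine
forecast = fit.forecast(12)
```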
STL STL is a versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess”, while Loess is a method for estimating nonlinear relationships.
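A minimal decomposition sketch with statsmodels (period=12 assumes monthly data with yearly seasonality; forecasting on top of the decomposition is a separate step):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

months = pd.date_range("2015-01-01", periods=120, freq="MS")
y = pd.Series(np.random.randn(120).cumsum() + 10 * np.sin(np.arange(120) * 2 * np.pi / 12),
              index=months)

res = STL(y, period=12).fit()
trend, seasonal, resid = res.trend, res.seasonal, res.resid   # the three components
```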
TBATS TBATS is a time series model that is useful for handling data with multiple seasonal patterns. TBATS is an acronym for the key features of the model: Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components.
ANN An artificial neural network (ANN) is a model which is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
Prophet Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
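A minimal sketch with the open-source prophet package (the DataFrame contents are placeholders; Prophet expects columns named 'ds' and 'y'):

```python
import pandas as pd
from prophet import Prophet

df = pd.DataFrame({
    "ds": pd.date_range("2015-01-01", periods=120, freq="MS"),  # dates
    "y": range(120),                                            # placeholder values
})

m = Prophet()                                          # additive trend + seasonalities by default
m.fit(df)
future = m.make_future_dataframe(periods=12, freq="MS")
forecast = m.predict(future)                           # includes yhat, yhat_lower, yhat_upper
```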
Multivariate, econometric models
VAR Vector Auto Regression is a model that captures the linear relations among multiple time series. VAR models generalize the univariate autoregressive model (AR model) by allowing for multiple variables. All variables in a VAR enter the model in the same way: each variable has an equation explaining its evolution based on its own lagged values, the lagged values of the other model variables, and an error term. The calculations find the best common lag length for all variables in all equations (vectors).
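A minimal sketch with statsmodels (random placeholder data; the fixed lag length of 2 is illustrative, and ic="aic" can instead choose a common lag length automatically):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Placeholder DataFrame: one column per (stationary) series
df = pd.DataFrame(np.random.randn(200, 3), columns=["gdp", "cpi", "rate"])

results = VAR(df).fit(2)   # or VAR(df).fit(maxlags=8, ic="aic") to select the lag length
forecast = results.forecast(df.values[-results.k_ar:], steps=4)
```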
VECM Vector Error Correction Models are especially useful for data sets with long-run relationships (also called cointegration), and they estimate both the short-term and long-term effects of one time series on another. The term error correction refers to the fact that the last period's deviation from the long-run equilibrium, the error, influences the short-run dynamics. Besides the long-run relationships between variables, these models also directly estimate the speed at which a dependent variable returns to equilibrium after a change in other variables.
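A minimal sketch with statsmodels (the simulated cointegrated pair, the lag setting, and coint_rank=1 are illustrative; in practice the rank can be chosen with a Johansen test, e.g. select_coint_rank):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM

# Two placeholder series that share a common stochastic trend (i.e. are cointegrated)
common = np.random.randn(200).cumsum()
df = pd.DataFrame({"x1": common + np.random.randn(200),
                   "x2": 0.5 * common + np.random.randn(200)})

res = VECM(df, k_ar_diff=1, coint_rank=1, deterministic="ci").fit()
forecast = res.predict(steps=4)   # short-run dynamics plus error correction toward equilibrium
```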
VARMA In the statistical analysis of time series, autoregressive moving-average (ARMA) models describe a series in terms of two components: autoregression (AR) and moving average (MA). The AR part involves regressing the variable on its own lagged (i.e. past) values. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. VARMA is the multivariate (VAR) version of the ARMA model.
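statsmodels exposes this family as VARMAX; a minimal sketch (placeholder data, illustrative (p, q) order):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

df = pd.DataFrame(np.random.randn(200, 2), columns=["y1", "y2"])

res = VARMAX(df, order=(1, 1)).fit(disp=False)   # order=(p, q): VAR lags and MA lags
forecast = res.forecast(steps=4)
```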
ARDL Auto-Regressive Distributed Lag was the standard model before the VAR model was invented. Compared to the VAR, it is a less complex model in which the variables are not treated as interrelated: the main variable being forecast depends on the indicators, but the indicators do not depend on each other or on the main variable.
Multivariate, penalized models
Ridge Regression This is a way of using Bayesian models in a VAR framework. Prior to Lasso, the most widely used method for choosing which variables to include was stepwise selection, and ridge regression was the most popular alternative technique for improving prediction accuracy. Ridge regression reduces prediction error by shrinking large regression coefficients in order to reduce overfitting, but it does not perform variable selection and therefore does not make the model more interpretable.
Lasso Lasso (Least Absolute Shrinkage and Selection Operator) is the most successful application of AI within econometrics. Lasso was introduced to improve the prediction accuracy and interpretability of regression models by altering the model-fitting process to select only a subset of the provided independent variables for the final model rather than using all of them. Lasso forces certain coefficients to zero, effectively choosing a simpler model that does not include those coefficients.
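A minimal direct-forecast sketch with scikit-learn (the make_lags helper, the placeholder data, and the alpha value are illustrative assumptions, not our implementation):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Placeholder data: the target series plus two candidate indicators
df = pd.DataFrame(np.random.randn(200, 3), columns=["target", "ind1", "ind2"])

def make_lags(data, lags):
    """Stack lags 1..lags of every column into one design matrix."""
    cols = {f"{c}_lag{k}": data[c].shift(k) for c in data.columns for k in range(1, lags + 1)}
    return pd.concat(cols, axis=1).dropna()

X = make_lags(df, lags=4)
y = df["target"].loc[X.index]

model = Lasso(alpha=0.1).fit(X, y)        # alpha sets how aggressively coefficients are shrunk
selected = X.columns[model.coef_ != 0]    # the variables/lags the penalty kept in the model
```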
Elastic Net The elastic net is a regression method that linearly combines the lasso and ridge (see above) penalties. Basically, the elastic net method finds the ridge regression coefficients and then performs a lasso-type shrinkage of the coefficients.
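Reusing X and y from the Lasso sketch above, the elastic net variant only swaps the estimator (the alpha and l1_ratio values are illustrative):

```python
from sklearn.linear_model import ElasticNet

# l1_ratio mixes the two penalties: 1.0 is pure lasso, 0.0 is pure ridge
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```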
VECM Lasso This model combines the Vector Error Correction Model described above, which captures both the long-run (cointegrating) relationships and the short-run dynamics that correct deviations from equilibrium, with the Lasso penalty (Least Absolute Shrinkage and Selection Operator), which forces certain coefficients to zero and thereby selects a simpler model that does not include those coefficients.
Group Lasso In 2006, Yuan and Lin introduced the group lasso in order to allow predefined groups of covariates to be selected into or out of a model together, so that all the members of a particular group are either included or not included.
Lag Group Lasso Groups the series based on the lags of the explanatory variables. The model selects the variables and their lags based on lag grouping, meaning that the 1st lags, 2nd lags, etc. of all variables are put into groups. Entire groups that do not contribute are then penalized out of the model.
Lag weighted Lasso Consists of a Lasso penalty that increases geometrically with lag. This means that shorter lags are prioritized in these models, compared to the setup in other VAR models.
Endogenous-First VARX Endogenous-First utilizes a penalty to prioritize endogenous series: at a given lag, an exogenous series can enter the model only if its endogenous counterpart is nonzero.
Own/Other Group Penalty In this model the grouping distinguishes between a series’ own lags and those of other series. This structure is similar to Componentwise (see below) but prioritizes “own” lags over “other” lags for a specific lag. This is based on the hypothesis that own lags are more informative than other lags.
Own/Other Sparse Group Penalty Sparse refers to allowing sparsity within a group instead of penalizing the whole group as one unit. In certain scenarios a pure group penalty can be too restrictive; on the other hand, having many groups substantially increases computation time and generally does not improve forecasting performance.
Hierarchical vector autoregression (HVAR) Hierarchical Vector Auto Regression (HVAR) models address the degradation in forecast performance that occurs when every added variable and lag is treated equally, even though more distant data generally carries less forecasting information. Instead of imposing a single, universal lag order, the lag order can vary across equations in HVAR models. There are no exogenous variables in the HVAR framework.
Componentwise Lasso In componentwise models, all series share the same maximum lag within each equation (marginal model), though that maximum lag may differ across equations.
Own/Other Lasso Imposes an additional layer of hierarchy: prioritizing “own” lags over “other” lags in the HVAR framework.
Elementwise Lasso The most general structure: in each marginal model, each series may have its own maximum lag.
Mixed-frequency models (multivariate)
MIDAS Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By fitting a lag distribution function the number of parameters is kept low, reducing the risk of over-fitting.
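A minimal sketch of the core idea, assuming an exponential Almon lag polynomial (a common choice for the lag distribution function; the parameter values and arrays are illustrative):

```python
import numpy as np

def exp_almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag polynomial: a smooth weight profile over high-frequency lags."""
    k = np.arange(1, n_lags + 1)
    w = np.exp(theta1 * k + theta2 * k ** 2)
    return w / w.sum()

# Aggregate 12 monthly lags of an indicator into a single quarterly regressor
weights = exp_almon_weights(0.05, -0.01, 12)    # two parameters, regardless of the number of lags
monthly_lags = np.random.randn(40, 12)          # 40 quarters x 12 monthly lags (placeholder)
quarterly_regressor = monthly_lags @ weights    # enters the low-frequency regression
```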
Unrestricted MIDAS Unrestricted Mixed Data Sampling (U-MIDAS) models use high frequency indicators to predict a low frequency variable. Unlike restricted MIDAS, no lag distribution function is imposed: each high-frequency lag receives its own freely estimated coefficient.
MIDAS Sparse Group Penalty Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By applying a sparse group penalty function the parameters are shrunk towards zero, reducing the risk of over-fitting.
MIDAS Lasso Mixed Data Sampling (MIDAS) models use high frequency indicators to predict a low frequency variable. By applying a lasso penalty function the parameters are shrunk towards zero, reducing the risk of over-fitting.
Machine learning models (multivariate)
ANN The artificial neural network (ANN) is a model inspired by biological neural networks such as the human brain. The model is an example of a more sparse machine learning model compared to LSTM and GRU. This lessens the risk of overfitting while still offering more flexibility than a linear model. ANN is trained on data using variants of gradient descent, such as AdaGrad and ADAM.
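A minimal sketch of a feed-forward network forecasting one step ahead from lagged values, using scikit-learn's MLPRegressor (placeholder data; the layer sizes and lag window are illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

data = np.random.randn(204, 2)     # two placeholder series
lags = 4

# Design matrix: the previous `lags` observations of both series, flattened per time step
X = np.array([data[t - lags:t].ravel() for t in range(lags, len(data))])
y = data[lags:, 0]                 # target: the next value of the first series

model = MLPRegressor(hidden_layer_sizes=(32, 16), solver="adam", max_iter=2000)
model.fit(X, y)
next_step = model.predict(data[-lags:].ravel().reshape(1, -1))
```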
LSTM The long short-term memory (LSTM) model is an artificial recurrent neural network. It is especially suited for processing sequences of data, owing to its feedback connections. LSTM models are used for many different tasks such as speech and video analysis, as well as time series analysis. One of the main strengths of an LSTM model is its flexibility: it can identify complex structures in data thanks to its non-linear activation functions and heavy parametrization. LSTM is trained on data using variants of gradient descent, such as AdaGrad and ADAM; a minimal sketch follows the GRU entry below.
GRU The gated recurrent unit (GRU) model is a type of recurrent neural network. As such it is well suited for sequential data such as time series. Its main strength is high flexibility compared to linear models: a GRU model can identify non-linear patterns in data, allowing it to describe the data more accurately. It is similar to LSTM but has fewer parameters, which lessens the risk of overfitting on smaller data sets. GRU is trained on data using variants of gradient descent, such as AdaGrad and ADAM.
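A minimal Keras sketch of a recurrent forecaster (the window length, layer size, and training settings are illustrative; swapping the LSTM layer for keras.layers.GRU gives the GRU variant described here):

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 500 windows of 24 time steps with 3 input series each,
# each labelled with the next value of the target series
X = np.random.randn(500, 24, 3).astype("float32")
y = np.random.randn(500, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Input(shape=(24, 3)),
    keras.layers.LSTM(32),          # swap for keras.layers.GRU(32) to get the GRU model
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")   # Adam: a variant of gradient descent
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
forecast = model.predict(X[-1:])
```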