Variable Selection

In the dynamic world of forecasting, choosing the right variables, also called leading indicators or features, can make all the difference between accurate and unreliable forecasts. Our advanced variable selection tool empowers you to identify the most relevant variables, optimizing your forecasting models for accuracy, efficiency, and reliability.

Correlation vs. advanced methods

Many organizations rely on correlation to identify leading indicators, but this approach often falls short in producing accurate forecasts.

Correlation measures only the linear relationship between two variables at a time, whereas advanced methods can assess interactions among multiple variables, quantify the contribution of each, and account for group effects. This can significantly improve forecast accuracy.
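To make the limitation concrete, here is a minimal sketch in plain Python. The data and variable names are hypothetical and the code is not part of Indicio. It computes the Pearson correlation for an indicator that determines the target exactly but non-linearly, and for one that is related linearly.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical indicator values, symmetric around zero.
indicator = [-3, -2, -1, 0, 1, 2, 3]

# One target depends on the indicator exactly, but not linearly.
target_nonlinear = [v ** 2 for v in indicator]
# The other target is a straight-line function of the indicator.
target_linear = [2 * v + 1 for v in indicator]

r_nonlinear = pearson(indicator, target_nonlinear)  # close to 0
r_linear = pearson(indicator, target_linear)        # close to 1
```

A correlation screen would discard the first indicator even though it fully explains the target, which is the kind of relationship the more advanced methods are meant to catch.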

For more insights, check out our interview with Professor Sune Karlsson, a key contributor to research on Bayesian Variable Selection.

Webinar - Identifying leading indicators

In this recorded webinar, we will explore the advantages and disadvantages of various methodologies for identifying leading indicators. We'll cover approaches ranging from visual plotting and correlation analysis to advanced techniques for variable selection.

Frequently asked questions

What is variable selection, and why does it matter for forecasting accuracy?

Variable selection is the process of choosing which variables (features) your model should actually use, such as price, promotions, weather, holidays, macro indicators, or custom business signals. Instead of feeding the model every possible variable, we keep the signals that add predictive value and drop those that add noise. Noisy variables lead a model to fit patterns that do not repeat, so removing them tends to improve accuracy on new data.

How does your feature determine which variables and transformations to include or exclude?

Our feature offers several strategies to choose variables and transformations. It can use search algorithms (backward, forward, stepwise) to test many variable combinations, Lasso to shrink small coefficients to zero, and Bayesian methods that keep variables with high posterior inclusion probability.
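As an illustration of how Lasso shrinks small coefficients to zero, here is a minimal coordinate-descent sketch in plain Python. It is a textbook formulation under stated assumptions (centred data, a hypothetical penalty of 0.5, made-up predictor values), not Indicio's implementation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(columns, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + penalty * ||b||_1.

    columns: list of predictor columns (lists of floats, assumed centred).
    y: centred target values.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual with predictor j's current contribution left out.
            partial = [
                y[i] - sum(coefs[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            coefs[j] = soft_threshold(rho, penalty) / scale
    return coefs

# Hypothetical centred predictors: one strong signal, one weak one.
strong = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
noise = [1.0, -1.0, 1.0, 0.0, -1.0, 1.0, -1.0]
target = [2.0 * s + 0.1 * w for s, w in zip(strong, noise)]

coefs = lasso_coordinate_descent([strong, noise], target, penalty=0.5)
```

In this example the coefficient of the weak predictor ends up exactly zero, so it is dropped, while the strong predictor keeps a slightly shrunken coefficient.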

Can I combine automatic selection with my own expert-picked variables?

Yes, you can override the variable selection results if you need to have specific variables in your forecasting models.

How do you handle multicollinearity and redundant predictors?

Multicollinearity mainly affects classical statistical models. Lasso and Bayesian approaches are less sensitive to it, because their shrinkage already penalizes redundant predictors. For classical models, you can drop the variables flagged in multicollinearity warnings, or run variable selection with a model that is sensitive to multicollinearity so that it removes them for you.
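One generic way to quantify redundancy between predictors is the variance inflation factor (VIF). The sketch below covers the two-predictor case, where VIF equals 1 / (1 - r^2). The data are hypothetical, and this is a standard diagnostic shown for illustration; it is not a description of how Indicio generates its warnings.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: a price and a price index that move almost
# together, plus a weather signal that is mostly unrelated to price.
price = [10, 11, 12, 13, 14, 15]
price_index = [100, 111, 119, 131, 139, 150]
weather = [3, -1, 4, -2, 0, 2]

vif_redundant = vif_two_predictors(price, price_index)
vif_distinct = vif_two_predictors(price, weather)
```

A common rule of thumb treats a VIF above 5 or 10 as a sign that a predictor is largely redundant. Here the price and price index pair is far above that range, while the price and weather pair stays close to 1.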

Does variable selection work for both univariate and multivariate time series?

In Indicio, variable selection is applied only to multivariate models. Univariate models can include other variables only as exogenous inputs, which require forecasts of their own. Evaluating such a model with the actual values of the exogenous variables instead would introduce look-ahead bias.

What methods do you use (e.g., regularization, feature importance, SHAP) to rank variable relevance?

Indicio offers several methods for ranking variables by relevance. Ranking can be done during variable selection, where we use search algorithms (backward, forward, stepwise) that test variable combinations, Lasso to shrink small coefficients to zero, and Bayesian methods that keep variables with high posterior inclusion probability.

It can also be done in the last step of the forecasting process, where SHAP is used to translate complex forecast models into drivers and barriers.
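To show what a split into drivers and barriers can look like, here is a minimal sketch for the simplest case: a linear model with features treated as independent, where the SHAP value of a feature is its coefficient times the distance of the observation from the feature's average. All names and numbers are hypothetical, and Indicio's models and SHAP computation may differ.

```python
def linear_shap(coefs, baseline, observation):
    """SHAP values for a linear model, assuming independent features."""
    return {
        name: coefs[name] * (observation[name] - baseline[name])
        for name in coefs
    }

def predict(intercept, coefs, row):
    """Prediction of the linear model for one row of feature values."""
    return intercept + sum(coefs[name] * row[name] for name in coefs)

# Hypothetical fitted model and data.
intercept = 50.0
coefs = {"price": -1.5, "promotion": 4.0, "temperature": 0.2}
baseline = {"price": 10.0, "promotion": 0.25, "temperature": 15.0}    # feature averages
observation = {"price": 12.0, "promotion": 1.0, "temperature": 20.0}  # period to explain

contributions = linear_shap(coefs, baseline, observation)

# Positive contributions push the forecast up, negative ones pull it down.
drivers = {k: v for k, v in contributions.items() if v > 0}
barriers = {k: v for k, v in contributions.items() if v < 0}
```

The contributions add up to the difference between the forecast for this observation and the forecast at the feature averages, which is what makes them usable as a ranking of relevance.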


How does the feature prevent overfitting, especially with many candidate variables?

Indicio limits overfitting in several ways: train/validation splits and cross-validation, regularization (Lasso and Bayesian shrinkage), and automated variable selection that removes weak or redundant predictors.

Tip: comparing in-sample and out-of-sample results helps spot overfitting.
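The comparison in the tip can be sketched as follows, assuming a single predictor, a simple least-squares line, and a time-ordered split. The series are hypothetical and the helper names are illustrative only.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def in_vs_out_of_sample(x, y, n_train):
    """Fit on the first n_train points, then score both segments."""
    intercept, slope = fit_line(x[:n_train], y[:n_train])
    fitted = [intercept + slope * v for v in x]
    in_sample = rmse(y[:n_train], fitted[:n_train])
    out_of_sample = rmse(y[n_train:], fitted[n_train:])
    return in_sample, out_of_sample

# Hypothetical indicator and target, ordered in time.
indicator = [1, 2, 3, 4, 5, 6, 7, 8]
target = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8]

in_sample_rmse, out_of_sample_rmse = in_vs_out_of_sample(indicator, target, n_train=6)
```

When the out-of-sample error is many times larger than the in-sample error, the model has probably fitted noise in the training period. Errors of similar size, as in this example, suggest the relationship generalizes.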

Can I see transparency/explainability on why a variable was selected or dropped?

Yes. You can inspect diagnostics such as coefficients and impact on accuracy. Together these show which variables were kept or dropped, how strongly they influence the model, and whether they help or hurt forecast performance.

How does variable selection impact training speed and inference latency at scale?

Variable selection adds some overhead, since it needs to test and compare different subsets of predictors. At scale, that cost is offset by smaller final models: fewer predictors speed up training of the chosen model and reduce inference latency in production.
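To give a rough sense of the overhead, the sketch below counts how many candidate models a search has to fit. These are generic worst-case counts for exhaustive search and for forward selection run until every variable is added. They are not measurements of Indicio.

```python
def exhaustive_fits(n_candidates):
    """Number of non-empty subsets of the candidate variables."""
    return 2 ** n_candidates - 1

def forward_selection_fits(n_candidates):
    """Worst case for forward selection: p fits in the first step, then p - 1, and so on."""
    return n_candidates * (n_candidates + 1) // 2

# Hypothetical problem size: 20 candidate variables.
n_candidates = 20
fits_exhaustive = exhaustive_fits(n_candidates)     # 1048575
fits_forward = forward_selection_fits(n_candidates) # 210
```

Stepwise searches keep the selection cost manageable as the number of candidates grows, which is why testing every possible subset is rarely practical.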

What data prep is required (missing values, seasonality/holiday flags, categorical encodings) for best results?

Indicio automatically detects and treats missing values and seasonality. You can also flag and handle outliers and calendar effects such as holidays to further improve model performance.
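For readers who want a picture of what this kind of preparation involves, here is a minimal sketch of two common steps: filling interior gaps by linear interpolation and building a holiday flag. The method choices, function names, and data are assumptions for illustration. The source does not state which treatments Indicio applies.

```python
from datetime import date

def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

def holiday_flags(dates, holidays):
    """Return 1 for dates that fall on a listed holiday, else 0."""
    return [1 if d in holidays else 0 for d in dates]

# Hypothetical daily sales with two missing observations.
sales = [10.0, None, None, 16.0, 18.0]
filled = interpolate_missing(sales)

# Hypothetical calendar: 23 to 27 December 2024, with two holidays.
days = [date(2024, 12, 23 + i) for i in range(5)]
flags = holiday_flags(days, {date(2024, 12, 25), date(2024, 12, 26)})
```

The flag series can then be offered to the model as one more candidate variable, alongside the business and macro indicators.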

Virtual demo

View our click-through demo

Experience the ease and accuracy of Indicio’s automated forecasting platform firsthand. Click to start a virtual demo today and discover how our cutting-edge tools can streamline your decision-making process.