Nowcasting labour productivity growth with machine learning and mixed-frequency data
By Yann-Yves Dorville (yann.dorville@oecd.org), Nhung Luu (nathalienhung.luu@oecd.org), Annabelle Mourougane (annabelle.mourougane@oecd.org) and Julia Schmidt, OECD Statistics and Data Directorate
An increasing need for timely productivity estimates
Productivity is a core determinant of long-term economic growth, living standards, and international competitiveness. Policymakers and analysts rely on productivity trends to guide decisions on growth, competitiveness, and structural reforms. Despite this importance, official productivity data typically come with a lag of one to two years. Such delays are especially problematic during periods of rapid change, as illustrated by the COVID-19 pandemic, when conventional metrics struggled to capture real-time developments.
Responding to this need, the OECD has developed an innovative nowcasting technique drawing on higher-frequency information to deliver timely estimates of labour productivity growth (Dorville et al., 2025). Our approach closely tracks actual productivity at the aggregate level (Figure 1). This is made possible by the simultaneous use of high-frequency data, a wealth of information in a panel setting, and a variety of nowcasting techniques, including machine learning models.
Making the most of available information
Three key approaches have been used to enhance the nowcasts: 1) leveraging a diverse set of models; 2) employing a global panel framework; and 3) integrating mixed-data sampling (MIDAS).
- Accounting for a wide range of potential data generating processes: a number of models have been tested, which include Dynamic Factor Models (DFM), penalized linear regressions (LASSO, Ridge, and Elastic Net), and tree-based techniques (Gradient Boosted Trees and Random Forests).
- Compensating for poor data availability using a panel setting: the analysis covers 40 OECD and accession countries, which are pooled because series availability is limited in some countries.
- Exploiting the most recent developments through high-frequency data: information embedded in monthly and quarterly indicators is incorporated into the model using the MIDAS framework. This allows us to capture recent developments more quickly, improving the timeliness of our estimates. Although MIDAS has been applied in other contexts, its combination with machine learning in a panel setting is still relatively rare.
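To make the MIDAS idea concrete, the sketch below collapses twelve monthly observations into one annual regressor using either uniform weights or weights that decay with the lag. It is a minimal illustration under stated assumptions, not the specification used in the working paper: the exponential Almon weighting function is one common choice in the MIDAS literature, and the function names, parameter values and monthly figures are all hypothetical.

```python
import math


def exp_almon_weights(n_lags, theta1, theta2=0.0):
    """Exponential Almon lag weights, normalised to sum to one.

    Lag k = 1 is the most recent high-frequency observation.
    theta1 = theta2 = 0 gives uniform weights (a plain average).
    """
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_aggregate(high_freq_values, weights):
    """Collapse high-frequency observations into one low-frequency value."""
    if len(high_freq_values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, high_freq_values))


# Hypothetical monthly indicator for one year, most recent month first.
monthly = [1.2, 1.1, 1.0, 0.9, 0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.6, 0.5]

uniform = exp_almon_weights(12, 0.0)    # unweighted annual average
decaying = exp_almon_weights(12, -0.3)  # assumed theta1: recent months count more

flat_average = weighted_aggregate(monthly, uniform)
recent_heavy = weighted_aggregate(monthly, decaying)
```

In an actual MIDAS regression the weighting parameters would be estimated together with the regression coefficients instead of being fixed by hand, and an unrestricted variant would give each monthly lag its own coefficient.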
Harnessing better data
Model performance is measured with the Root Mean Squared Error (RMSE), which captures how far a model’s predictions are from the true values on average. From this exercise, three main findings emerge.
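For readers less familiar with the metric, RMSE is the square root of the mean squared gap between actual and predicted values. A minimal sketch follows; the function name and the numbers are illustrative only and are not taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical annual productivity growth rates (in %) and nowcasts.
actual_growth = [1.4, 0.6, -0.9, 2.1]
nowcast = [1.1, 0.9, -0.4, 1.8]

error = rmse(actual_growth, nowcast)
```

Because the errors are squared before averaging, a single large miss weighs more heavily than several small ones.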
First, overall performance is good across OECD and accession countries. When evaluated against a simple autoregressive benchmark, the models achieve an average one-year-ahead prediction error below 10% for labour productivity growth over the period 2015–22.
Second, machine learning models – particularly Gradient Boosted Trees – achieve the highest predictive accuracy. Gradient Boosted Trees were selected as the best model in 35 out of 40 cases. On average, performance gains relative to the AR(1) approach reach around 35%, and in some instances (e.g. Denmark, France, Romania, Slovenia) these gains exceed 60%. Ridge performs best in a few cases (Canada, Ireland, Lithuania) and the DFM leads in Spain, while no model outperforms the benchmark in Colombia. These predictions are nonetheless subject to significant uncertainty, especially in the case of Ireland and Colombia.
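The comparison with the benchmark can be sketched as follows: fit an AR(1) by ordinary least squares on a training window, produce one-step-ahead forecasts for the remaining years, and express the model's gain as the percentage reduction in RMSE. This is an illustrative sketch only. The definition of the gain as 1 minus the RMSE ratio is an assumption about how the reported figures are computed, and the series, the model nowcasts and all function names are hypothetical.

```python
import math


def fit_ar1(series):
    """OLS fit of y_t = intercept + slope * y_{t-1}."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("series has no variation; AR(1) slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def ar1_forecasts(series, n_train):
    """One-step-ahead forecasts for series[n_train:], fitted on series[:n_train]."""
    intercept, slope = fit_ar1(series[:n_train])
    return [intercept + slope * prev for prev in series[n_train - 1:-1]]


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_gain(model_rmse, benchmark_rmse):
    """Share of the benchmark RMSE removed by the model (0.35 means 35%)."""
    return 1.0 - model_rmse / benchmark_rmse


# Hypothetical annual productivity growth (in %) and hypothetical model nowcasts.
growth = [1.1, 0.8, 1.4, 0.9, 1.2, 0.7, 1.3, 1.0]
n_train = 5
benchmark = ar1_forecasts(growth, n_train)
model_nowcasts = [0.8, 1.2, 1.0]

gain = relative_gain(
    rmse(growth[n_train:], model_nowcasts),
    rmse(growth[n_train:], benchmark),
)
```

A negative gain under this definition would mean the model does worse than the AR(1) benchmark, which is the situation the text describes for Colombia.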
Third, the MIDAS framework leads to predictive gains. Although average improvements offered by flexible (unrestricted) MIDAS specifications may appear moderate at the aggregate level, around three-quarters of the sample still benefit from carefully estimated weighting schemes. In certain countries – such as Switzerland, Chile, Romania, and Mexico – employing these richer MIDAS variants yields large RMSE reductions relative to simple, uniform-weight approaches. Overall, selecting the best-performing MIDAS specification leads to significant performance gains over the conventional approach of aggregating variables to the lowest frequency with an unweighted average. An examination of feature importance further emphasises the crucial role of high-frequency variables, as they account for 50–70% of the performance improvement in most countries (Figure 3).
In conclusion, our approach – which blends multiple models, machine learning and mixed-frequency techniques within a global panel framework – achieved robust nowcasting performance, with an average error below 10% across OECD countries from 2015 to 2022. While certain smaller and more volatile economies pose greater challenges, incorporating more country-specific features could improve accuracy. High-frequency data proved especially critical, underlining the advantage of the MIDAS framework. Finally, the transparent, annual-update pipeline can be easily adapted to other economic indicators, making it a versatile tool for real-time analysis.
References
- Dorville, Y., et al. (2025), “Towards more timely measures of labour productivity growth”, OECD Statistics Working Papers, No. 2025/01, OECD Publishing, Paris.
