Don’t trust just any statistics!
By Paul Schreyer (Paul.SCHREYER@oecd.org), Chief Statistician, Statistics and Data Directorate (OECD)
Originally published in German on Die Volkswirtschaft
The OECD is one of the world’s largest and most trusted sources of internationally comparable statistics and data for policy analysis – indeed, a significant share of visitors to the OECD website are seeking out data and statistics. These data and statistics are also the backbone of the OECD’s evidence-based analyses and policy recommendations.
Digitalisation has generated unprecedented volumes of data and with it an expectation of ever more granular and timely statistics, all accelerated by COVID-19 and rising geopolitical tensions. More timely statistics and data allow us to respond swiftly to new developments. More granular data means that we can be more targeted in our policy advice and better account for disparities across gender, regions, industries, firm size or demographics. But statistics and data must be trustworthy and of good quality. In short, they must be fit for purpose.
What is a quality statistic?
One might naturally think that a good quality statistic simply reflects as accurately as possible some aspect of the real world. But what is the value of an accurate statistic that no one can access, or trust, or understand? At the OECD, we view quality across seven dimensions.
Quality statistics must be relevant in that they serve the purposes of their users. They must be accurate, as timely as needed, accessible in a user-friendly format, easy to interpret and coherent, i.e., they can be meaningfully combined or compared across datasets. Importantly, quality statistics also feature good metadata, with clear information on sources and methods.
We don’t just hold ourselves to a high standard of quality; we also engage globally to promote good practices in statistics. For this, countries need well-functioning national statistical systems. The OECD’s reference here is the OECD Council Recommendation on Good Statistical Practice. While the recommendations apply principally to OECD Members, many non-members can and do adhere to them.
The changing nature of statistical production
With the digital transformation, the process of statistical production as well as available data and techniques have changed massively. In the past, National Statistics Offices (NSOs) relied almost exclusively on censuses, surveys, and registers to obtain information on individuals, households and businesses, and international organisations relied almost exclusively on official statistics provided by NSOs. In recent years, these sources have been supplemented and sometimes supplanted by a variety of new, more or less accessible, and rapidly evolving sources.
For instance, geospatial data are increasingly used for land cover and agricultural statistics. Statistics become particularly rich when geospatial data are combined with traditional survey or census information, as is the case in OECD work to measure population exposure to pollution, which combines data on fine particles per km2 with population density. These findings confirm that, despite reductions in air pollution, populations (even those in OECD countries) are still exposed to harmful particulate matter.
Despite improvements in reducing air pollution, populations in most OECD countries remain chronically exposed to harmful levels of PM2.5 pollution
Note: Mean population exposure to fine particulate matter is the concentration level, expressed in micrograms per cubic meter (µg/m3), to which a typical resident is exposed throughout a year. Source: OECD Environment Statistics (database) (2020), OECD calculations using IHME GBD 2019 concentration estimates (forthcoming). Subnational boundaries include data from FAO GAUL (2015).
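As a rough illustration of how concentration data and population data can be combined into a single exposure figure, the sketch below computes a population-weighted mean concentration over a handful of grid cells. The function name and the cell values are hypothetical, invented for this example; the actual OECD methodology is not described here and may differ in detail.

```python
def population_weighted_exposure(concentrations, populations):
    """Population-weighted mean concentration over grid cells.

    concentrations: PM2.5 level per cell (µg/m3), hypothetical values
    populations:    number of residents per cell, hypothetical values
    """
    if len(concentrations) != len(populations):
        raise ValueError("Each cell needs one concentration and one population")
    total_population = sum(populations)
    if total_population <= 0:
        raise ValueError("Total population must be positive")
    weighted_sum = sum(c * p for c, p in zip(concentrations, populations))
    return weighted_sum / total_population


# Hypothetical example: three cells, with most residents living in the
# more polluted cells.
cells_pm25 = [5.0, 12.0, 30.0]
cells_population = [1000, 5000, 4000]

exposure = population_weighted_exposure(cells_pm25, cells_population)
simple_average = sum(cells_pm25) / len(cells_pm25)
print(f"Population-weighted exposure: {exposure:.1f} µg/m3")
print(f"Unweighted cell average:      {simple_average:.1f} µg/m3")
```

In this made-up case the weighted figure is higher than the plain average of the cells, because it reflects where people actually live rather than treating every cell equally.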
Necessity is the mother of invention
Another example of new data sources comes from the measurement of prices. At the onset of the pandemic, it was no longer possible to send people into physical stores to survey the prices of the various products making up the inflation basket. Many NSOs had already been experimenting with ‘scanner data’ (i.e. data collected by retailers at the point of sale for administrative purposes) for some time, and were suddenly compelled to expand this survey practice.
Scanner data offer increased time and product coverage when compared with traditional survey-type methods of data collection. In addition, they provide information on transaction prices rather than advertised prices, meaning they account for all types of discounts. However, scanner data cover only part of the relevant inflation basket and they are not always “ready to use”, often requiring significant reshaping and manipulation to arrive at a usable dataset. Scanner data can also be costly to acquire. Price data can also be collected by scraping the sites of retailers and traders (i.e. web scraping), but the legal bases for web scraping are often unclear.
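To make the idea of transaction prices concrete, the sketch below derives an average price actually paid per product (a unit value, i.e. revenue divided by quantity) from point-of-sale records and then compares two periods with a Jevons index, the unweighted geometric mean of price relatives. The record layout, product names and figures are hypothetical, and the choice of index is an assumption made for illustration; the article does not say which methods NSOs apply.

```python
import math
from collections import defaultdict


def unit_values(transactions):
    """Average price paid per product: total revenue / total quantity.

    transactions: iterable of (product_id, quantity, revenue) tuples.
    Discounts are captured automatically because revenue is what was paid.
    """
    quantity = defaultdict(float)
    revenue = defaultdict(float)
    for product_id, qty, rev in transactions:
        quantity[product_id] += qty
        revenue[product_id] += rev
    return {p: revenue[p] / quantity[p] for p in quantity if quantity[p] > 0}


def jevons_index(base_prices, current_prices):
    """Geometric mean of price relatives over products sold in both periods."""
    common = sorted(set(base_prices) & set(current_prices))
    if not common:
        raise ValueError("No products in common between the two periods")
    log_sum = sum(math.log(current_prices[p] / base_prices[p]) for p in common)
    return math.exp(log_sum / len(common))


# Hypothetical scanner records: (product_id, quantity, revenue)
base_period = [
    ("milk", 10, 10.0),   # sold at list price
    ("milk", 10, 8.0),    # sold at a discount
    ("bread", 5, 10.0),
]
current_period = [
    ("milk", 20, 19.8),
    ("bread", 5, 10.0),
]

base_prices = unit_values(base_period)
current_prices = unit_values(current_period)
index = jevons_index(base_prices, current_prices)
print(f"Price index (base period = 1): {index:.4f}")
```

Note that the index only covers products present in both periods, which echoes the point that scanner data cover only part of the relevant basket.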
In general, a healthy dose of caution is required when using new sources for statistical production: while new ‘big’ data sets are often extensive, they are not necessarily representative; private sources may not be sustainable; or data ownership may be unclear. NSOs have to find the right balance between innovation and quality assurance for trustworthy statistics.
In an effort to remain at the frontier of data innovation, important international initiatives have been launched to co-invest in the modernisation of statistical organisations. Indeed, we engage actively in the UN-led High-Level Group for the Modernisation of Official Statistics, with the joint mission to identify trends, threats, and opportunities in modernising statistical organisations. Collaborative efforts and co-investment are key here to effectively leverage NSOs’ strengths and expertise to inform discussions on the country’s or region’s data ecosystem.
From data producers to data stewards
The challenge for NSOs and the OECD alike is to reap the benefits of the data deluge while maintaining the level of quality that underpins trust in statistics. In this sense, this new wealth of data is both an opportunity and a challenge. Where international organisations and national administrations previously acted as the principal (and often only) producers of statistics, we are now becoming stewards or gatekeepers at the centre of a diverse data ecosystem.
In this new data ecosystem, data are often fragmented and difficult to reconcile between sources. Yet reconciling them is a necessary condition for tapping into some of the richest data sources, for instance administrative records of various kinds. One important responsibility of a data steward is therefore to co-ordinate access to administrative data sources for statistical and research purposes, while fully respecting confidentiality standards. Such co-ordination through common classifications, inter-operable data systems and the right institutional framework also reduces the response burden that traditional surveys place on people and businesses. New projects have been initiated to bring us closer to such a ‘Once Only’ principle of data collection, where citizens and businesses need only provide certain standardised information to administrators once.
New ‘big’ data brings both new opportunities and new challenges. More than ever before there is an enthusiasm and vibrancy in the statistical space to harness these opportunities. Everyone should be keen to learn, to explore and to answer new and longstanding questions using data – I certainly am! But not every piece of data is fit for use, and NSOs and international organisations have a key role to play when it comes to providing the trusted data and statistics that are so fundamental for evidence-based policy and, ultimately, for democracy.