Anscombe's quartet for PM2.5 | South London Scientific

When you see a PM2.5 concentration, does it mean what you think it does?

There’s a classic statistical cautionary tale called Anscombe’s quartet. The statistician Francis Anscombe constructed four small datasets back in 1973 that share almost every descriptive statistic you might compute on them. The means line up. The variances line up. The correlations line up. The linear regressions line up. And yet the four datasets, when you actually plot them, look completely different: one is a noisy but roughly linear scatter, one is a curve, one is a tight line with a single outlier dragging the regression off course, and one is essentially a vertical bar with a single rogue point. Same summary statistics. Wildly different underlying realities.
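The claim is easy to check for yourself. Here is a minimal sketch in plain Python using the quartet's published values; the names `QUARTET`, `summarise` and `summaries` are ours, chosen for illustration.

```python
import math
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the usual descriptive statistics for one (x, y) dataset."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares fit of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "r": sxy / math.sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four datasets come out with a mean y of about 7.50, a correlation of about 0.82, and a fitted line close to y = 3.00 + 0.50x. Only a plot tells them apart.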

PM2.5 has the same problem.

When we talk about PM2.5 we talk about mass concentration: the mass of particles smaller than 2.5 micrometres in diameter per unit volume of air. A UK roadside monitor might report an annual mean concentration of 12 µg/m³, and that’s the number we use for regulation. It’s the number on the WHO guideline. It’s the number Defra reports against the Environment Act target. It’s the number a council officer will quote in a planning meeting.

What we usually don’t know is what those particles actually are.

A 12 µg/m³ reading could be the diesel exhaust of a busy A-road. It could be tyre wear and brake dust at a slow-moving junction. It could be wood smoke from neighbours’ stoves on a still winter evening. It could be secondary aerosol blown in from continental Europe, the chemistry-driven background that sits over most of southern England on a high-pressure day. It could be Saharan dust, which reaches the UK more often than people realise. It could be sea salt, which is particularly common on the west coast.

Each of these has the same mass per cubic metre. None of them is the same as any of the others.

That matters because particle sizes and shapes affect what particles can do once they reach your airway. Larger particles are mostly deposited in the upper airway, where the body has reasonable defences. Smaller particles bypass those defences and can reach the alveoli; the smallest can cross into the bloodstream through the lungs and propagate systemic effects to organs that aren’t traditionally thought of as targets of air pollution. Cardiovascular effects, dementia, kidney disease, and effects on the developing brain in children have all been linked to fine and ultrafine particles in ways that the mass-based regulation hasn’t traditionally captured.

If the source is diesel exhaust, the particles are tiny: count distributions peak in the tens of nanometres, with effectively no particles above a micron. If the source is Saharan dust, the distribution is dominated by coarse particles between one and two-and-a-half micrometres. The same 12 µg/m³ in the first case can contain almost a hundred times as many individual particles as the same 12 µg/m³ in the second.
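The reason the count is so sensitive is that the mass of a particle grows with the cube of its diameter. Here is a minimal sketch of that arithmetic, under assumptions of ours: spherical particles, one density per source, and a single lognormal mode described by a count median diameter and geometric standard deviation (the diameter of average mass then follows from the standard Hatch-Choate relation). The function name and the example diameters are hypothetical and are not the source profiles behind the figure.

```python
import math


def particles_per_cm3(mass_ug_m3, count_median_nm, gsd=1.0, density_g_cm3=1.0):
    """Number concentration (per cm^3) needed to carry a given mass concentration.

    Assumes spheres in a single lognormal mode. gsd=1.0 means every particle
    has exactly the count median diameter (a monodisperse aerosol).
    """
    # Hatch-Choate: diameter of the particle with average mass, in metres.
    d_avg_mass_m = count_median_nm * 1e-9 * math.exp(1.5 * math.log(gsd) ** 2)
    # 1 g/cm^3 equals 1e12 ug/m^3.
    density_ug_m3 = density_g_cm3 * 1e12
    # Mass of one average particle, in micrograms.
    particle_mass_ug = density_ug_m3 * math.pi / 6.0 * d_avg_mass_m ** 3
    particles_per_m3 = mass_ug_m3 / particle_mass_ug
    return particles_per_m3 / 1e6  # convert per m^3 to per cm^3


# Two hypothetical monodisperse aerosols at the same 12 ug/m^3.
fine = particles_per_cm3(12.0, count_median_nm=100.0)
coarser = particles_per_cm3(12.0, count_median_nm=464.0)
ratio = fine / coarser  # equals (464 / 100) ** 3, roughly 100
print(f"{fine:.0f} vs {coarser:.0f} particles per cm^3, ratio {ratio:.1f}")
```

Under these assumptions, a hundredfold difference in particle number at equal mass and density needs only a factor of about 4.6 in the diameter of average mass, because 4.64³ is roughly 100.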

Almost a hundred times. For the same regulated number.

[Figure to be added: top panel shows particle number size distributions for five PM2.5 sources (traffic, urban background, secondary aerosol, biomass burning, Saharan dust), all constrained to 12 µg/m³ mass; bottom panel shows the same five sources plotted by mass, surface area, and particle number, with the spread collapsing as you move from number to mass.]

What the figure shows is that PM2.5 mass is a compression of an underlying reality that has much more structure than the headline number suggests. If you slice the same data three different ways (by particle number, by surface area, and by mass), you get three different orderings of the same five sources, with the spread getting tighter as you compress more. By the time you get to mass, all five sources collapse to a single point. The information about what’s in the air is real in the underlying size distribution. It’s compressed away in the regulated mass concentration.
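Number, surface area, and mass are weighted sums over the same size distribution, weighted by diameter to the power zero, two, and three respectively. The sketch below makes that concrete for two hypothetical single-mode aerosols scaled to the same 12 µg/m³. It assumes spherical particles, a lognormal mode truncated to 1 to 2500 nm, and a uniform density; the mode parameters and function names are ours for illustration and are not the published source profiles used for the figure.

```python
import math


def lognormal_number_fractions(count_median_nm, gsd, lo_nm=1.0, hi_nm=2500.0, n_bins=400):
    """Bin a lognormal number distribution on a log-spaced diameter grid.

    Returns (bin centres in nm, fraction of particles per bin), with the
    fractions renormalised to sum to 1 inside the [lo_nm, hi_nm] range.
    Requires gsd > 1.
    """
    mu, sigma = math.log(count_median_nm), math.log(gsd)
    log_lo, log_hi = math.log(lo_nm), math.log(hi_nm)
    step = (log_hi - log_lo) / n_bins

    def cdf(log_d):
        return 0.5 * (1.0 + math.erf((log_d - mu) / (sigma * math.sqrt(2.0))))

    centres, fractions = [], []
    for i in range(n_bins):
        a = log_lo + i * step
        b = a + step
        centres.append(math.exp(0.5 * (a + b)))  # geometric bin centre
        fractions.append(cdf(b) - cdf(a))
    total = sum(fractions)
    return centres, [f / total for f in fractions]


def metrics_at_fixed_mass(count_median_nm, gsd, density_g_cm3, mass_ug_m3=12.0):
    """Scale a mode to a fixed mass, then report number and surface area."""
    diameters, fractions = lognormal_number_fractions(count_median_nm, gsd)
    density_ug_m3 = density_g_cm3 * 1e12  # 1 g/cm^3 equals 1e12 ug/m^3
    # Mean mass per particle (ug): third moment of the distribution.
    mean_mass_ug = sum(
        f * density_ug_m3 * math.pi / 6.0 * (d * 1e-9) ** 3
        for d, f in zip(diameters, fractions)
    )
    # Mean surface area per particle (um^2): second moment.
    mean_surface_um2 = sum(
        f * math.pi * (d * 1e-3) ** 2 for d, f in zip(diameters, fractions)
    )
    number_per_cm3 = mass_ug_m3 / mean_mass_ug / 1e6
    return {
        "number_per_cm3": number_per_cm3,
        "surface_um2_per_cm3": number_per_cm3 * mean_surface_um2,
        "mass_ug_m3": mass_ug_m3,
    }


# Hypothetical modes: same width and density, different median size.
small_mode = metrics_at_fixed_mass(count_median_nm=100.0, gsd=1.6, density_g_cm3=1.5)
large_mode = metrics_at_fixed_mass(count_median_nm=500.0, gsd=1.6, density_g_cm3=1.5)
for key in ("number_per_cm3", "surface_um2_per_cm3", "mass_ug_m3"):
    print(key, round(small_mode[key] / large_mode[key], 2))
```

The printed ratios shrink from number to surface area and reach exactly 1 for mass, which is the collapse described above. The actual values depend entirely on the assumed modes.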

This isn’t an argument against PM2.5 regulation. Mass concentration is the right metric for a lot of public health purposes. The epidemiology underpinning the WHO guidelines is mass-based for good reasons. A consistent, comparable, internationally accepted number that drives policy at scale is more useful than a perfectly source-resolved measurement no one can act on.

But mass alone is the headline, not the story. To get the story you need to know what’s making up the mass. That’s the work of source apportionment: chemical analysis of filter samples, multi-pollutant inference from sensor networks, dispersion modelling tied to known emission inventories, sometimes all three combined. It’s harder than reading a number off a monitor, and the methods are still maturing for the kind of cheap, dense sensor networks now being deployed widely. But the question those methods answer is the operationally useful one. If the PM is coming from traffic, you can do something about traffic. If it’s coming from wood burning, traffic measures won’t help. If it’s secondary aerosol blown in from another country, no local intervention will move the needle.

Pulling that signal out of size-distribution data is one of the threads that runs through our work at South London Scientific. The same insight motivates both our indoor source apportionment research and our outdoor work with low-cost sensor networks. We’ll write more about both in the coming weeks.

For now: the next time someone tells you the air is at 12 µg/m³, the right next question isn’t “is that good or bad?” It’s “what’s it made of?”

Figure parameterised from published source profiles (Morawska et al. 2008; Seinfeld & Pandis 2016; others). Notebook and reproducible code at github.com/southlondonscientific/anscombe.