Delhi’s air quality and number games

India needs to codify methodologies for processing air quality data to reduce confusion and build confidence in air pollution control measures.

Illustration by Avikal Somvanshi (Graphic: Open source)

Published on:

30 Mar 2021, 11:01 pm

After a decades-long fight, the persistently big air pollution numbers are starting to command proportionally big moolah in government budgets. On March 9, the Delhi government allocated an amount of Rs 9,394 crore in its 2021-22 budget to improve the city's infamous air.

Earlier this spring, the central government had earmarked Rs 2,217 crore for tackling air pollution in cities in the Union budget for 2021-22. This is actually the second installment of the Rs 4,400 crore recommended by the 15th Finance Commission last year.

Welcome developments for sure, but do we know how to check if this money is really going to improve air quality?

We can measure improvement in air quality and see if it correlates with money spent. This is actually the plan of the Union government as it would ensure accountability among cities for the money being given to them.

But do we have a fool-proof method to do this measurement? Further, who can we trust to do this measurement, given the widespread penchant for creative accounting in this country?

This blog dwells on these two fundamental questions by critically examining the existing official methodology for measuring the improvement in air quality and disquiet around its scientific rigour and general trustworthiness. Delhi’s air is used as the primary filter to sieve through this number game.

Delhi’s trust deficit

There is always a heated disagreement over the rise and fall of air pollution levels in Delhi. Even though Delhi has seen the maximum expansion in the number of air quality monitoring stations in the recent years and has longer data series, the trends remain fuzzy.

While reports of worsening trends excite all, any hint of improvement dampens the spirit. Reports of a decline or a bending of the pollution curve are put under the microscope to prove otherwise, but reports of a rise in pollution are accepted uncritically. In the meantime, the science of decoding trends remains shrouded in mist.

A significant sum of money has been spent on converting the city’s public transportation to CNG, building the metro railway, making cleaner fuel available to industries and the public, stopping coal power generation, restricting trucks and phasing out old vehicles, among other measures.

But the war over data and trends obscures how much these measures have contributed to reducing pollution concentrations. It also obscures the next level of cuts needed to meet the clean air standards, which is what should guide action.

In the meantime, gimmicks such as smog towers detract from the task of curbing pollution, while contradictory expert-speak and number games further muddy the puddle.

Conflicting official claims

Even official reports on air quality trends are contradictory. Air quality monitoring stations of the Central Pollution Control Board (CPCB) and the Delhi Pollution Control Committee are the only regulatory sources of data for official trend reporting.

CPCB publishes an annual status report under the National Air Quality Monitoring Programme (NAMP) that is the final word on air quality status and trend in the country to inform official initiatives.

Strangely enough, contradictory trends of rising and declining pollution have been reported within a very short time span. First, the NAMP report National Ambient Air Quality Status & Trends 2019, released in September 2020, proclaimed that Delhi’s particulate matter 2.5 (PM₂.₅) concentration was rising and that 2019 was the worst year since 2015.

Yet, a few months later in February 2021, the Union Ministry of Environment, Forest and Climate Change (MoEFCC) submitted an affidavit to the Supreme Court (SC) of India claiming that PM₂.₅ levels in Delhi have been steadily declining since 2016.

This affidavit claimed that 2019 was 19 per cent cleaner than 2016 and that this improvement climbed to 30 per cent in 2020. Thus, the MoEFCC claim sharply contradicted that of CPCB. Yet, both sets of data come from the same official monitoring network.

Within a month, Delhi Government also seconded MoEFCC’s claim that Delhi’s air quality is improving consistently.

The Economic Survey of Delhi, 2020-21 stated that 2019 was the cleanest year since 2014. The survey didn’t include data for 2020, which of course was a special year due to the pandemic lockdowns and would have been cleaner than 2019.

There is no official explanation for these contradictory claims by MoEFCC and CPCB. While there is political interest in both claims, experts and the industrial–consultancy complex are also divided.

So, who or what is right about Delhi’s air?

This boils down to weak science and technical protocol for establishing air quality trends. The science has fallen through the cracks.

Sadly, despite massive expansion in air quality monitoring infrastructure, CPCB’s working is plagued by archaic protocols and weak science. Unless addressed immediately, how can cities report on air quality trends under the national clean air programme or justify their spending on air pollution control measures?

Official pollution calculation and data blues

The science, methodology and equipment for measuring the concentration of a pollutant in the air have been established and codified for some time now.

There is a difference though. While a protocol for using manual data to establish daily and annual averages has been adopted, a similar protocol for using real-time data is still awaited.

Right now, real-time data is used only for daily reporting of the air quality index. So, what is at stake?

Currently, CPCB’s protocol on the use of manual data for estimating daily and annual pollution levels serves as the basis of the NAMP reports. Accordingly, the annual level is defined as “arithmetic mean of minimum 104 measurements in a year at a particular site taken twice a week 24-hourly at uniform intervals”.

The 104 days of monitoring in a year might seem too little but operating manual stations twice a week is a scientifically accepted sampling rate to determine annual level. In fact, the United States Environment Protection Agency (USEPA) requires sampling on every sixth day, roughly just once a week.

But the weekly mandate of the protocol is critical as it ensures all seasons are represented in equal measure even with limited monitoring. This doesn’t mean more data is not desirable but manual monitoring has limitations.

Often this benchmark for minimum data is not met. As many as 84 per cent of the 314 manual PM₂.₅ monitors in the country didn’t meet the minimum 104 days of monitoring required in 2019, as per CPCB’s own admission in the NAMP report.

It is also not known whether the stations that meet the minimum 104-day requirement get two readings every week of the year. The NAMP report doesn’t publish that information.
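The two checks discussed above (at least 104 measurements, and two readings in every week) can be sketched in a few lines. This is only an illustration: the function name `completeness_report` and the choice to count weeks as consecutive seven-day blocks from 1 January are assumptions made here, not something taken from CPCB's protocol.

```python
from datetime import date, timedelta

MIN_MEASUREMENTS = 104  # minimum 24-hourly measurements per year quoted from the protocol


def completeness_report(sample_dates, year, min_per_week=2):
    """Summarise whether a station's sampling days can support an annual mean.

    Weeks are approximated as consecutive 7-day blocks starting on 1 January
    (an assumption for this sketch; the protocol does not define them this way).
    """
    days = sorted({d for d in sample_dates if d.year == year})
    start = date(year, 1, 1)
    per_block = {}
    for d in days:
        block = (d - start).days // 7
        per_block[block] = per_block.get(block, 0) + 1
    # 52 complete 7-day blocks fit in any calendar year.
    short_blocks = [b for b in range(52) if per_block.get(b, 0) < min_per_week]
    return {
        "monitored_days": len(days),
        "meets_minimum": len(days) >= MIN_MEASUREMENTS,
        "weeks_below_target": len(short_blocks),
    }


start = date(2019, 1, 1)
# Hypothetical station A: two samples in every week of the year.
twice_weekly = [start + timedelta(days=7 * b + off) for b in range(52) for off in (0, 3)]
# Hypothetical station B: 104 samples, all taken on consecutive days from 1 January.
front_loaded = [start + timedelta(days=i) for i in range(104)]

report_a = completeness_report(twice_weekly, 2019)
report_b = completeness_report(front_loaded, 2019)
print(report_a)
print(report_b)
```

Both hypothetical stations clear the 104-day bar, but station B leaves most weeks of the year unsampled, which is exactly the information the NAMP annexure does not reveal.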

With regard to manual PM₁₀ monitoring, 57 per cent of the 774 manual PM₁₀ monitors didn’t complete the minimum 104 days of monitoring.

On the other hand, automated continuous ambient air quality monitors or real-time monitors are generating data every day. But this is not used for tracking annual trend as there is no protocol for it.

There are big concerns around data gaps and missing data. Alongside improving the quality of monitoring, technical protocols are needed to address these data gaps statistically so that the data becomes more usable for regulatory reporting.

Let us look at some of the technical parameters that have bearing on the trend and concentration.

False averages

Technically, data from all the monitoring stations that fail to meet the minimum monitoring requirement can still be made useful by adopting a standard data substitution test or protocol. The United States and the European Union have these codified but none exist in India.

Given that there are no data substitution tests or protocols to deal with the massive holes in monitored data, one ought to wonder what CPCB might be doing to make this incomplete data usable in its official scientific reporting.

A quick look at the NAMP report annexure makes amply clear that it does absolutely nothing.

Each NAMP report carries an annexure that summarises the data used from all monitored stations, and it duly provides the number of monitored days for each. This much is commendable from the perspective of transparency.

A matter of deep concern is that it reports the arithmetic mean of incomplete data as the “annual average”. For instance, the 2019 report ascribes 36 micrograms per cubic metre (µg/m³) as the “annual average” of the monitoring station at Peenya Industrial Area in Bengaluru, which had just 13 monitored days.

There is no acceptable scientific way through which an annual average can be derived from just 13 days of data. Thus, most station-level “annual averages” are not really annual averages.

Less than minimum yet adequate

Let’s move to city averages. Major cities have more than one manual monitoring station. Therefore, CPCB computes the city average as the arithmetic mean of all the stations in the city.

This is something CPCB itself acknowledges is not a scientifically sound practice. Nevertheless, it is an easy and simple thing to do, which is a rationale no scientist would use. The only good thing about this scientifically unsound approach is that it makes it easier to detect errors in the NAMP reports.

Technically, only stations that meet the minimum data requirement should be included in the computation of city averages. But since an overwhelming majority of stations don’t meet the minimum requirement, CPCB has put in place a different data threshold for computing city averages.

It is called the adequate data requirement, which the NAMP report defines as “cities where ≥50 days of monitoring was done in a year”. This is not to be found in the text of the national air quality standards.

It is noteworthy that the adequacy requirement is that of the “city” to have ≥50 days of monitoring and not that of the monitoring station. The number of monitored days at city level is a simple addition of the number of monitored days at all of the city’s stations.

This is the reason why the city data table in the NAMP reports is punctuated with cities reporting more than 365 monitored days in a year. For example, in the latest NAMP report, the number of monitored days for Ahmedabad stands at 783 days, thanks to the existence of nine manual PM₂.₅ monitors in the city.

Interestingly, 88 days were the longest monitoring duration among Ahmedabad stations, which means none of the stations met the minimum data requirement as per the national standard.

This absurdity ensures that a city meets the adequacy requirement even when none of the city stations meet the minimum monitoring requirement like in Ahmedabad.

The NAMP report is flooded with cities that meet the data adequacy requirement despite none of their stations meeting the minimum data requirement.

The worst case is that of Guwahati that meets the data adequacy by adding up 24, 25 and 21 monitored days at its three stations at Bamunimaidam, Gopinath Nagar and Kamrup, respectively.

Leave aside the minimum requirement of 104 days, these stations don’t even meet the questionable adequacy requirement of 50 days. Nevertheless, under CPCB’s rules the city gets an official “annual average”.

Ideally, a city’s monitored days should have been based on the arithmetic mean of monitored days at each city station. But we have already established that NAMP reports are far from ideal.
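The gap between the two ways of counting is easy to see with the Guwahati figures quoted above. The sketch below is illustrative: the function name `city_data_check` and the output keys are made up for this example, while the thresholds and station day counts are the ones cited in the text.

```python
MIN_STATION_DAYS = 104   # per-station minimum in the national standard
ADEQUATE_CITY_DAYS = 50  # NAMP "adequate data" threshold, applied to the city total


def city_data_check(station_days):
    """Compare the city-total rule with a per-station view of the same data."""
    total = sum(station_days.values())
    mean_days = total / len(station_days)
    return {
        "city_total_days": total,
        "adequate_by_city_total": total >= ADEQUATE_CITY_DAYS,
        "stations_meeting_minimum": [
            name for name, days in station_days.items() if days >= MIN_STATION_DAYS
        ],
        "mean_days_per_station": round(mean_days, 1),
        "adequate_by_station_mean": mean_days >= ADEQUATE_CITY_DAYS,
    }


# Monitored days in 2019 as quoted from the NAMP report annexure.
guwahati = {"Bamunimaidam": 24, "Gopinath Nagar": 25, "Kamrup": 21}
result = city_data_check(guwahati)
print(result)
```

Summed up, Guwahati's 70 days clear the 50-day bar; averaged per station, they come to about 23 days and do not.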

Murder of averaging

Given the significantly less-than-minimum monitored data availability at stations, any scientist would adopt a more sophisticated method than simple arithmetic mean to establish a city’s annual average.

This would ideally require creating a combined data set with individual monitored day data from each station in a way that ensures data is spread over the year, capturing all seasons and duplication of monitoring days among city stations is removed.

This combined data set, then, can be used with seasonal weightages and data substitution to compute the valid annual average. But this seems to be just too much work, so CPCB simply computes arithmetic mean of already dubious “annual average” of all city stations.

For example, the 27 µg/m³ annual average for Guwahati city is a simple arithmetic mean of the 27 µg/m³, 30 µg/m³ and 23 µg/m³ “annual averages” of its stations at Bamunimaidam, Gopinath Nagar and Kamrup, respectively.

This simple and wrong approach of computing city averages from station-level data can be easily reproduced by using data found in the report’s annexure.
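The Guwahati computation can be reproduced as follows. The second figure, which weights each station by its monitored days, is added here only for contrast. It equals what pooling every daily reading would give if no days were duplicated across stations, and it is not a method found in the NAMP report or a substitute for the seasonal weighting described earlier.

```python
from statistics import mean

# Station-level "annual averages" (µg/m³) and monitored days quoted from the NAMP report.
station_means = {"Bamunimaidam": 27, "Gopinath Nagar": 30, "Kamrup": 23}
station_days = {"Bamunimaidam": 24, "Gopinath Nagar": 25, "Kamrup": 21}

# CPCB-style city average: plain mean of the station averages.
simple_city_mean = mean(list(station_means.values()))

# For contrast only: mean weighted by the number of monitored days at each station.
weighted_city_mean = sum(
    station_means[name] * station_days[name] for name in station_means
) / sum(station_days.values())

print(round(simple_city_mean))        # rounds to the 27 reported in the city table
print(round(weighted_city_mean, 1))
```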

But there are a few exceptions to this madness as well, where things get weirder. The most notable example is Delhi.

Accounted for but missing monitoring stations

Delhi’s city annual average in the latest NAMP report notes that 2019’s value is based on four stations, which together have 329 monitored days and a city average of 141 µg/m³.

But the station-level information reports the existence of just three stations with a combined 322 monitored days.

The arithmetic mean of these three stations works out to be 105 µg/m³, a good 36 µg/m³ lower than what is reported in the city table. Assuming there is consistency in CPCB’s methodology, the information available in the NAMP report can be used to reverse engineer the basic details of Delhi’s mysterious fourth station.

Reverse engineering suggests that this station must have worked for seven days with an “annual average” of about 250 µg/m³. One ought to wonder why this fourth station would be included in the city average computation but removed from the station data annexure.
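The reverse engineering is simple arithmetic, sketched below. It rests on two assumptions: that the city figure is a plain mean of station averages, as elsewhere in the report, and that exactly one station is missing from the annexure. The variable names are illustrative.

```python
# Figures from the NAMP 2019 city table.
city_avg, city_stations, city_days = 141, 4, 329

# Figures derived from the three stations listed in the annexure.
listed_avg, listed_stations, listed_days = 105, 3, 322

# Days the unlisted station must have contributed to the city total.
missing_days = city_days - listed_days

# If the city average is the mean of four station averages, the sum of those
# four is 4 * 141; subtracting the three listed ones leaves the fourth.
missing_avg = city_avg * city_stations - listed_avg * listed_stations

print(missing_days, missing_avg)
```

The result, 249 µg/m³ over seven days, is the "about 250" quoted above; the small difference comes from rounding in the published averages.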

Similar mismatch between city table and station data annexure is noted in Bhopal, Madurai, Jammu and many other cities.

Then there are cities like Ujjain (with a total of seven monitored days between two stations) in the NAMP report that have their station-level annual average left blank in the annexure but the city annual average is provided in city table. Creative accounting 101?

Ambient air quality in cities with respect to PM₂.₅ during 2019

(Source: National Ambient Air Quality Status & Trends 2019, CPCB)

Location-wise PM₂.₅ in 2019

(Source: National Ambient Air Quality Status & Trends 2019, CPCB)

Shifting sand

Deconstruction of NAMP report so far has made it clear that even if the data collected at monitoring stations might be of scientific quality following the codified methodology for doing so, its post-processing is anything but scientific.

There are bigger problems with the math for creating long-term trend as there is no written protocol for doing this even for manual stations.

For any trend analysis, it is important to have a standard as well as static set of variables and scale for their measurement. The interval in the time-series needs to be uniform, and the variable being tracked needs to be measured in the same conditions, using exactly the same parameters, just at different timestamps.

For instance, historic weather record of a place is based on a single or a fixed set of weather stations depending on the geographical spread being represented.

The location of these weather stations and the measurement technique for each weather parameter are determined to ensure they serve as representative of a larger geography.

Stations at new locations might be added, but they serve as supplements for understanding the micro-climate. These new locations never replace the historical stations in the city trend.

Separate trends, of course, can be created using the new stations and it might still be representative of the same geography but it is not comparable with the historical one without introducing a factor of equivalency.

Air quality monitoring stations are to be treated the same as weather stations while constructing long-term trends. But in the NAMP reports both the number and the location of monitoring stations in a city keep changing from year to year.

The changes are so drastic for a few cities that it would be unscientific to even compare their annual averages from different year publications. This asymmetry in data would be a major concern for anyone creating long-term air quality trends using data from these reports but not for CPCB.

These basic technicalities are missing in CPCB’s reported trends. The best illustration of this sloppiness in trend-making is Delhi’s PM₂.₅ trend in the latest NAMP report.

Blind men and an elephant called Delhi’s air quality

According to the NAMP report, PM₂.₅ levels in Delhi show a linear rise between 2015 and 2019. Delhi’s 2015 “annual average” of 95 µg/m³ is based on six manual PM₂.₅ stations, none of which met the minimum requirement of 104 monitored days.

The “annual average” for 2016 was based on seven manual PM₂.₅ stations and those for 2017 and 2018 on six manual PM₂.₅ stations again, but not the same six.

The “annual average” of 2019 is 141 µg/m³ and it is based on four monitoring stations, of which only three are listed in the station-level data in the annexure. (Table 1)

Of the listed three stations, only two — Janakpuri and ‘Pritampura’ — have been working since 2015 and are the only constant part of the mix of stations used to compute the city’s annual average each year.

The third station in the 2019 “annual average” is at Naraina Industrial Area and has data from 2018 only. The fourth is a mystery station whose details we have earlier reverse engineered to seven monitored days with a ridiculous 250 µg/m³ “annual average”.

Stations that have been dropped from 2019 annual average are the ones that have consistently reported relatively lower pollution levels than stations at Janakpuri and ‘Pritampura’.

A scientific approach to creating a trend from this ever-changing mix of stations would have been to use just the Janakpuri and ‘Pritampura’ stations. But CPCB just plotted the “annual average” of each year without any control for data symmetry.

This would be unacceptable on its own even if we forget that the computation of annual averages at station and city levels falls short of the benchmark.

The magic of this sham of a methodology is that the three stations used in 2019’s annual average show a declining trend at their individual station level but Delhi, as a city, shows an increasing trend.
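How every station can decline while the city "rises" is easier to see with a toy example. All numbers below are hypothetical and chosen only to show the mechanism; they are not Delhi's readings. Stations A and B stand for persistently polluted sites that stay in the mix, while C and D are cleaner sites that get dropped.

```python
from statistics import mean

# Hypothetical station "annual averages" (µg/m³) for three consecutive years.
# Every station that reports in more than one year shows a decline.
readings = {
    1: {"A": 150, "B": 140, "C": 70, "D": 60},
    2: {"A": 145, "B": 135, "C": 68},
    3: {"A": 140, "B": 130},
}


def city_trend(readings, stations=None):
    """City mean per year, optionally restricted to a fixed set of stations."""
    trend = {}
    for year, by_station in sorted(readings.items()):
        values = [v for name, v in by_station.items()
                  if stations is None or name in stations]
        trend[year] = round(mean(values), 1)
    return trend


def common_stations(readings):
    """Stations that report in every year of the series."""
    return set.intersection(*(set(by_station) for by_station in readings.values()))


shifting_mix = city_trend(readings)                           # whichever stations reported
fixed_mix = city_trend(readings, common_stations(readings))   # same stations every year
print(shifting_mix)
print(fixed_mix)
```

With the shifting mix the city average climbs from 105 to 135; with the two constant stations it falls from 145 to 135. The only thing that changed is which stations were counted.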

Delhi’s city- and station-level data from CPCB’s annual NAMP reports

(Note: green >= 104 monitored days; yellow = 50-103 monitored days; red < 50 monitored days.
Source: Compiled from National Ambient Air Quality Status & Trends reports of 2015, 2016, 2017, 2018 and 2019)

Aren’t there cross-checks in place?

PM₂.₅ is the most popular and the newest among the pollutants that the NAMP covers and reports on. The data completeness and changing stations problems unearthed with PM₂.₅ are also present in the analysis of NO₂, SO₂ and PM₁₀, but they are not as glaring.

The number of stations for these old pollutants does not fluctuate much year-to-year. Therefore, the trend reported for NO₂, SO₂ and PM₁₀ is more reliable, though still not exactly scientific.

Delhi’s PM₁₀ trend in the same report shows a significant decline. Given that PM₂.₅ is a subset of PM₁₀, showing a literally opposite trend for the same timeframe should have warranted a redo of all the math.

And if these divergent trends are true, it should have sent alarm bells ringing across the scientific community as this is an indication that the chemical composition of Delhi’s air and particulate matter has dramatically changed for no apparent reason. But none of that happened.

Air Quality of Delhi

(Source: National Ambient Air Quality Status & Trends 2019, CPCB)

The report under its section Air Quality Index of Delhi said that “all Good / Satisfactory / Moderate days have increased significantly and Poor / Very Poor / Severe days in total have gone down as compared to 2018”.

In short, air pollution in Delhi declined between 2016 and 2019.

This AQI analysis is based on real-time monitoring data and not that of manual monitors, and is not just based on PM₂.₅ but other pollutants as well. Also, this data is available for all 365 days of 2017, 2018 and 2019.

Days in various air quality index categories of Delhi (2016-2019)

(Source: National Ambient Air Quality Status & Trends 2019, CPCB)

Abandoning manual data but not the law

Both the Union government in its affidavit in the SC and the Delhi government in its Economic Survey relied solely on real-time PM₂.₅ data while reporting trends in Delhi’s air quality.

And both showed a decline, unlike CPCB’s official trend report. The tragedy is that only the latter is a regulatory document and has legal ramifications.

The natural instinct would be to shift to real-time data, given the holes in the manual monitoring. But that is not the solution, because the problems are with the processing of generated data and not the generation of data itself.

This is not to say there are no problems with generation of data from either of the technologies, but that is a matter of different analysis.

Real-time data right now is, in fact, more prone to creative accounting than manual data as no standards have been made or methodologies codified for processing this data.

Creating annual averages and trends at station or city level from this data is not even governed by any minimum data requirements.

New data, same errors

This is the reason why annual averages computed by the Union government and Delhi government using real-time data don’t match, even though both show declining trend.

Each government has chosen its own unique mix of stations while computing annual averages and has not ensured symmetry of data over the years. The Economic Survey of Delhi footnotes that, in its trend, “city average is calculated from 2014-2017 for four stations and from 2018-20 for 24 stations”.

So the current problem with methodology used in NAMP reports would remain, even if the annual average and trend computation is shifted away from manual data.

People doing math for these reports can still readily produce garbage analysis and trends using real-time data.

Delhi’s annual PM₂.₅ averages (µg/m³) computed by different agencies

	CPCB (NAMP report)	Delhi Govt (Economic Survey)	Union Govt (affidavit to SC)
2015	95	133
2016	118	137	135
2017	106	130	124
2018	121	128	114
2019	141	112	109
2020		101	95

(Source: National Ambient Air Quality Status & Trends 2019, CPCB, Economic Survey of Delhi, Delhi government, and MOEFCC’s affidavit in SC)
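The percentage claims quoted earlier can be checked against this table. The sketch below copies the table's values into a dictionary (the layout and the helper name `pct_change` are illustrative) and computes the change between any two years for each agency.

```python
# Annual averages (µg/m³) as listed in the table above; blank cells are omitted.
annual = {
    "CPCB (NAMP report)": {2015: 95, 2016: 118, 2017: 106, 2018: 121, 2019: 141},
    "Delhi Govt (Economic Survey)": {2015: 133, 2016: 137, 2017: 130,
                                     2018: 128, 2019: 112, 2020: 101},
    "Union Govt (affidavit to SC)": {2016: 135, 2017: 124, 2018: 114,
                                     2019: 109, 2020: 95},
}


def pct_change(series, start, end):
    """Percentage change from `start` year to `end` year (negative = decline)."""
    return round(100 * (series[end] - series[start]) / series[start], 1)


for agency, series in annual.items():
    print(agency, "2016 to 2019:", pct_change(series, 2016, 2019), "per cent")

union = annual["Union Govt (affidavit to SC)"]
print("Union Govt 2016 to 2020:", pct_change(union, 2016, 2020), "per cent")
```

The Union government's column reproduces the affidavit's roughly 19 and 30 per cent improvements, while CPCB's column shows a rise of about 19 per cent over the same 2016 to 2019 window.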

Need for codifying methodologies

Conventionally, there has been heavy reliance on manual monitoring for trend reporting. In the last couple of decades, the technology for real-time monitoring has become significantly robust and data generated by it is becoming more reliable and of scientific quality.

As a result, many countries have started framing standards and protocols for using real-time monitoring data for compliance and trend tracking.

USEPA has established detailed protocols to create equivalency between data generated by real-time monitors and manual monitors, ensuring both can be used interchangeably for regulatory compliance and trend making.

It has further reduced ambiguity in the methodologies for computing various averages and trends by codifying it and defining remedies for every known data problem.

Similar solutions are needed in India, especially if the funding for cities to combat air pollution is to be linked with quantification of air quality improvement.

NAMP is a mess, but where is the public scrutiny?

Knowing the reality of air pollution data, especially its processing in the country, it is important to be skeptical of any analysis being put out by any agency.

Calling out the glitches in monitoring and misinformation is critical to protect public interest and build public support for the difficult structural changes that would be needed to clean the air.

But the genuine scientific review of such blunders is increasingly absent from public discourse.

What is actually happening to Delhi’s air?

It is actually not a difficult question to answer, especially for Delhi, which has tremendous quantity of official data in public domain. One could just pick a scientifically astute methodology and do the math. Let me do this.

We only use the most granular data points available for any air quality analysis. In this case, it is the real-time data of 15-minute resolution available at CPCB’s online air quality data portal.

We do have reasonable confidence in CPCB’s automated real-time data portal as the data is directly fed into the system from the monitors without any known human intervention.

To process this granular raw data, we use USEPA’s official methodology, which guides the construction of daily, seasonal and annual averages from these 15-minute averages.
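As a rough illustration of the first step, 15-minute readings can be rolled up into daily means while discarding days with too little data. The 75 per cent completeness threshold below is an assumption chosen for this sketch, not a rule quoted from USEPA or CPCB, and the function name `daily_means` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

SLOTS_PER_DAY = 96  # number of 15-minute intervals in 24 hours


def daily_means(records, min_completeness=0.75):
    """Return {date: mean} for days with enough valid 15-minute readings.

    `records` is an iterable of (datetime, value) pairs; a value of None marks
    a missing reading. The completeness threshold is an illustrative assumption.
    """
    by_day = defaultdict(list)
    for timestamp, value in records:
        if value is not None:
            by_day[timestamp.date()].append(value)
    needed = SLOTS_PER_DAY * min_completeness
    return {day: mean(values) for day, values in by_day.items() if len(values) >= needed}


# Hypothetical readings: a complete day, a sparse day and a day exactly at the threshold.
start = datetime(2019, 1, 1)
records = [(start + timedelta(minutes=15 * i), 100.0) for i in range(96)]
records += [(start + timedelta(days=1, minutes=15 * i), 80.0) for i in range(40)]
records += [(start + timedelta(days=2, minutes=15 * i), 60.0 if i < 72 else None)
            for i in range(96)]

valid_days = daily_means(records)
print(valid_days)
```

The sparse second day is dropped rather than averaged, which is the kind of rule that has to be written down before daily values can be trusted to feed seasonal and annual averages.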

Missing data has been handled as per the data substitution protocol defined by USEPA. For trend creation, instead of annual averages, we prefer using three-year averages, as this flattens out the impact of yearly meteorological variation.

This is also based on USEPA’s approach to trend making. Details about this methodology can be found in the report Breathing Space by Centre for Science and Environment, a Delhi-based non-profit.
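The three-year averaging step can be sketched as below. The annual values are hypothetical placeholders, not results from this analysis, and the function name `three_year_means` is illustrative. The sketch only shows how overlapping three-year windows are built and how the change between the first and last window is expressed.

```python
def three_year_means(annual):
    """Rolling three-year means keyed by a label such as '2015-17'."""
    out = {}
    for end in sorted(annual):
        window = [end - 2, end - 1, end]
        if all(year in annual for year in window):
            label = f"{end - 2}-{str(end)[-2:]}"
            out[label] = sum(annual[year] for year in window) / 3
    return out


# Hypothetical annual averages (µg/m³), for illustration only.
annual = {2015: 120, 2016: 130, 2017: 110, 2018: 100, 2019: 90, 2020: 80}

rolling = three_year_means(annual)
decline = 100 * (rolling["2015-17"] - rolling["2018-20"]) / rolling["2015-17"]

print(rolling)
print(f"Decline between 2015-17 and 2018-20: {decline:.1f} per cent")
```

Because each window spans three years, one unusually windy or stagnant year moves the trend far less than it would move a single annual average.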

This methodology addresses all the issues that have been raised with CPCB’s methodology, and the result shows a significant decline in PM₂.₅ pollution, though levels remain alarmingly high.

So we can independently verify both MoEFCC’s and the Delhi government’s claims that air pollution in Delhi has been declining over the past few years. But the magnitude of change still varies based on the stations used.

Delhi’s air quality trend based on three-year averages of real-time data

(Note: 5 stations = ITO, IHBAS, Mandir Marg, RK Puram, and Punjabi Bagh; 11 stations = Anand Vihar, CRRI Mathura Road, IGI Airport, IHBAS, ITO, Mandir Marg, NSIT Dwarka, North Campus DU, Punjabi Bagh, RK Puram, and Shadipur.
Source: Author’s own analysis using CPCB’s real-time data and USEPA’s methodology for constructing annual averages and trends.)

The trend created with the five oldest stations of the city shows a 34 per cent decline between 2015-17 and 2018-20. The decline for the same period shrinks to 20 per cent if the trend is created with 11 stations.

This difference in magnitude, though not significant from scientific enquiry perspective, can become highly consequential for cities if their funding is linked to the quantum of air quality improvement. Even more so, if they are made to compete among themselves.

It means that, in addition to codifying the methodology for computing averages and trends, there is a need for identifying and locking certain monitoring stations in each city as trend stations.

Basically, it boils down to ensuring that we compare apples to apples and not to oranges or coconuts. And the need to do it is now, before we start signing those big cheques for expanding monitoring and cleaning of the air.
