Ground-Level NO₂ Predictions via CNN

Plain-English Summary

Wisconsin has only three NO₂ air quality monitors, all located in Milwaukee. For everyone else in the state — from Green Bay to rural farm communities — there is no direct measurement of this harmful pollutant. Satellites can see the whole state, but satellite readings don't directly translate to ground-level concentrations.

This project is building a convolutional neural network — the same type of AI used in image recognition — to read satellite imagery and predict what the ground-level NO₂ would be at any location in Wisconsin. The key novelty: training the model on NASA's new TEMPO satellite, which captures hourly NO₂ data at higher resolution than any prior satellite, alongside the established TROPOMI instrument. No published model has been trained on TEMPO data yet.

The Research Gap

Three problems this project addresses

Why this work is needed

1

No published model has been trained on TEMPO yet. TEMPO launched in April 2023 and provides hourly daytime NO₂ imagery over North America at finer spatial resolution than TROPOMI. It is an entirely new data source that no published surface-NO₂ model has yet used.

2

Existing AI models fail at new locations. The best current approach, Cao's (2023) CNN, predicts NO₂ accurately at sites it was trained on, but its accuracy drops sharply at sites it has never seen (spatial cross-validation R² of 0.593). Unseen sites are the real-world test that matters. TEMPO's richer hourly data may help the model learn patterns that generalize more broadly.

3

Wisconsin has only three NO₂ monitors, all in Milwaukee. The rest of the state — farms, mid-size cities, industrial zones — has no direct air quality measurement for this pollutant. That leaves communities potentially exposed to elevated NO₂ without any data.

Background

Why this is hard to measure

NO₂ is a traffic and combustion pollutant linked to asthma, cardiovascular disease, and the formation of ozone and PM2.5. The EPA's National Ambient Air Quality Standards set an annual-average limit of 53 ppb, alongside a separate 1-hour standard of 100 ppb.
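Checking a location against the annual standard comes down to a simple average. A minimal sketch, assuming hourly readings in ppb with missing hours stored as NaN; `annual_mean_exceeds` is a hypothetical helper, and it ignores EPA's data-completeness and rounding rules for official design values, so it is a screening check rather than a regulatory determination.

```python
import numpy as np

ANNUAL_STANDARD_PPB = 53.0  # EPA annual-mean NO2 standard


def annual_mean_exceeds(hourly_ppb: np.ndarray) -> tuple[float, bool]:
    """Return (annual mean in ppb, whether it is above the 53 ppb standard).

    Hypothetical screening helper: NaN marks missing hours and is skipped.
    EPA's completeness and rounding rules for official design values are not applied.
    """
    mean_ppb = float(np.nanmean(hourly_ppb))
    return mean_ppb, mean_ppb > ANNUAL_STANDARD_PPB
```

For example, `annual_mean_exceeds(np.array([60.0, np.nan, 50.0]))` averages the two valid hours to 55.0 ppb and flags the result as above the standard.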

Satellites can image NO₂ across the entire state at once, but they measure the total amount in the whole air column above each pixel, not just the concentration at the surface where people breathe. The relationship between the satellite column and the surface concentration varies by location, season, weather, and nearby land use, so translating satellite data into surface readings requires a statistical or machine-learning model calibrated against ground monitors. And because NO₂ can vary tenfold over just a few kilometers, that model needs to learn fine-scale spatial patterns, which is exactly what convolutional neural networks (CNNs) are designed to do.
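To make "learning fine-scale spatial patterns" concrete: a CNN layer slides small filters across the pixel grid and responds strongly wherever the local pattern matches the filter. The numpy sketch below applies a hand-picked hotspot filter (center pixel minus the mean of its eight neighbors) to a hypothetical 5 × 5 grid with one elevated pixel. In a trained CNN the filter values are learned from data rather than chosen by hand.

```python
import numpy as np


def conv2d_valid(grid: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `grid` without padding and return the response map.

    Like deep-learning libraries, this computes cross-correlation (the kernel
    is not flipped), which is what CNN layers call "convolution".
    """
    kh, kw = kernel.shape
    out_h = grid.shape[0] - kh + 1
    out_w = grid.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(grid[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical column-NO2 grid: uniform background with a single hotspot pixel.
grid = np.full((5, 5), 1.0)
grid[2, 2] = 10.0

# Hand-picked hotspot filter: center weight 1, each of the 8 neighbors -1/8.
kernel = -np.ones((3, 3)) / 8.0
kernel[1, 1] = 1.0

response = conv2d_valid(grid, kernel)  # peaks where the filter is centered on the hotspot
```

The response is strongly positive only at the window centered on the hotspot and negative elsewhere, which is the kind of local contrast a CNN can exploit to separate a highway or stack plume from its surroundings.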

The main satellite instrument used in prior work is TROPOMI (TROPOspheric Monitoring Instrument), which flies on the European Space Agency's Sentinel-5 Precursor satellite, launched in 2017. From a sun-synchronous polar orbit it maps NO₂ across the globe once a day, passing over Wisconsin in the early afternoon, with pixels of roughly 3.5 km × 5.5 km. This project also introduces TEMPO (Tropospheric Emissions: Monitoring of Pollution), a NASA instrument hosted on a commercial geostationary satellite launched in April 2023. Unlike TROPOMI, TEMPO stays fixed above North America and scans it every daylight hour, giving up to 12 snapshots per day where TROPOMI gives one.

Prior Work

Two key studies that frame this project

Two prior studies define the starting point. Both use TROPOMI; neither uses TEMPO.

Aspect	Kim et al. 2024 (JGR Atmospheres)	Cao 2023 (Frontiers Env. Sci.)
Method	Multivariate linear regression (MLR)	Convolutional neural network (CNN)
Satellite input	TROPOMI (single value per site)	TROPOMI (2D pixel grids)
Best annual R²	0.78 (anscMLR)	0.952
Best daily R²	N/A (annual model only)	0.892
Spatial CV R²	0.65–0.89 (regional)	0.593 (poor)
Key strength	Interpretable, minimal compute	Spatial pattern learning
Key limitation	Cannot capture spatial structure	Fails at unseen locations
Supervisor connection	Prof. Holloway's lab (UW-Madison)	Fairview High School student

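Cao's CNN is more accurate at the sites it was trained on, but its R² falls to 0.593 under spatial cross-validation, that is, at monitors withheld from training. This project tests whether TEMPO's hourly imagery can close that gap.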
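The R² values in the table are coefficients of determination. One common definition is shown below, where y_i is the observed surface NO₂ for sample i, ŷ_i is the model prediction, and ȳ is the mean observation. The cited studies may compute R² differently (for example as a squared correlation), so treat this as a reading aid rather than their exact formula. Under spatial cross-validation, each ŷ_i comes from a model that never saw the site where y_i was measured, which is a harder test than predicting at sites that also appear in training.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2},
\qquad
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
```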

Proposed Approach

Following Cao's CNN protocol, extended with TEMPO

The core methodology follows Cao (2023): a convolutional neural network that receives stacked 2D satellite imagery grids around each EPA monitoring site and predicts surface NO₂ concentration. The key extension is training on both TROPOMI and TEMPO inputs, plus adapting the pipeline specifically for Wisconsin's monitoring context.
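The architecture described above (stacked 2D grids in, one surface value out) can be sketched as a single forward pass in plain numpy. This is an illustration under assumptions, not Cao's (2023) network: the four channels, the 9 × 9 grid, eight 3 × 3 filters, global average pooling, and the random untrained weights are all hypothetical choices. A real model would stack more layers and be trained by gradient descent in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input stack: C channels (for example TROPOMI column, TEMPO column,
# wind speed, temperature) on an H x W pixel grid centered on one monitor.
C, H, W = 4, 9, 9
F, K = 8, 3  # number of filters and filter width (illustrative choices)


def conv_relu(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Multi-channel 'valid' convolution followed by ReLU.

    x: (C, H, W); weights: (F, C, K, K); bias: (F,) -> (F, H-K+1, W-K+1).
    Computes cross-correlation, as CNN libraries do.
    """
    n_filters, _, k, _ = weights.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((n_filters, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[:, i:i + k, j:j + k]  # (C, K, K) neighborhood
            out[:, i, j] = np.tensordot(weights, window, axes=([1, 2, 3], [0, 1, 2])) + bias
    return np.maximum(out, 0.0)  # ReLU


def predict_surface_no2(x: np.ndarray, params: dict) -> float:
    """Conv + ReLU, global average pooling, then a linear head giving one value in ppb."""
    features = conv_relu(x, params["w_conv"], params["b_conv"])
    pooled = features.mean(axis=(1, 2))  # one summary number per filter, shape (F,)
    return float(pooled @ params["w_head"] + params["b_head"])


# Random, untrained weights: the output is meaningless until the model is fitted.
params = {
    "w_conv": rng.normal(scale=0.1, size=(F, C, K, K)),
    "b_conv": np.zeros(F),
    "w_head": rng.normal(scale=0.1, size=F),
    "b_head": 0.0,
}
x = rng.normal(size=(C, H, W))  # stand-in for one normalized input stack
y_hat = predict_surface_no2(x, params)
```

Global average pooling is one simple way to collapse the feature maps into a single prediction; flattening followed by dense layers is an equally common alternative.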

Cao (2023) Baseline

Satellite: TROPOMI only
Sites: 500 CONUS monitors
Period: 2018–2022
Resolution: 3.5 km × 5.5 km
Temporal: Daily + annual
Grid size: 4,000 m (5×6 pixels)

This Project (Extension)

Satellite: TROPOMI + TEMPO
Sites: WI monitors + CONUS
Period: 2023–present (TEMPO era)
Resolution: 2.1 km × 4.4 km (TEMPO)
Temporal: Hourly (TEMPO) + daily
WI focus: Statewide coverage map

Because TEMPO is geostationary, it scans each location up to 12 times per day, once per daylight hour; TROPOMI observes it about once. That richer daily signal may teach the model patterns that hold up at new locations, addressing the field's central limitation.
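The comparison can be put in rough numbers using the nominal pixel sizes listed above and the upper bound of 12 TEMPO scans per day. Ignoring clouds and viewing-angle effects, which reduce both counts in practice, TEMPO offers roughly 2× finer pixels and 12× more scans, or about 25× as many pixel-observations per unit area per day:

```python
# Nominal pixel footprints from the comparison above (km x km).
TROPOMI_PIXEL_KM = (3.5, 5.5)
TEMPO_PIXEL_KM = (2.1, 4.4)

# Observations of a given location per day (the TEMPO value is an upper bound).
TROPOMI_SCANS_PER_DAY = 1
TEMPO_SCANS_PER_DAY = 12


def pixel_area_km2(footprint):
    """Area of one pixel in km^2."""
    width, height = footprint
    return width * height


tropomi_area = pixel_area_km2(TROPOMI_PIXEL_KM)  # 19.25 km^2
tempo_area = pixel_area_km2(TEMPO_PIXEL_KM)      # about 9.24 km^2

# Pixels needed to tile a fixed area scale inversely with pixel area.
spatial_gain = tropomi_area / tempo_area                      # about 2.08
temporal_gain = TEMPO_SCANS_PER_DAY / TROPOMI_SCANS_PER_DAY   # 12.0
daily_sample_gain = spatial_gain * temporal_gain              # about 25

print(f"{spatial_gain:.2f}x finer pixels, {temporal_gain:.0f}x more scans, "
      f"~{daily_sample_gain:.0f}x more pixel-observations per day")
```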

Research Plan

Five phases from data to Wisconsin maps

01

Problem Definition and Data Inventory

Define what the model predicts (ground-level NO₂ in ppb), identify all input data sources (satellite imagery, meteorology, land use, roads), and inventory Wisconsin's three EPA monitors alongside nationwide training sites.
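Phase 01's definitions can be pinned down as a record type that every later phase produces or consumes. A minimal Python sketch; the `TrainingSample` name, its fields, the channel names, and the placeholder values are hypothetical, not a format used by Cao (2023) or by EPA.

```python
from dataclasses import dataclass
from datetime import datetime

import numpy as np


@dataclass
class TrainingSample:
    """One model input/target pair (hypothetical schema)."""

    site_id: str                    # monitor identifier, e.g. an EPA AQS site code
    latitude: float                 # degrees north
    longitude: float                # degrees east (negative in Wisconsin)
    time_utc: datetime              # hour of the surface observation
    surface_no2_ppb: float          # prediction target: ground-level NO2 in ppb
    input_grid: np.ndarray          # (channels, height, width) stack centered on the site
    channel_names: tuple[str, ...]  # one name per channel in input_grid

    def __post_init__(self) -> None:
        # Catch misaligned stacks early, before they reach the model.
        if self.input_grid.ndim != 3:
            raise ValueError("input_grid must have shape (channels, height, width)")
        if self.input_grid.shape[0] != len(self.channel_names):
            raise ValueError("channel_names must name every input channel")


# Placeholder values only, to show the shape of one record.
sample = TrainingSample(
    site_id="hypothetical-site-1",
    latitude=43.0,
    longitude=-87.9,
    time_utc=datetime(2024, 7, 1, 18),
    surface_no2_ppb=0.0,
    input_grid=np.zeros((3, 9, 9)),
    channel_names=("tropomi_no2", "tempo_no2", "wind_speed"),
)
```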

02

Data Acquisition and Pipeline

Pull TROPOMI and TEMPO imagery from Google Earth Engine as pixel grids. Download weather data (temperature, wind, pressure). Collect land-use layers (roads, vegetation, population density). Align everything to EPA monitor locations in space and time.
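Aligning imagery with monitors amounts to cutting a small window of pixels around each site and matching each monitor reading to the nearest satellite scan in time. A minimal numpy sketch, assuming the imagery has already been exported as a regular latitude/longitude grid; `extract_patch` and `nearest_scan` are hypothetical helpers, and the 30-minute matching tolerance is an arbitrary illustrative choice.

```python
import numpy as np


def extract_patch(field, lats, lons, site_lat, site_lon, half=2):
    """Return the (2*half+1) x (2*half+1) window of `field` centered on the pixel nearest a site.

    field: (H, W) values on a regular grid; lats: (H,) and lons: (W,) pixel-center
    coordinates. Parts of the window that fall off the grid are filled with NaN.
    """
    i = int(np.argmin(np.abs(lats - site_lat)))
    j = int(np.argmin(np.abs(lons - site_lon)))
    padded = np.pad(np.asarray(field, dtype=float), half, constant_values=np.nan)
    # Padding shifts pixel (i, j) to (i + half, j + half), so the window starts at (i, j).
    return padded[i:i + 2 * half + 1, j:j + 2 * half + 1]


def nearest_scan(scan_times, obs_time, max_gap_minutes=30):
    """Index of the satellite scan closest in time to a monitor reading, or None if too far."""
    gaps = np.abs(scan_times - obs_time)
    k = int(np.argmin(gaps))
    return k if gaps[k] <= np.timedelta64(max_gap_minutes, "m") else None
```

Padding with NaN rather than clipping keeps every window the same size, so sites near the edge of a scene still produce a valid model input; the NaNs can then be handled by the same gap-filling step used for clouds.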

03

Preprocessing and Feature Engineering

Fill satellite data gaps caused by cloud cover. Normalize each input to a common scale so training is numerically stable. Build feature grids at multiple resolutions and add time-of-day and seasonal encodings.
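These preprocessing steps can be sketched in numpy under simple assumptions: cloud gaps are filled with the mean of the valid pixels and flagged in a separate 0/1 mask channel (so the network can tell real pixels from filled ones), inputs are z-scored with statistics computed on training sites only, and time is encoded with sine/cosine pairs so that hour 23 sits next to hour 0. Mean imputation is a deliberately simple stand-in for whatever gap-filling method the project adopts, and all function names are hypothetical.

```python
import numpy as np


def fill_gaps_with_mask(channel):
    """Fill NaN (for example cloud-masked) pixels with the mean of the valid pixels.

    Returns the filled channel and a 0/1 mask (1 = real observation) that can be
    stacked as an extra input so the model knows which pixels were imputed.
    """
    valid = ~np.isnan(channel)
    fill_value = channel[valid].mean() if valid.any() else 0.0
    return np.where(valid, channel, fill_value), valid.astype(float)


def fit_zscore(train_values):
    """Mean and standard deviation from training data only, to avoid leakage."""
    return float(np.mean(train_values)), float(np.std(train_values))


def apply_zscore(values, mean, std):
    """Rescale values to zero mean and unit variance using precomputed statistics."""
    return (values - mean) / std


def cyclical_time_features(hour, day_of_year):
    """Sine/cosine encodings: hour 23 lands next to hour 0, late December next to early January."""
    h = 2 * np.pi * hour / 24.0
    d = 2 * np.pi * (day_of_year - 1) / 365.25
    return np.array([np.sin(h), np.cos(h), np.sin(d), np.cos(d)])
```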

04

Model Training and Evaluation

Train the CNN and compare it against simpler baseline models using the same inputs. The critical test: how well does it predict at locations it has never seen during training? That spatial generalization score is the key metric.
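The spatial generalization test corresponds to leave-sites-out (grouped) cross-validation: monitors, not individual days, are split into folds, so every prediction is made at a site the model never trained on. A minimal numpy sketch, assuming a feature matrix `X`, targets `y` in ppb, and one site ID per row; ordinary least squares stands in for whichever model is being evaluated (the CNN would be trained in its place), and pooling predictions across folds into one R² is one of several reasonable summaries.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def site_folds(site_ids, n_folds=5, seed=0):
    """Assign every site (and hence all of its rows) to exactly one fold."""
    rng = np.random.default_rng(seed)
    sites = rng.permutation(np.unique(site_ids))
    fold_of_site = {site: k % n_folds for k, site in enumerate(sites)}
    return np.array([fold_of_site[s] for s in site_ids])


def spatial_cv_r2(X, y, site_ids, n_folds=5, seed=0):
    """Pooled R² where each prediction comes from a model fitted without that site."""
    folds = site_folds(site_ids, n_folds, seed)
    Xb = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    y_pred = np.empty_like(y, dtype=float)
    for k in range(n_folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(Xb[~test], y[~test], rcond=None)
        y_pred[test] = Xb[test] @ coef
    return r2_score(y, y_pred)
```

Splitting by row instead of by site would let the model see other days from the same monitor during training, which is how random-split scores can look far better than performance at genuinely new locations.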

05

Wisconsin NO₂ Mapping

Apply the model across all of Wisconsin to generate a statewide NO₂ map at roughly 1 km resolution — identifying high-exposure areas that have no monitors today, and comparing lakeshore vs. inland and urban vs. rural patterns.
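Mapping the state starts with a regular grid of prediction points at roughly 1 km spacing. A minimal sketch, assuming a rough Wisconsin bounding box (the coordinates are approximate, and a real map would clip points to the state boundary); `prediction_grid` is a hypothetical helper, and the degree-to-kilometer conversion uses a spherical-Earth approximation. Each grid point would then receive its own input window and model prediction.

```python
import numpy as np

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def prediction_grid(lat_min, lat_max, lon_min, lon_max, spacing_km=1.0):
    """Regular lat/lon grid with roughly `spacing_km` spacing.

    Longitude spacing is widened by 1/cos(latitude) at the box's mid-latitude so
    cells are close to square there; they drift slightly from square toward the edges.
    """
    dlat = spacing_km / KM_PER_DEG_LAT
    mid_lat = np.radians((lat_min + lat_max) / 2)
    dlon = spacing_km / (KM_PER_DEG_LAT * np.cos(mid_lat))
    lats = np.arange(lat_min, lat_max, dlat)
    lons = np.arange(lon_min, lon_max, dlon)
    return np.meshgrid(lats, lons, indexing="ij")


# Rough bounding box around Wisconsin's land area (an assumption, not an official extent).
lat_grid, lon_grid = prediction_grid(42.5, 47.1, -92.9, -86.8)
```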

Research Questions

What this project is trying to answer

Does adding TEMPO's hourly data make the model work better at places it hasn't seen before — solving the field's core generalization problem?
Can the model produce reliable NO₂ estimates for Wisconsin locations that currently have no monitor at all?
Are there communities across Wisconsin with elevated NO₂ that nobody is currently measuring — and if so, where?