Dynamic article by Marnix Hamelberg

As part of a research internship at Geodan

Supervised by Tom van Tilburg

Predicting air quality

Predicting air quality with road traffic and weather data using machine learning

The map above displays hourly observed and predicted nitrogen dioxide (NO2) concentrations at an air quality sensor (black circle). The predictions are derived indirectly from the observations of a road traffic sensor (red circle) and a weather station (blue diamond).

Numerous air quality sensors continuously monitor atmospheric components such as nitrogen oxides (NOx) and particulate matter (PM). These components are harmful to our health and adversely affect the climate and the environment. European law requires that air quality be measured, reported, and made available to the public; in the Netherlands, this data is provided by the Luchtmeetnet.nl initiative. The sensors are scattered throughout the country, as the map above shows, and their hourly measurements provide detailed information about current and historical concentrations of air pollutants.

This leaves a problem: large parts of the Netherlands have no air quality sensors at all. Could the missing data be reconstructed from external factors? As a first step, we reconstruct (i.e. predict) the data at an existing air quality station. Road traffic and weather data are preprocessed and fed into a machine learning model as explanatory variables (i.e. features), with the air quality, here the NO2 concentration, as the response variable (i.e. label). The left graph below shows the hourly observed and predicted NO2 concentrations, and the middle and right graphs show features such as road traffic speed and wind direction. The predictions come from a random forest regressor, which fits the nonlinear relationship between the historical features and the label.

You can drag the graphs back to before 5/06. This period is the training data: both the features and the label are used to train the random forest regressor. After 5/06, the trained model predicts the label using only the features from the same hours, giving an estimate of the NO2 concentration that can be compared with the observations.

Interactive graphs, left to right: training / testing label (observed and predicted NO2); training / testing features; training / testing features
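The training and prediction procedure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code behind this article: it uses NumPy and scikit-learn's `RandomForestRegressor`, replaces the real traffic, weather, and NO2 series with synthetic hourly data, uses a hypothetical feature set (traffic speed, wind direction encoded as sine and cosine, and wind speed), and uses an index-based cutoff as a stand-in for 5/06. The split is chronological rather than random, so the model never sees hours after the cutoff during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# --- Synthetic stand-in data (hypothetical; NOT the article's measurements) ---
rng = np.random.default_rng(seed=0)
n_hours = 24 * 60                      # 60 days of hourly records
hour_of_day = np.arange(n_hours) % 24

# Assumed traffic pattern: speed drops during morning and evening rush hours (km/h).
rush_hour = np.isin(hour_of_day, [7, 8, 16, 17])
traffic_speed = 80.0 - 30.0 * rush_hour + rng.normal(0.0, 5.0, n_hours)
wind_direction = rng.uniform(0.0, 360.0, n_hours)   # degrees
wind_speed = rng.gamma(2.0, 2.0, n_hours)            # m/s, hypothetical extra feature

# Hypothetical NO2 label (ug/m3): higher in slow traffic, diluted by wind.
no2 = 40.0 + 0.5 * (80.0 - traffic_speed) - 2.0 * wind_speed + rng.normal(0.0, 3.0, n_hours)

# --- Preprocessing: one row of features per hour ---
# Wind direction is circular, so encode it as sine/cosine (an assumed choice);
# this keeps 359 degrees and 1 degree close together.
wind_rad = np.deg2rad(wind_direction)
X = np.column_stack([traffic_speed, np.sin(wind_rad), np.cos(wind_rad), wind_speed])
y = no2

# --- Chronological split: train before the cutoff, predict after it ---
cutoff = 24 * 45                       # stands in for the 5/06 cutoff
X_train, y_train = X[:cutoff], y[:cutoff]
X_test, y_test = X[cutoff:], y[cutoff:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)            # learn from the features AND the label
y_pred = model.predict(X_test)         # predict from the features only

# Compare against a constant baseline that always predicts the training mean.
mae = np.mean(np.abs(y_pred - y_test))
baseline = np.mean(np.abs(y_test - y_train.mean()))
print(f"Random forest MAE: {mae:.2f} ug/m3 (constant baseline: {baseline:.2f})")
```

With real data, the main extra work lies in the preprocessing step: aligning the hourly traffic and weather records to the air quality timestamps before building `X`. Comparing the error against a constant baseline is one simple way to check that the features actually carry information about the label.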

To be continued