Predicting air quality

Predicting air quality with road traffic and weather data using machine learning

The above map displays hourly observed and predicted nitrogen dioxide (NO2) concentrations at an air quality sensor (black circle). The predictions are indirectly derived from the observations of a road traffic sensor (red circle) and a weather station (blue diamond).

Numerous air quality sensors continuously monitor atmospheric components such as nitrogen oxides (NOx) and particulate matter (PM). These components are harmful to our health and adversely affect the climate and the environment. European law requires that air quality be measured, reported, and made available to the public. Luchtmeetnet.nl is the Dutch initiative that provides this data. Its sensors are scattered throughout the Netherlands, as seen in the map above. Hourly measurements provide detailed information about current and historical concentrations of air pollutants.

An interesting problem arises: large parts of the Netherlands have no air quality sensors at all. Would it not be interesting to reconstruct the missing data from external factors? Let us start by reconstructing (i.e. predicting) data at an existing air quality station. The road traffic data and weather data are preprocessed and fed into a machine learning model as training features (i.e. independent variables), with air quality as the response label (i.e. dependent variable). The left graph below shows hourly observed and predicted NO2 concentrations, and the middle and right graphs display feature data components, such as road traffic speed and wind direction. The predictions are made by fitting a random forest regressor, a nonlinear model, to historical data. You can drag the graphs to before 5/06, where both the training features and the response labels were used to train the random forest regressor. After 5/06, the trained model makes its predictions using only the road traffic and weather data for the corresponding hours.
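The train-then-predict setup described above can be sketched in a few lines. This is a minimal illustration, not the actual pipeline behind the graphs: it assumes scikit-learn's RandomForestRegressor, and the data is synthetic. The feature names (traffic_speed, wind_direction, wind_speed), the made-up relation between features and NO2, and the 80/20 split point are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic hourly data standing in for the real sensor feeds (assumption).
n_hours = 24 * 60
hour_of_day = np.arange(n_hours) % 24
# Traffic slows down around the morning rush hour.
traffic_speed = 90 - 30 * np.exp(-((hour_of_day - 8) ** 2) / 8) + rng.normal(0, 3, n_hours)
wind_direction = rng.uniform(0, 360, n_hours)  # degrees
wind_speed = rng.gamma(2.0, 2.0, n_hours)      # m/s

# Made-up nonlinear relation between the features and NO2 (assumption).
no2 = (
    60
    - 0.4 * traffic_speed
    + 10 / (1 + wind_speed)
    + 5 * np.cos(np.radians(wind_direction))
    + rng.normal(0, 1, n_hours)
)

# Training features (independent variables) and response label (dependent variable).
X = np.column_stack([traffic_speed, wind_direction, wind_speed])
y = no2

# Split by time rather than at random: everything before the cut-off is used
# for training, everything after it is predicted from the features alone.
split = int(0.8 * n_hours)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predictions for the later period use only the feature data for those hours.
y_pred = model.predict(X_test)
mae = float(np.mean(np.abs(y_pred - y_test)))
print(f"Mean absolute error on the held-out period: {mae:.2f}")
```

Splitting on time instead of shuffling mirrors the description above: the model never sees response labels from the period it is asked to predict.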

Graph panels: Response labels (left); Training / testing features (middle and right)