Structural Time Series Modelling: Forecasting Air Passenger Numbers

Use of StructTS and dynamic linear modelling

Michael Grogan



This article illustrates the use of a structural time series model to forecast air passenger numbers, using the Air Traffic Passenger Statistics dataset from DataSF Open Data, which is licensed under the Public Domain Dedication and License (PDDL).

Business Context

The airline industry is very dynamic. As we have seen both before and after COVID, passenger demand can change quite quickly.

At the start of the pandemic, passenger numbers collapsed due to travel restrictions. However, the speed at which passenger demand rebounded was equally surprising.

Forecasting passenger numbers is an important task for an airline, as it has implications for decisions such as which type of aircraft to deploy on a particular route, as well as for estimated fuel costs and expected route revenue.

Therefore, an airline that is looking to forecast passenger numbers needs to use a time series model that can quickly react to unanticipated “shocks”. Traditional time series models such as ARIMA are not necessarily the best choice for this.

Data and Modelling

For this problem, historical air passenger data for British Airways was analysed, specifically enplaned passengers travelling internationally to Europe from July 2005 to December 2022.

The time series model is built using data up to December 2021, and the forecasts for 2022 are then compared with the actual values. Due to a collapse in passenger numbers during the initial stages of the pandemic, data for May 2020 was not included in the dataset. For the purposes of this analysis, May 2020 was assigned a value of 106 passengers, which is the same as that recorded for April 2020.
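
The data preparation described above can be sketched in R as follows. This is a minimal illustration rather than the code used for the analysis: the passenger counts are synthetic placeholders (only the value of 106 comes from the text), and the object names `raw`, `full`, `train` and `test` are assumptions made for the example.

```r
# Placeholder data: synthetic counts standing in for the real British Airways
# figures, which are not reproduced here. Only the value 106 is from the article.
dates <- seq(as.Date("2005-07-01"), as.Date("2022-12-01"), by = "month")
set.seed(42)
raw <- data.frame(date = dates,
                  passengers = round(runif(length(dates), 5000, 15000)))
raw$passengers[raw$date == as.Date("2020-04-01")] <- 106
# Mimic the source data, where May 2020 is absent.
raw <- raw[raw$date != as.Date("2020-05-01"), ]

# Rebuild the full monthly calendar so the missing month appears as NA.
full <- merge(data.frame(date = dates), raw, by = "date", all.x = TRUE)

# Fill May 2020 with the April 2020 value, as described in the text.
may <- full$date == as.Date("2020-05-01")
apr <- full$date == as.Date("2020-04-01")
full$passengers[may] <- full$passengers[apr]

# Monthly ts object starting in July 2005.
passengers <- ts(full$passengers, start = c(2005, 7), frequency = 12)

# Training data runs to December 2021; 2022 is held out for comparison.
train <- window(passengers, end = c(2021, 12))
test  <- window(passengers, start = c(2022, 1))
```

Rebuilding the full calendar before converting to a `ts` object matters because `ts` assumes evenly spaced observations. If the missing month were simply skipped, every later observation would be shifted back by one month.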

For this analysis, a structural time series model was fitted to the data using the StructTS function from the stats package, which ships with base R. Specifically, the structural model for the time series is fitted by maximum likelihood, whereby the goal is to…