Kruskal-Wallis and Power Analysis in R: Analysing Flight Delays

Comparing groups using the Kruskal-Wallis test

Michael Grogan


Source: Image by Author — implementation in R using RStudio

The Kruskal-Wallis test is a non-parametric test of the null hypothesis that k independent samples (groups) come from the same distribution.

The test works on ranks: the observations from all samples are pooled, ordered by size, and each value is replaced by its rank (tied values share the average rank). The kruskal.test function in R does this automatically.
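To make the ranking step concrete, here is a minimal sketch that computes the Kruskal-Wallis statistic by hand and compares it with kruskal.test. The values and group labels are hypothetical toy data (not the flight data used in this article), chosen so that there are no ties and no tie correction is needed.

```r
# Hypothetical toy samples (not the article's flight data), chosen to avoid ties
values <- c(3, 9, 4,   12, 7, 15,   20, 1, 18)
group  <- factor(rep(c("a", "b", "c"), each = 3))

# Pool all observations and replace each value with its rank
r <- rank(values)

# Sum of ranks and sample size per group
rank_sums <- tapply(r, group, sum)
n_i <- tapply(r, group, length)
N <- length(values)

# Kruskal-Wallis H statistic (valid as written only when there are no ties)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Compare with R's built-in implementation
kw <- kruskal.test(values, group)
all.equal(H, unname(kw$statistic))
```

When the data contain ties, kruskal.test also applies a tie correction, so the hand-computed value above would differ slightly from the built-in one.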

For this example, let us see how the Kruskal-Wallis test can be used to determine differences in delays across flights.

Airline Delay Example

Let us consider the following scenario. We wish to analyse hypothetical flight data across three separate airlines to determine whether the delay in takeoff time differs across these airlines.

Suppose we track one flight from each of three airlines and record the takeoff delay, in minutes, for five instances of each flight (a value of 0 means the flight took off on time).

> df
  flight1 flight2 flight3
1       0      41      28
2      12      31      16
3       5      30     242
4       7      35      63
5       8       3       0
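The code that created df is not shown, so the following is one possible way to rebuild it from the printed values, assuming each column is a plain numeric vector of delays in minutes.

```r
# Rebuild the hypothetical delay data printed above (minutes; 0 = on time).
# Assumption: each column is a plain numeric vector; the original
# construction code is not shown in the article.
df <- data.frame(
  flight1 = c(0, 12, 5, 7, 8),
  flight2 = c(41, 31, 30, 35, 3),
  flight3 = c(28, 16, 242, 63, 0)
)
df
```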

When generating summary statistics, we can see that the means differ substantially across the three flights, while the medians for Flights 2 and 3 are similar (31 and 28 minutes). The mean for Flight 3 is pulled upwards by a single 242-minute delay.

> summary(df)
    flight1        flight2      flight3
 Min.   : 0.0   Min.   : 3   Min.   :  0.0
 1st Qu.: 5.0   1st Qu.:30   1st Qu.: 16.0
 Median : 7.0   Median :31   Median : 28.0
 Mean   : 6.4   Mean   :28   Mean   : 69.8
 3rd Qu.: 8.0   3rd Qu.:35   3rd Qu.: 63.0
 Max.   :12.0   Max.   :41   Max.   :242.0

To prepare the data for analysis, let us stack the three columns into a single column, with a second column recording which flight each delay belongs to:

> d1 <- data.frame(delay=unlist(df, use.names = FALSE))
> groups <- rep(c("flight1", "flight2", "flight3"), each = 5)
> df2 <- data.frame(groups, d1)
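As an aside, the same long format can be produced with base R's stack function. This is a sketch of an alternative, not the approach used in this article; the name long is hypothetical, and df is rebuilt here from the printed values so the block runs on its own.

```r
# Rebuild the delay data from the values printed earlier (assumed numeric)
df <- data.frame(
  flight1 = c(0, 12, 5, 7, 8),
  flight2 = c(41, 31, 30, 35, 3),
  flight3 = c(28, 16, 242, 63, 0)
)

# stack() returns two columns: `values` (the data) and `ind` (the source column)
long <- stack(df)

# Rename and reorder to match the groups/delay layout used in the article
names(long) <- c("delay", "groups")
long <- long[, c("groups", "delay")]

str(long)
```

Note that stack stores the group labels as a factor, whereas the manual approach above produces a character column (in R 4.0 and later).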

> df2
    groups delay
1  flight1     0
2  flight1    12
3  flight1     5…