Geographical Revenue Analysis with GeoPandas and SQL: Analysing Hotel ADR
GeoPandas is a Python library designed for working with geospatial data. It is particularly useful for comparing and mapping metrics across different countries.
Specifically, companies that have international customers or operate internationally may be interested in metrics relevant to the regions in which they operate.
Suppose a hotel chain in Portugal wishes to answer the following question:
“Of all the customers that book with our hotel, which countries have the highest average daily rate (ADR)?”
To answer this question, let us see how GeoPandas can be used to visualise countries with the highest average daily rates.
The dataset used for this analysis is the hotel booking dataset from ScienceDirect by Antonio, Almeida and Nunes (2019), which is licensed under the Creative Commons Attribution 4.0 International license.
First, let us prepare the data so that it can be used efficiently with GeoPandas.
The hotel booking dataset concerns a hotel chain in Portugal. It contains booking data for customers from different countries, with an ADR value for each customer and a flag indicating whether or not they cancelled their booking (1 = cancelled, 0 = not cancelled).
We wish to accomplish the following:
- Append the latitude and longitude of the capital city in each country to the country listed for each customer in the dataset (we will look only at the European market in this regard)
- Obtain the average ADR value by country
- Create a visualisation of average ADR by country in GeoPandas
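As a rough illustration of the first two steps, here is a minimal sketch in pandas. The data is invented toy data, the `capitals` lookup table is hypothetical, and the `Country`, `Latitude` and `Longitude` column names are assumptions (only `IsCanceled` and ADR are named in the text). For brevity it averages ADR first and attaches the coordinates afterwards, which gives the same per-country result as appending coordinates to every booking before averaging.

```python
import pandas as pd

# Toy bookings invented for illustration; not taken from the real dataset.
bookings = pd.DataFrame({
    "Country": ["PRT", "PRT", "ESP", "ESP", "FRA", "FRA"],
    "ADR": [80.0, 100.0, 120.0, 60.0, 150.0, 90.0],
    "IsCanceled": [0, 0, 0, 1, 0, 0],
})

# Hypothetical lookup table of capital-city coordinates (approximate values).
capitals = pd.DataFrame({
    "Country": ["PRT", "ESP", "FRA"],
    "Latitude": [38.72, 40.42, 48.86],
    "Longitude": [-9.14, -3.70, 2.35],
})

# Keep only bookings that were not cancelled.
kept = bookings[bookings["IsCanceled"] == 0]

# Average ADR per country.
avg_adr = kept.groupby("Country", as_index=False)["ADR"].mean()

# Attach capital coordinates; an inner join drops any country
# that is missing from the lookup table.
result = avg_adr.merge(capitals, on="Country", how="inner")
print(result)
```

A table shaped like `result` could then be turned into a GeoDataFrame for plotting, for example with `gpd.GeoDataFrame(result, geometry=gpd.points_from_xy(result["Longitude"], result["Latitude"]), crs="EPSG:4326")`.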
Let us begin by importing the relevant libraries and then filtering the dataset to include only entries where the IsCanceled variable is 0, i.e. only entries where the customer did not cancel their hotel booking.
import contextily as ctx
import geopandas as gpd