Analysing hotel customer groups with TensorFlow Probability

Source: Photo by PhotoMIX-Company from Pixabay

Customer segmentation is a key consideration for any business.

While it may be tempting to look at sales data in isolation, doing so overlooks the fact that different customer segments have different spending patterns, so sales can vary widely across groups.

In this regard, using a standard linear regression to quantify the impact of different features on sales can be misleading, because each feature can contain distinct groups that affect sales in different ways. Therefore, a mechanism is needed to model structured linear relationships in an appropriate way.
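To make the point concrete, here is a minimal sketch using plain NumPy rather than TensorFlow Probability. The two segments, the feature and all of the numbers are hypothetical and chosen purely for illustration; the idea is simply that a pooled fit can report a slope that describes neither group.

```python
import numpy as np

# Hypothetical data: two customer segments whose sales respond
# differently to the same feature (e.g. length of stay).
x = np.arange(1.0, 11.0)
sales_a = 50.0 + 20.0 * x   # segment A: sales rise with the feature
sales_b = 300.0 - 5.0 * x   # segment B: sales fall slightly

# A pooled fit ignores the segments entirely.
x_all = np.concatenate([x, x])
y_all = np.concatenate([sales_a, sales_b])
pooled_slope, pooled_intercept = np.polyfit(x_all, y_all, 1)

# Separate fits recover each segment's own relationship.
slope_a, _ = np.polyfit(x, sales_a, 1)
slope_b, _ = np.polyfit(x, sales_b, 1)

print(f"pooled slope: {pooled_slope:.2f}")
print(f"segment A slope: {slope_a:.2f}, segment B slope: {slope_b:.2f}")
```

Because both segments share the same feature values here, the pooled slope lands halfway between the two segment slopes, which is a poor description of either group.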

TensorFlow Probability will be…

Are interpretable machine learning models always practical?

Source: Photo by geralt from Pixabay

There is increasing emphasis on interpretable machine learning in the world of data.

Models have been growing ever more complex with the use of neural networks becoming more mainstream, along with the sheer size of data being analysed today.

In many cases, such complex models may not be fit for human interpretation in their own right. Therefore, there has been a push to make models interpretable, whereby both the results and the process of achieving those results are understood by humans.

Data Itself Is Not Always Interpretable

In my view, a shortcoming of interpretable machine learning is that it assumes to a degree that the…

How has consumer interest in Dell XPS laptops fared throughout COVID-19?

Source: Photo by Clker-Free-Vector-Images from Pixabay

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice. The findings and interpretations in this article are solely those of the author and are not endorsed by or affiliated with any third-party mentioned in this article. This article is not intended to promote any particular company or product.


In February 2020, Forrester predicted that, as a result of COVID-19, demand for hardware including computer and communications equipment would be quite weak due to…

How to generate random numbers using rejection sampling

Source: Photo by geralt from Pixabay

When generating random numbers from a particular distribution, this process can be automated to a large extent.

For instance, if one wants to generate 100 random numbers that belong to a normal distribution in R, it is as simple as executing a call such as rnorm(100).


However, how does this process actually work “under the hood”? How can an algorithm know whether a random number belongs to a particular distribution or not?

The answer is through rejection sampling.

Rejection sampling is a means of generating random numbers that belong to a particular distribution.

How Rejection Sampling Works

A Cartesian graph consists of x and y-axes across a defined…

Hands-on Tutorials

Using PyMC3 to model posterior distributions

Source: Photo by Alexas_Fotos from Pixabay

The primary purpose of Bayesian analysis is to model data given uncertainty.

Since one cannot access all the data about a population to determine its precise distribution, assumptions about that distribution are often made.

For instance, I might make an assumption regarding the mean height of a population in a particular country. This is a prior distribution, or a distribution that is founded on prior beliefs before looking at data that could prove or disprove that belief.

Upon analysing a new set of data (summarised by a likelihood function), prior beliefs and the likelihood function can then be combined to form the…
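As a rough illustration of how a prior and a likelihood combine, the sketch below uses a conjugate normal model in plain NumPy rather than PyMC3. The prior, the observed heights and the assumed known standard deviation are all hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical prior belief about mean height (cm): Normal(170, 10^2).
prior_mean, prior_sd = 170.0, 10.0

# Hypothetical observed heights, with an assumed known observation sd.
heights = np.array([176.0, 181.0, 174.0, 179.0, 183.0])
obs_sd = 8.0

# Conjugate update: precisions (1 / variance) add, and the updated mean
# is the precision-weighted average of the prior mean and the data mean.
prior_precision = 1.0 / prior_sd**2
data_precision = len(heights) / obs_sd**2
post_precision = prior_precision + data_precision
post_mean = (prior_precision * prior_mean
             + data_precision * heights.mean()) / post_precision
post_sd = post_precision ** -0.5

print(f"updated mean: {post_mean:.2f}, updated sd: {post_sd:.2f}")
```

The updated mean sits between the prior mean and the sample mean, and the updated standard deviation is smaller than the prior's, reflecting the extra information in the data.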

Examples using the Normal and Cauchy Distributions

Source: Photo by QuinceCreative from Pixabay

Rejection sampling is a means of generating random numbers that belong to a particular distribution.

For instance, let’s say that one wishes to generate 1,000 random numbers that follow a normal distribution. If one wishes to do this in Python using numpy, it is quite a simple execution:
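A minimal sketch of such a call, assuming NumPy's Generator interface (the seed and the choice of a standard normal are illustrative, not taken from the article):

```python
import numpy as np

# Seeded generator so the draw is reproducible (the seed is arbitrary).
rng = np.random.default_rng(42)

# 1,000 draws from a normal distribution with mean 0 and sd 1.
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

print(samples[:5])
```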


However, how exactly does this process work? Upon generating random numbers in Python, how can an algorithm know whether a random number belongs to a particular distribution or not? This is where rejection sampling comes in.
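The idea can be sketched in a few lines. This is one simple variant, assuming a uniform proposal over the interval [-5, 5] and a standard normal target; the function names, the interval and the seed are hypothetical choices, and restricting the range means the far tails are ignored.

```python
import numpy as np

def target_pdf(x):
    """Standard normal density, the distribution we want to sample from."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def rejection_sample(n, low=-5.0, high=5.0, seed=0):
    """Draw n values by rejection sampling with a uniform proposal on [low, high]."""
    rng = np.random.default_rng(seed)
    peak = target_pdf(0.0)  # the density's maximum, used as the box height
    accepted = []
    while len(accepted) < n:
        x = rng.uniform(low, high, size=n)     # candidate x positions
        y = rng.uniform(0.0, peak, size=n)     # candidate heights
        accepted.extend(x[y < target_pdf(x)])  # keep points under the curve
    return np.array(accepted[:n])

samples = rejection_sample(1000)
print(samples.mean(), samples.std())
```

Points are thrown uniformly at a rectangle that encloses the curve, and only those that land beneath the density are kept, so the accepted x values follow the target distribution.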

Rejection Sampling

There is a reason I provided an image of a dartboard at the beginning…

Using poweRlaw to test the power law hypothesis

Source: RStudio

A power law distribution (such as a Pareto distribution) describes the 80/20 rule that governs many phenomena around us.

For instance:

  • 80% of a company’s sales often come from 20% of its customers
  • 80% of a computer’s storage space is often taken up by 20% of the files
  • 80% of the wealth in a country is owned by 20% of the people

These are just a few examples. While many believe that most datasets tend to follow a normal distribution, power law distributions tend to be a lot more common than we realise. …
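The link between the Pareto distribution and the 80/20 rule can be checked with a short calculation. The sketch below is in Python rather than R's poweRlaw, and relies on the standard Lorenz curve result for a Pareto distribution with shape parameter alpha greater than 1; the function name is a hypothetical one introduced for illustration.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a Pareto
    distribution with shape alpha > 1 (derived from its Lorenz curve)."""
    return p ** (1.0 - 1.0 / alpha)

# Shape parameter at which the top 20% hold 80% of the total.
alpha_80_20 = math.log(5) / math.log(4)  # roughly 1.16

share = top_share(0.2, alpha_80_20)
print(f"alpha = {alpha_80_20:.3f}, share held by top 20% = {share:.3f}")
```

Larger values of alpha give a more even split, so the 80/20 pattern corresponds to one particular, fairly heavy-tailed member of the Pareto family.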

We live in a world of uncertainty and imperfect information

Source: Mines (Ubuntu 18.04 LTS)

I often like to play chess and minesweeper in my spare time (yes, don’t laugh).

Of these two games, I have always found minesweeper more difficult to understand, and the rules of play have always seemed very opaque.

However, the latter game much more closely resembles how situations often unfold in the real world. Here is why that is relevant to data science.

Perfect vs. Imperfect Information

Compare that to chess, where, regardless of playing ability, all players have perfect information at all times.

One can always see every piece on the board, and neither opponent possesses any informational advantage…

Using SQL to manipulate time series data

Source: Photo by Tumisu from Pixabay

Tools such as Python or R are most often used to conduct deep time series analysis.

However, knowledge of how to work with time series data using SQL is essential, particularly when working with very large datasets or data that is constantly being updated.

Here are some useful commands that can be invoked in SQL to better work with time series data within the data table itself.


In this example, we are going to work with weather data collected across a range of different times and locations.

The data types in the table of the PostgreSQL database are as below:

Understanding the components of a time series

Source: Photo by geralt from Pixabay

As someone who originally comes from an economics background, I realise it might seem quite strange that I would spend some time building models that can predict weather patterns.

I often questioned it myself, but there is a reason for it. Temperature patterns are one of the easiest time series to forecast.

Time Series Components

When a time series is decomposed (broken into its individual elements), it consists of the following components:

  • Trend: The general direction of the time series over a significant period of time
  • Seasonality: Patterns that repeat themselves at regular intervals in a time series
  • Random: Random fluctuations in…
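The three components above can be illustrated with a small additive decomposition. This is a minimal NumPy sketch on a synthetic monthly series, assuming a 12-period cycle; the series, its parameters and the seed are all hypothetical, and the moving-average approach is one classical method rather than the only option.

```python
import numpy as np

period = 12
n = 10 * period
t = np.arange(n)
rng = np.random.default_rng(0)

# Synthetic monthly series built from the three components.
true_trend = 10.0 + 0.05 * t
true_seasonal = 5.0 * np.sin(2.0 * np.pi * t / period)
series = true_trend + true_seasonal + rng.normal(0.0, 0.5, size=n)

# Trend: centred 2x12 moving average (half weights on the two end points).
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend = np.full(n, np.nan)  # undefined for the first and last six points
trend[half:n - half] = np.convolve(series, weights, mode="valid")

# Seasonality: average detrended value for each position in the cycle,
# centred so that the seasonal effects sum to zero.
detrended = series - trend
seasonal_means = np.array(
    [np.nanmean(detrended[i::period]) for i in range(period)]
)
seasonal_means -= seasonal_means.mean()
seasonal = np.tile(seasonal_means, n // period)

# Random: whatever the trend and seasonal components leave unexplained.
random_part = series - trend - seasonal

print(np.round(seasonal_means, 2))
```

Adding the three estimated components back together reproduces the original series wherever the moving-average trend is defined.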

Michael Grogan

Data Science Consultant with expertise in economics and time series analysis
