How has consumer interest for Dell XPS laptops fared throughout COVID-19?

Source: Photo by Clker-Free-Vector-Images from Pixabay

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice. The findings and interpretations in this article are solely those of the author and are not endorsed by or affiliated with any third party mentioned in this article. This article is not intended to promote any particular company or product.

Background

In February of 2020, Forrester predicted that, as a result of COVID-19, demand for hardware, including computer and communications equipment, would be quite weak due to…


How to generate random numbers using rejection sampling

Source: Photo by geralt from Pixabay

Generating random numbers from a particular distribution is a process that can be automated to a large extent.

For instance, if one wants to generate 100 random numbers that belong to a normal distribution in R, it is as simple as executing:

rnorm(100)

However, how does this process actually work “under the hood”? How can an algorithm know whether a random number belongs to a particular distribution or not?

The answer is through rejection sampling.

Rejection sampling is a means of generating random numbers that belong to a particular distribution.
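
As a rough illustration, here is a minimal sketch of the technique in Python (the function names and the [-4, 4] sampling window are my own illustrative choices, not taken from the article): a candidate is drawn from a simple uniform proposal, a uniform “height” is drawn under an envelope covering the target density, and the candidate is kept only if that height falls under the target curve.

import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x):
    # Density of the standard normal distribution
    return np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

def rejection_sample(n):
    # Draw n approximately standard-normal values via rejection sampling
    samples = []
    peak = normal_pdf(0.0)  # maximum of the density, used as the envelope height
    while len(samples) < n:
        x = rng.uniform(-4, 4)    # candidate from a uniform proposal
        u = rng.uniform(0, peak)  # uniform height under the envelope
        if u <= normal_pdf(x):    # accept only if the point lies under the curve
            samples.append(x)
    return np.array(samples)

print(rejection_sample(100))

Candidates whose height lands above the curve are thrown away, which is what gives the method its name.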

How Rejection Sampling Works

A Cartesian graph consists of x and y-axes across a defined…


Hands-on Tutorials

Using PyMC3 to model posterior distributions

Source: Photo by Alexas_Fotos from Pixabay

The primary purpose of Bayesian analysis is to model data given uncertainty.

Since one cannot access all the data about a population to determine its precise distribution, assumptions about that distribution must often be made.

For instance, I might make an assumption regarding the mean height of a population in a particular country. This is a prior distribution, or a distribution founded on prior beliefs held before looking at data that could prove or disprove those beliefs.

Upon analysing a new set of data (which yields a likelihood function), the prior beliefs and the likelihood function can then be combined to form the…
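
As a minimal sketch of how this looks in PyMC3 (the height data and the prior parameters below are hypothetical, chosen purely for illustration), one can place a prior on the mean, attach a likelihood to the observed data, and sample from the resulting posterior:

import numpy as np
import pymc3 as pm

# Hypothetical sample of observed heights in cm; the values are invented
heights = np.random.normal(170, 8, size=50)

with pm.Model() as model:
    # Prior beliefs about the mean height and its spread
    mu = pm.Normal("mu", mu=165, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=10)
    # Likelihood of the observed data given the parameters
    obs = pm.Normal("obs", mu=mu, sigma=sigma, observed=heights)
    # Draw samples from the posterior distribution
    trace = pm.sample(1000, tune=1000)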


Examples using the Normal and Cauchy Distributions

Source: Photo by QuinceCreative from Pixabay

Rejection sampling is a means of generating random numbers that belong to a particular distribution.

For instance, let’s say that one wishes to generate 1,000 random numbers that follow a normal distribution. Doing this in Python using numpy takes a single line:

np.random.randn(1000)

However, how exactly does this process work? Upon generating random numbers in Python, how can an algorithm know whether a random number belongs to a particular distribution or not? This is where rejection sampling comes in.
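
Since this article pairs the normal and Cauchy distributions, here is a sketch of one such pairing (the envelope constant M and the function names are my own working, not taken from the article): candidates are drawn from a standard Cauchy proposal and accepted with probability given by the ratio of the normal density to the scaled Cauchy density.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Envelope constant M chosen so that M * cauchy_pdf(x) >= normal_pdf(x) for
# all x; the density ratio peaks at x = +/-1, giving M = sqrt(2*pi) * exp(-0.5)
M = np.sqrt(2 * np.pi) * np.exp(-0.5)

def sample_normal_via_cauchy(n):
    # Rejection-sample n standard-normal values using a Cauchy proposal
    out = []
    while len(out) < n:
        x = rng.standard_cauchy()  # candidate drawn from the proposal
        if rng.uniform() < stats.norm.pdf(x) / (M * stats.cauchy.pdf(x)):
            out.append(x)
    return np.array(out)

draws = sample_normal_via_cauchy(1000)
print(draws.mean(), draws.std())  # should be close to 0 and 1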

Rejection Sampling

There is a reason I provided an image of a darts board at the beginning…


Using poweRlaw to test the power law hypothesis

Source: RStudio

A power law distribution (such as a Pareto distribution) describes the 80/20 rule that governs many phenomena around us.

For instance:

  • 80% of a company’s sales often come from 20% of its customers
  • 80% of a computer’s storage space is often taken up by 20% of the files
  • 80% of the wealth in a country is owned by 20% of the people

These are just a few examples. While many believe that most datasets tend to follow a normal distribution, power law distributions are a lot more common than we realise. …
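
As a quick numerical illustration of the wealth example above (the tail index of roughly 1.16 is the value classically associated with an 80/20 split; the sample size is arbitrary), one can simulate Pareto-distributed wealth and check what share the top 20% hold:

import numpy as np

rng = np.random.default_rng(42)

# A Pareto tail index of about 1.16 corresponds to the classic 80/20 split
alpha = 1.16
wealth = rng.pareto(alpha, size=100_000) + 1  # classical Pareto with minimum 1

richest_first = np.sort(wealth)[::-1]
top_20_share = richest_first[:20_000].sum() / wealth.sum()
print(f"Share held by the top 20%: {top_20_share:.0%}")  # typically near 80%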


We live in a world of uncertainty and imperfect information

Source: Mines (Ubuntu 18.04 LTS)

I often like to play chess and minesweeper in my spare time (yes, don’t laugh).

Of these two games, I have always found minesweeper more difficult to understand; its rules of play have always seemed very opaque.

However, minesweeper much more closely resembles how situations often unfold in the real world. Here is why that is relevant to data science.

Perfect vs. Imperfect Information

Compare minesweeper to chess, where regardless of one’s playing ability, all players have perfect information at all times.

One can always see every piece on the board, and neither opponent possesses any informational advantage…


Using SQL to manipulate time series data

Source: Photo by Tumisu from Pixabay

Tools such as Python and R are most often used to conduct in-depth time series analysis.

However, knowledge of how to work with time series data using SQL is essential, particularly when working with very large datasets or data that is constantly being updated.

Here are some useful commands that can be invoked in SQL to better work with time series data within the data table itself.
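
The article itself works against a PostgreSQL database; as a self-contained sketch, the snippet below uses Python’s built-in sqlite3 instead, with a hypothetical weather table and invented rows, to show the kind of date-based aggregation involved:

import sqlite3

# Self-contained sketch using sqlite3; the article uses PostgreSQL, and the
# table and column names here are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (recorded_at TEXT, location TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("2020-01-01 09:00", "dublin", 6.5),
        ("2020-01-01 15:00", "dublin", 9.0),
        ("2020-01-02 09:00", "dublin", 4.0),
    ],
)

# Aggregate the raw readings to a daily mean per location, ordered by day
query = """
    SELECT date(recorded_at) AS day, location, AVG(temp) AS mean_temp
    FROM weather
    GROUP BY day, location
    ORDER BY day
"""
for row in conn.execute(query):
    print(row)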

Background

In this example, we are going to work with weather data collected across a range of different times and locations.

The data types in the table of the PostgreSQL database are as below:


Understanding the components of a time series

Source: Photo by geralt from Pixabay

It might seem quite strange that someone who originally comes from an economics background would spend time building models that can predict weather patterns.

I have often questioned it myself, but there is a reason for it: temperature patterns are one of the easiest time series to forecast.

Time Series Components

When a time series is decomposed, or broken into its individual elements, it consists of the following components:

  • Trend: The general direction of the time series over a significant period of time
  • Seasonality: Patterns that frequently repeat themselves in a time series
  • Random: Random fluctuations in…
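
As a sketch of what this looks like in practice (the monthly temperature values below are invented for illustration), statsmodels’ seasonal_decompose splits a series into exactly these components:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly temperature series; the values are invented
idx = pd.date_range("2015-01-31", periods=48, freq="M")
temps = pd.Series([5, 6, 9, 12, 15, 18, 20, 19, 16, 12, 8, 5] * 4,
                  index=idx, dtype=float)

# Break the series into trend, seasonal and residual (random) components
result = seasonal_decompose(temps, model="additive")
print(result.trend.dropna().head())  # the general direction
print(result.seasonal.head(12))      # the repeating yearly pattern
print(result.resid.dropna().head())  # the random fluctuations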


Using ARIMA to predict average daily rates

Source: Photo by nattanan23 from Pixabay

The average daily rate (henceforth referred to as ADR) represents the average rate per day paid by a customer staying at a hotel.

This is an important metric for a hotel, as it represents the overall profitability of each customer.

In this example, average daily rates across customers are averaged on a weekly basis and then forecasted using an ARIMA model.

The below analysis is based on data from Antonio, Almeida and Nunes (2019): Hotel booking demand datasets.

Data Manipulation

In this particular dataset, the year and week number for each customer (along with each customer’s recorded ADR value) are provided separately.
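
As a rough sketch of that manipulation (the column names mirror the hotel bookings dataset’s year, week number and ADR fields, but the values below are invented), one can group by year and week, average ADR, and fit an ARIMA model to the resulting weekly series:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Invented rows mirroring the separate year / week number / ADR columns
bookings = pd.DataFrame({
    "arrival_date_year": [2015] * 8 + [2016] * 4,
    "arrival_date_week_number": [27, 27, 28, 28, 29, 30, 31, 32, 1, 2, 3, 4],
    "adr": [75.0, 98.0, 107.0, 93.0, 103.0, 88.0,
            91.0, 84.0, 65.0, 72.0, 70.0, 68.0],
})

# Average ADR per (year, week number), giving one observation per week
weekly_adr = (bookings
              .groupby(["arrival_date_year", "arrival_date_week_number"])["adr"]
              .mean())

# Fit a simple ARIMA model; the (1, 0, 1) order is purely illustrative
model = ARIMA(weekly_adr.values, order=(1, 0, 1)).fit()
print(model.forecast(steps=4))  # forecast the next four weeks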


Time series analysis of kilowatt consumption patterns

Source: Photo by 3938030 from Pixabay

In this example, XGBRegressor is used to predict kilowatt consumption patterns for the Dublin City Council Civic Offices, Ireland. The dataset in question is available from data.gov.ie.

What is XGBRegressor?

Have you used XGBoost (Extreme Gradient Boosting) for classification tasks before? If so, you will be familiar with the workings of this model.

Essentially, a gradient boosting model works by adding predictors to an ensemble in a sequential fashion, with each new predictor fit to the residual errors made by its predecessor. …
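
As a minimal sketch of the model in use (the hour-of-day feature and consumption values below are synthetic, not the Civic Offices data):

import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)

# Synthetic data: hour of day -> kilowatt consumption; values are invented
X = rng.uniform(0, 24, size=(500, 1))
y = 50 + 20 * np.sin(X[:, 0] / 24 * 2 * np.pi) + rng.normal(0, 2, size=500)

# Each new tree in the ensemble is fit to the residuals of the trees before it
model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X[:400], y[:400])        # train on the first 400 observations
print(model.predict(X[400:])[:5])  # predict the held-out observations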

Michael Grogan

Data Science Consultant — Expertise in time series analysis, statistics, Bayesian modeling, and machine learning with TensorFlow | michael-grogan.com
