# Modelling Bimodal Distributions with multimode in R

## Analysing distributions with more than one mode

We usually think of a distribution as having a single mode. The normal distribution, a standard reference in statistics, is one example.

# Window Functions in SQL: Aggregating Values

## Calculate running totals with window functions

When working with a table in SQL, we often want to aggregate values or calculate a running total across its rows.

In this article, we will investigate how this can be done using a window function.
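As a minimal sketch of the idea, the snippet below computes a running total with `SUM(...) OVER (ORDER BY ...)`. It runs the SQL through Python's built-in `sqlite3` module so that it is self-contained; the `sales` table, its columns, and the values are hypothetical and chosen only for illustration. It assumes an SQLite build with window function support (version 3.25 or later).

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 5)],
)

# SUM(...) OVER (ORDER BY day) accumulates amount row by row,
# without collapsing the rows the way GROUP BY would.
rows = conn.execute(
    """
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total
    FROM sales
    ORDER BY day
    """
).fetchall()
conn.close()

for day, amount, running_total in rows:
    print(day, amount, running_total)
```

Each row keeps its own `amount` and gains a cumulative `running_total`, which is the main difference from an ordinary aggregate.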

# Grouping Dates in Pandas and SQL

## How to group data using Python and SQL

When working with a dataset, the data is often not in the format needed for the analysis.

For instance, what if we wish to conduct a time series forecast, but there are many data points for the same time period?

In this example…

# Principal Component Analysis vs. ExtraTreesClassifier

## Is one feature selection method better than the other?

The purpose of Principal Component Analysis (PCA) is to identify the directions (principal components) that capture the largest amount of variance in a training set.

This is used as a feature selection method to identify the most important attributes that influence the outcome variable — thus allowing for the discarding of variables…

# UNION in SQL: A Must-Know Clause

## Analysing results from multiple tables

Many of the common queries that we learn in SQL, such as GROUP BY, are typically used to analyse one table in isolation. It is also common to use a JOIN clause to combine two tables and treat them as one.

However, there will often be instances where one…

# Huber and Ridge Regressions in Python: Dealing with Outliers

## How to handle outliers in a dataset

Traditional linear regression has shortcomings when it comes to handling outliers in a dataset.

Specifically, if a data point lies very far away from other points in the set, this can significantly influence the least squares regression line, i.e. …

# Views in SQL: Underutilised, Yet Very Useful

## How views can aid analysis across SQL databases

A view is a virtual table in SQL that behaves much like a standard table, but its rows are not physically stored: the database saves only the defining query and runs it whenever the view is referenced, so the view itself takes up almost no storage space.
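The following sketch illustrates that a view stores a query rather than data. It uses Python's built-in `sqlite3` module to stay self-contained; the `orders` table, the `customer_totals` view, and all values are hypothetical names invented for this example.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("b", 20), ("a", 5)],
)

# The view saves only this SELECT statement, not its result rows.
conn.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    """
)

query = "SELECT customer, total FROM customer_totals ORDER BY customer"
before = conn.execute(query).fetchall()

# Changing the base table changes what the view returns,
# because the view is re-evaluated on every query.
conn.execute("INSERT INTO orders VALUES ('b', 7)")
after = conn.execute(query).fetchall()
conn.close()

print(before)
print(after)
```

The view is queried exactly like a table, yet the second result reflects the newly inserted row without the view being rebuilt.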

For instance, there will often be times when one would like to…

# Bayesian Linear Regression: Analysis of Car Sales with arm in R

## Using Bayesian Linear Regression to account for uncertainty

Linear regression is among the most frequently used, and most useful, modelling tools.

While no form of regression analysis can ever fully capture reality, it can do quite a good job at both making predictions for the dependent variable and determining the extent to which each independent variable impacts…

# Analysis of Car Sales with ANOVA in R

## Determining sales differences across groups

The primary purpose of using an ANOVA (Analysis of Variance) model is to determine whether differences in means exist across groups.

While a t-test can establish whether a difference exists between two means, a more extensive test is necessary when there are several groups.
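To make the comparison across several groups concrete, here is a minimal sketch of the one-way ANOVA F statistic, written in plain Python rather than the R used in the article. The three groups are made-up numbers, not car sales data, and the sketch stops at the F statistic: it does not compute a p-value.

```python
from statistics import fmean

# Made-up observations for three hypothetical groups.
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

all_values = [x for g in groups for x in g]
grand_mean = fmean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of between-group to within-group mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)
```

A large F value indicates that the group means differ by more than the within-group variation would suggest; judging significance would additionally require the F distribution with `df_between` and `df_within` degrees of freedom.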

In this example, we will…

# XGBoost For Time Series Forecasting: Don’t Use It Blindly

## Forecasting techniques don’t work well with all time series 