Probability Distributions: A Gentle Introduction

Let’s start by defining the two types of numerical variables: discrete and continuous.

Discrete variables: These take countable values. For example, we can count the number of boxes in a carton. A carton can contain 20 boxes, and if you take one out, it contains 19 boxes; it can never contain 19.5.
Continuous variables: These represent measurements. For example, the weight of the carton could be 40 grams, 40.01 grams or 40.00001 grams, depending on how precisely you measure.

Let’s take the most common real-world example: a fair coin.

Event: A possible outcome of doing something. In tossing a coin, we can get either a head or a tail.
Experiment: An act that produces an event. Here it is the action of tossing the coin.
Sample Space: The set of all possible outcomes of the experiment. Here the sample space is {Head, Tail}, so it contains 2 outcomes.
Probability: The chance of an event occurring. If we toss a fair coin, the chance of getting a head is 1/2 = 0.5 = 50%.
Mutually Exclusive Events: Two events are mutually exclusive if they cannot both occur at the same time. In our example, each toss produces either a head or a tail, never both.
Independent Events: The outcome of one event does not affect the outcome of the next. If we toss the same coin twice, the first outcome being a head or a tail does not affect the second outcome. (If you draw cards from a deck without replacement, the second draw has only 51 cards to choose from, so the two draws are not independent events.)
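A small sketch of these definitions in Python, assuming a fair coin and a standard 52-card deck with 13 hearts (the hearts are an illustrative choice, not part of the text above). It enumerates the sample space, checks that two tosses are independent, and shows how drawing without replacement changes the probability of the second draw:

```python
from fractions import Fraction
from itertools import product

# Sample space of one toss of a fair coin: the set of all possible outcomes.
sample_space = ["H", "T"]
p_head = Fraction(sample_space.count("H"), len(sample_space))

# Two tosses: all 4 ordered pairs (HH, HT, TH, TT) are equally likely.
two_tosses = list(product(sample_space, repeat=2))
p_two_heads = Fraction(two_tosses.count(("H", "H")), len(two_tosses))

# Independent events: P(head and head) equals P(head) * P(head).
print(p_head, p_two_heads, p_two_heads == p_head * p_head)  # 1/2 1/4 True

# Cards drawn WITHOUT replacement are dependent: after one heart is drawn,
# only 12 hearts remain among 51 cards.
p_heart_first = Fraction(13, 52)
p_heart_second_given_heart = Fraction(12, 51)
print(p_heart_first, p_heart_second_given_heart)  # 1/4 4/17
```

Because 4/17 differs from 1/4, the first draw changes the odds for the second, which is exactly what "not independent" means.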

A probability distribution is a function that gives the probability of each possible outcome of a random variable (or, for a continuous variable, the probability of it falling within a range of values). Now we can look at some types of distributions.

There are hundreds of probability distributions; however, an experienced statistician has probably worked with only about 12 of them. So today we will focus on 3 of the most commonly used distributions:

Binomial Distribution
Normal Distribution
Poisson Distribution

Note: Distributions can take any shape, but the probabilities of all possible outcomes of a distribution always add up to 1.

While the normal distribution deals with continuous values, the binomial and Poisson distributions deal with discrete values.

Quick Question: Is the outcome of tossing a coin continuous or discrete?

The binomial distribution is one of the simplest distributions used to describe discrete data. As the name suggests, each trial in a binomial experiment has only 2 possible outcomes. From our earlier example, a coin toss can result in either a head or a tail. Similarly, pass/fail and yes/no outcomes follow this distribution.

Let’s start with a random variable X equal to the number of heads we get when flipping a fair coin 3 times. X can be as low as 0, or all 3 flips can turn out to be heads.

Listing all the possible outcomes (there are few enough to count): TTH, THH, HHT, HTT, THT, HTH, HHH, TTT. The sample space contains 8 outcomes.

Suppose we want the probability of flipping heads in all 3 trials. From the list above, only 1 of the 8 possible outcomes (HHH) qualifies:

P(X=3) = 1/8

X can take the values 0, 1, 2 or 3 (no heads, 1 head, 2 heads or all 3 heads). Counting the outcomes in the same way gives the remaining probabilities:

P(X=0) = 1/8
P(X=1) = 3/8
P(X=2) = 3/8
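Counting outcomes by hand gets tedious quickly, so here is a minimal sketch that enumerates the 8 outcomes with the standard library and tallies the number of heads in each. The names `outcomes` and `distribution` are illustrative:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of flipping a fair coin 3 times.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# X = number of heads in each outcome; count how often each value occurs.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = k) = (number of outcomes with k heads) / (size of the sample space).
distribution = {k: Fraction(head_counts[k], len(outcomes)) for k in range(4)}

for k, p in distribution.items():
    print(f"P(X={k}) = {p}")
```

It prints P(X=0) = 1/8, P(X=1) = 3/8, P(X=2) = 3/8 and P(X=3) = 1/8, matching the counts above. Using `Fraction` keeps the probabilities exact instead of rounding them to floats.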

If you plot these probabilities, you get a graph like the one below.

Now let’s use Python to repeat this experiment 1,000 times and check whether we see the same distribution.
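The notebook's original code is not reproduced here, so this is a sketch of one way to run the simulation with Python's standard `random` module. The helper `simulate_heads` and the seed of 42 are arbitrary choices:

```python
import random
from collections import Counter


def simulate_heads(n_experiments: int, n_flips: int = 3, seed: int = 42) -> dict[int, float]:
    """Repeat 'flip a fair coin n_flips times' and return the relative frequency of each head count."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(n_flips))  # heads in one experiment
        for _ in range(n_experiments)
    )
    return {k: counts[k] / n_experiments for k in range(n_flips + 1)}


frequencies = simulate_heads(1_000)
for k, freq in frequencies.items():
    print(f"X={k}: {freq:.3f}")
```

With 1,000 experiments the frequencies should land near 0.125, 0.375, 0.375 and 0.125; calling `simulate_heads(100_000)` typically brings them closer still.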

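You can see that the simulated frequencies match the theoretical distribution, and they get closer to it as the number of experiments increases. As problems get more complex, listing every outcome becomes impractical, so you may want to use the binomial formula instead: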
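This is the standard binomial probability formula, where n is the number of trials, k the number of successes (heads, here) and p the probability of success on a single trial. The second line checks it against the 3-flip example:

```latex
P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}, \qquad \binom{n}{k} = \frac{n!}{k!\,(n - k)!}

% Check against the 3-flip example: n = 3, p = 1/2, k = 2
P(X = 2) = \binom{3}{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{1} = 3 \cdot \tfrac{1}{8} = \tfrac{3}{8}
```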

Conditions for applying the binomial distribution (BD):

Each trial has only 2 discrete outcomes
The probability of each outcome stays the same from trial to trial
The trials are independent events. That is, the outcome of one trial doesn’t affect the other.
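When these conditions hold, the binomial probabilities can be computed directly from the formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch using `math.comb` (Python 3.8+); `binomial_pmf` is a hypothetical helper name, not a library function:

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0  # impossible number of successes
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# The 3-flip fair-coin example: n = 3 trials, p = 0.5 chance of heads per trial.
probs = [binomial_pmf(k, n=3, p=0.5) for k in range(4)]
print(probs)  # [0.125, 0.375, 0.375, 0.125]

# The probabilities over all possible outcomes add up to 1.
print(sum(probs))  # 1.0
```

For n = 3 and p = 0.5 it reproduces 1/8, 3/8, 3/8 and 1/8. Changing `n` and `p` lets you explore cases where listing every outcome by hand would be impractical.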

The Poisson distribution describes the number of occurrences of an event over a specified interval, such as time or distance. This kind of distribution is important in field operations and supply chain management, especially in queuing theory.

At an airport, an average of 10 flights land in each 30-minute period. For simplicity’s sake, let’s say the average stays the same throughout the day. What is the probability that exactly 8 flights land in a given 30-minute period? What is the probability that more than 10 flights land? This is a Poisson problem because we are counting events within a fixed interval (30 minutes).

Here the mean is written as λ (lambda), the average rate of occurrence:

λ = number of occurrences / specified interval

In our case, λ is 10 flight landings every 30 minutes. The interval stays the same throughout the experiment.

For the first question: what is the probability that exactly 8 flights land in the same 30-minute period?

λ = 10, x = 8

Now we can substitute the numbers in the formula:
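For reference, the standard Poisson probability mass function, where λ is the average number of occurrences per interval and x is the number of occurrences we are asking about, followed by the substitution for this question:

```latex
P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}

% Substituting \lambda = 10 and x = 8
P(X = 8) = \frac{10^{8} e^{-10}}{8!}
```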

P(X=8) ≈ 0.113

We can calculate the probability for all other outcomes via Python and plot the distribution in the notebook:
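The notebook itself is not included here, so the following is a standard-library sketch of that calculation. `poisson_pmf` is a hypothetical helper, and plotting is left out (the printed values could be fed to a plotting library such as matplotlib). It also answers the second question, P(X > 10), as 1 - P(X ≤ 10):

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson variable with mean lam: lam**x * e**(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)


lam = 10  # average of 10 landings per 30-minute period

# Question 1: exactly 8 landings in one 30-minute period.
p_exactly_8 = poisson_pmf(8, lam)

# Question 2: more than 10 landings, computed as 1 - P(X <= 10).
p_more_than_10 = 1 - sum(poisson_pmf(x, lam) for x in range(11))

print(f"P(X=8)  = {p_exactly_8:.4f}")
print(f"P(X>10) = {p_more_than_10:.4f}")

# The distribution for 0..20 landings, e.g. as input to a bar chart.
for x in range(21):
    print(f"{x:>2}: {poisson_pmf(x, lam):.4f}")
```

It should print P(X=8) ≈ 0.1126 and P(X>10) ≈ 0.4170, so more than 10 landings happen in roughly 42% of 30-minute periods.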

Conditions for applying the Poisson distribution (PD):

Each trial counts discrete events over a fixed interval (of time, distance, etc.), and the average rate λ stays the same across intervals
The trials are independent events. That is, the outcome of one trial doesn’t affect the other.

Before we jump into the normal distribution, let’s look at the central limit theorem.

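If your sample size is big enough, the sample mean will converge towards the true mean (strictly speaking, this part is the law of large numbers)
The central limit theorem adds that the mean of many independent measurements tends to be normally distributed, even when the individual measurements are not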
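A quick simulation makes the theorem concrete. This sketch assumes each "measurement" is the mean of 30 draws from a uniform distribution on [0, 1), which is flat rather than bell-shaped; the sample size of 30, the 5,000 repetitions and the seed are arbitrary choices:

```python
import random
import statistics

rng = random.Random(0)  # seeded for reproducibility

# Each measurement is the mean of 30 independent uniform(0, 1) draws.
sample_size = 30
sample_means = [
    statistics.fmean(rng.random() for _ in range(sample_size))
    for _ in range(5_000)
]

mu = statistics.fmean(sample_means)
sigma = statistics.stdev(sample_means)
print(f"mean of sample means: {mu:.3f}  (true mean 0.5)")
print(f"SD of sample means:   {sigma:.4f}  (theory: sqrt(1/12/30) = 0.0527)")

# If the means are roughly normal, about 68% of them fall within 1 SD of their mean.
within_1sd = sum(abs(m - mu) < sigma for m in sample_means) / len(sample_means)
print(f"fraction within 1 SD: {within_1sd:.3f}")
```

The mean of the sample means should come out close to 0.5, their spread close to the theoretical 0.0527, and roughly 68% of them should fall within 1 SD, as expected for a normal distribution.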

Hence the mean is a measure of central tendency, and for a normally distributed continuous variable, knowing the mean and standard deviation is enough to calculate the probability that the variable falls within any given range.

The normal distribution, also called the Gaussian distribution, is a probability distribution of a continuous variable that is symmetric around the mean. It approximately describes many naturally occurring phenomena, such as people’s heights (note that most real-world distributions are not perfectly normal).

The central limit theorem is the reason the normal distribution shows up so often. For a normal distribution, about 68% of the data lies within ±1 standard deviation of the mean, about 95% within ±2 standard deviations, and about 99.7% within ±3 standard deviations.
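These percentages follow from the normal cumulative distribution function. A short check using `statistics.NormalDist` from Python's standard library (3.8+); the mean of 50 and SD of 5 are arbitrary, since the shares are the same for every normal distribution:

```python
from statistics import NormalDist

# Any mean and standard deviation gives the same shares; these values are arbitrary.
mu, sigma = 50.0, 5.0
dist = NormalDist(mu, sigma)

shares = {}
for k in (1, 2, 3):
    # Probability that X lands within k standard deviations of the mean.
    shares[k] = dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)
    print(f"within ±{k} SD: {shares[k]:.1%}")
```

It prints roughly 68.3%, 95.4% and 99.7%.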

The standard normal distribution has a mean of 0 and a standard deviation (SD) of 1. Like every normal distribution, it has a skewness of 0 and a kurtosis of 3 (an excess kurtosis of 0).

Using µ (mean) and σ (standard deviation), we can generate normally distributed values in Python as follows:
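The notebook's code is not shown here; one standard-library way to do it is with `random.Random.gauss`. This sketch assumes µ = 0 and σ = 1, draws 10,000 values, and prints a coarse text histogram in place of a plot (bins half a standard deviation wide, one `#` per 50 samples):

```python
import math
import random
import statistics
from collections import Counter

mu, sigma = 0.0, 1.0    # mean and standard deviation (assumed values)
rng = random.Random(7)  # seeded so the run is reproducible

# Draw 10,000 values from a normal distribution with mean mu and SD sigma.
samples = [rng.gauss(mu, sigma) for _ in range(10_000)]

print(f"sample mean: {statistics.fmean(samples):.3f}")
print(f"sample SD:   {statistics.stdev(samples):.3f}")

# Coarse text histogram: bins half a standard deviation wide, one '#' per 50 samples.
bin_width = 0.5 * sigma
bins = Counter(math.floor((x - mu) / bin_width) for x in samples)
for b in range(-6, 6):
    left_edge = mu + b * bin_width
    print(f"{left_edge:+5.1f} | {'#' * (bins[b] // 50)}")
```

Changing `mu` and `sigma` shifts and stretches the bell shape; the sample mean and SD should track the values you set.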

Many practical problems involve data that is approximately normally distributed. We will cover this in detail in a separate article on statistical experiments next week. In the meantime, here is an overview of the normal distribution:

The key ingredient is sample size: as the sample size grows, the distribution of the sample mean moves closer to a normal distribution, and the most likely values converge towards the population mean. Statistical experiments use this to draw conclusions about a population from a sample.
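A sketch of this effect using the coin example from earlier: for each sample size n, it repeats n fair-coin flips 1,000 times and measures how much the share of heads (the sample mean) varies. The repetition count and seed are arbitrary:

```python
import random
import statistics

rng = random.Random(1)  # seeded so the run is reproducible


def share_of_heads(n: int) -> float:
    """Sample mean of n fair-coin flips, counting heads as 1 and tails as 0."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n


spreads = {}
for n in (10, 100, 1_000):
    means = [share_of_heads(n) for _ in range(1_000)]
    spreads[n] = statistics.stdev(means)
    theory = 0.5 / n ** 0.5  # SD of the sample mean for a fair coin
    print(f"n={n:>5}: mean={statistics.fmean(means):.3f}  SD={spreads[n]:.4f}  theory={theory:.4f}")
```

The spread should shrink roughly like 0.5 / √n, so the sample means cluster ever more tightly around the true mean of 0.5.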

For example, take height. How do we know the average height of the entire world population? We cannot measure everyone, but we can collect heights from a large enough sample of people to form a normal distribution. That distribution can then answer most questions about people’s heights.
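A sketch of that workflow with `statistics.NormalDist.from_samples`. Since no real height data is given here, the sample is simulated from an assumed mean of 170 cm and SD of 10 cm; in practice you would use measured heights instead:

```python
import random
from statistics import NormalDist

# Hypothetical stand-in for measured heights (cm): simulated from an assumed
# mean of 170 and SD of 10, since no real survey data is given here.
rng = random.Random(2024)
sample_heights = [rng.gauss(170, 10) for _ in range(500)]

# Fit a normal distribution to the sample (estimates the mean and SD).
fitted = NormalDist.from_samples(sample_heights)
print(f"estimated mean {fitted.mean:.1f} cm, SD {fitted.stdev:.1f} cm")

# Use the fitted curve to answer questions about the wider population.
p_taller_than_190 = 1 - fitted.cdf(190)
p_between_160_180 = fitted.cdf(180) - fitted.cdf(160)
print(f"P(height > 190 cm) ≈ {p_taller_than_190:.3f}")
print(f"P(160 cm < height < 180 cm) ≈ {p_between_160_180:.3f}")
```

With the assumed values, roughly 2% of people come out taller than 190 cm and about 68% between 160 cm and 180 cm, though the exact figures depend on the sample.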

I encourage you to look at the code for each distribution and vary the parameters to see how they affect its shape. I hope this article helps you in your data science journey.
