**A case-study-based introduction to Bayes' rule and how it compares with frequentist, pessimistic, and optimistic approaches to drawing conclusions**

This post will help you understand Bayesian inference at an intuitive level with the help of a simple case study. I hope that once you have read this article, you will be clear on how the well-known “Bayes theorem” is used, what the terms in the theorem mean (prior, posterior, likelihood), and how this compares with other approaches to decision making (pessimist/optimist/frequentist). For those who are interested, I have provided simulation results for the given case study and a link to R code for further exploration. Let’s start with the case study:

There is a very dangerous but rare disease called dangeritis with 0.1% prevalence (1 in 1000 people get it). One morning you wake with chest pain, one of the symptoms of dangeritis. With no history of heart disease, you take a test for dangeritis as a precautionary measure. You suspect that the pain is muscular, but you take the test just to be sure. Unfortunately, the test turns out positive, suggesting that you have dangeritis. You feel like taking one more test, but the cost is prohibitively high, and you are told that the test is 99% accurate. You get deeply concerned: does that mean there is a 99% chance (near certainty) that you have the disease? However, you only have one symptom of dangeritis, and you do wonder if, just maybe, the test was wrong. Thoroughly confused, you decide to get opinions from four of your close friends. You tell them your story and ask for their input. Your friends are:

(i) Percy Pessimist

(ii) Opal Optimist

(iii) Fin Frequentist

(iv) Ben Bayesian

They all present their arguments and their verdicts. The overall verdict turns out to be a tie, with Percy and Fin concluding that you have dangeritis, and Opal and Ben concluding that you do not. Here are the arguments from each of them:

**Percy Pessimist**

Bad things happen in this world, and if you have the symptom associated with a disease, and the test (which is highly accurate) suggests that you have the disease, then this is a clear verdict. I am sorry but you have the disease.

Percy assumes the worst possible outcome whenever there is any chance of it happening. Percy, however, argues that the conclusion is not unreasonable because the test is highly accurate and you had at least one related symptom. Percy does not make detailed use of data and relies mostly on intuition.

**Opal Optimist**

You only had one symptom, and no other symptoms. And there is a chance that the test could be wrong; even if it is only 1%, it does happen! Taken together, I do not believe that you have the disease.

Like Percy, Opal does not make detailed use of data and relies on intuitions mostly. Unlike Percy though, Opal tends to infer the most favourable outcome if there is any chance of that happening.

**Fin Frequentist**

Look, I am a data person and I have made use of probability theory and worked out that you most likely have the disease. The chances are very high, 99%! I am sorry but that is life…

Fin is a data person and he makes use of data to draw conclusions. Fin concludes that, unfortunately, you have the disease. He uses a probabilistic argument to support his conclusion. He says:

Look, there are two possible situations: either you have the disease, or you do not have the disease. In addition, the test can give one of two results: a positive test result, or a negative test result. That gives four possible combinations in total.

Fin argues that we have already seen the data (your test returned a positive result, suggesting that you have dangeritis). It is now a question of figuring out the **“likelihood”** of observing this data given that you have the disease, and the “likelihood” of observing this data given that you do not have the disease, and then drawing a conclusion from whichever likelihood is bigger. As the name suggests, the approach taken by Fin is that of maximising the likelihood (**maximum likelihood**): he computes the likelihood under both scenarios and picks the scenario whose likelihood is bigger.

Likelihood refers to the probability of observing the data that was actually observed, assuming that the data came from a specific scenario.

(i): What is the likelihood of observing a positive test result given that you have the disease, P(positive | disease)?

(ii): What is the likelihood of observing a positive test result given that you do not have the disease, P(positive | no disease)?

Fin computes the two likelihoods and compares them to draw his conclusion. To work out the likelihoods, Fin uses the test accuracy information provided. We have been told that the accuracy of the test is 99%, meaning the test makes the correct decision 99% of the time. This information is sufficient to work out the likelihoods for both scenarios (that you have the disease, and that you do not). To express this in probability terms, think of the four possible outcomes in a two-by-two grid (this is called a confusion matrix; see my article on **ROC** for a gentler introduction), and this is a straightforward case. The cases where no mistake is made have a probability of 99% (shaded green in the figure below), and consequently, the cases where a mistake is made have a probability of 1% (shaded red in the figure below). We have observed a positive test result, and we just need to find the likelihood of observing it in two steps: first working out the probability of this outcome assuming that you have the disease, and then working out the probability of this outcome assuming that you do not. When Fin works out the two likelihoods, he sees that the likelihood of a positive result is 0.99 (a 99% chance!) if you have the disease and only 0.01 if you do not. Fin consequently concludes that you, most likely, have the disease.
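Fin's maximum-likelihood comparison can be sketched in a few lines of Python (a minimal sketch; the variable names are my own, and the 99% accuracy figure comes from the case study):

```python
# Test accuracy from the case study: the test is correct 99% of the time.
accuracy = 0.99

# Likelihood of a positive result under each scenario.
lik_positive_given_disease = accuracy          # P(positive | disease)    = 0.99
lik_positive_given_no_disease = 1 - accuracy   # P(positive | no disease) = 0.01

# Fin picks whichever scenario makes the observed data more likely.
if lik_positive_given_disease > lik_positive_given_no_disease:
    verdict = "disease"
else:
    verdict = "no disease"

print(verdict)  # Fin's verdict: "disease"
```

Note that nowhere in this comparison does the rarity of the disease appear; that omission is exactly what Ben will object to next.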

**Ben Bayesian**

Look, I am a data person and there are two crucial pieces of information that I will take into account. One is the accuracy of the test, and the other is the disease prevalence. I made use of both pieces of information in a systematic way using Bayes rule and concluded that the chance of you having the disease is only 9%, and the chance that you do not have the disease is 91%. I, therefore, believe that you do not have the disease!
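Ben's 9% figure can be checked empirically with a small simulation, in the spirit of the linked R code (this is my own minimal Python sketch, not the author's code; the prevalence and accuracy numbers come from the case study):

```python
import random

random.seed(42)

prevalence = 0.001   # 1 in 1000 people have dangeritis
accuracy = 0.99      # the test is correct 99% of the time
n = 1_000_000        # simulated population size

positives = 0
diseased_positives = 0
for _ in range(n):
    has_disease = random.random() < prevalence
    # The test reports the truth with probability `accuracy`, else it flips.
    test_positive = has_disease if random.random() < accuracy else not has_disease
    if test_positive:
        positives += 1
        if has_disease:
            diseased_positives += 1

# Fraction of positive testers who actually have the disease.
print(diseased_positives / positives)  # roughly 0.09 (about 9%)
```

Most positive test results come from the vast healthy majority being misdiagnosed 1% of the time, which is why the fraction of true positives is so small.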

Like Fin, Ben also uses the data and works out the likelihood. However, he does not stop there. He also makes use of previous knowledge on the subject (or **“prior”** information), namely that the disease is very rare, and combines it with the **likelihood** information to draw his conclusion. What is the appropriate way to combine these two pieces of information? It turns out to be the most well-known rule in probability, the “Bayes Rule”. Effectively, Ben is not seeking to calculate the **likelihood** or the **prior** probability on their own. Ben is focussed on calculating the **posterior** probability.

Ben argues that the question you are asking is not: what is the probability of observing the test result that you did, given that you have the disease (the likelihood)? It is in fact: what is the probability of you having the disease, given that we observed a positive test result (called the **posterior** in Bayesian language)?
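Ben's calculation can be written out directly (a minimal sketch; the prevalence and accuracy numbers come from the case study, and the variable names are my own):

```python
prevalence = 0.001   # prior: P(disease) = 0.1%
accuracy = 0.99      # the test is correct 99% of the time

# Likelihoods of a positive result under each hypothesis.
p_pos_given_disease = accuracy        # P(positive | disease)
p_pos_given_healthy = 1 - accuracy    # P(positive | no disease)

# Total probability of testing positive (the denominator in Bayes rule).
p_pos = (p_pos_given_disease * prevalence
         + p_pos_given_healthy * (1 - prevalence))

# Posterior: P(disease | positive test).
posterior = p_pos_given_disease * prevalence / p_pos
print(round(posterior, 2))  # 0.09, i.e. about a 9% chance
```

The tiny prior (0.001) drags the posterior down from the 99% that Fin reported to roughly 9%, which is exactly the gap between Fin's and Ben's verdicts.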

Bayes formula helps us calculate **posterior** probability using **likelihood** and **prior** information together. The formula in plain English is: