Understand Bayes Rule, Likelihood, Prior and Posterior

A case-study-based introduction to using Bayes rule and how it compares with frequentist, pessimistic and optimistic approaches to drawing conclusions

This post will help you understand Bayesian inference at an intuitive level with the help of a simple case study. I hope that once you have read this article, you will be very clear on how the well-known “Bayes theorem” is used, what the terms in the theorem mean (prior, posterior, likelihood) and how this compares with other approaches to decision making (pessimist/optimist/frequentist). For those who are interested, I have provided simulation results for the given case study and a link to R code for further exploration. Let’s start with the case study:

There is a very dangerous but rare disease called dangeritis with 0.1% prevalence (1 in 1000 people get it). One morning you wake with chest pain (one of the symptoms of dangeritis). With no history of heart disease, you take a test for dangeritis as a precautionary measure. You suspect that the pain is muscular but you take the test just to be sure. Unfortunately, the test turns out positive, suggesting that you have dangeritis. You feel like doing one more test but the cost is prohibitively high, and you are told that the test is 99% accurate. You are deeply concerned to hear this: perhaps it means that there is a 99% chance (almost certain) that you have the disease. However, you only have one symptom of dangeritis and you do wonder if, just maybe, the test was wrong. Thoroughly confused, you decide to get opinions from four of your close friends. You tell them your story and ask for their input. Your friends are:

(i) Percy Pessimist

(ii) Opal Optimist

(iii) Fin Frequentist

(iv) Ben Bayesian

They all present their argument and their verdict. The overall verdict turns out to be a tie, with Percy and Fin concluding that you have dangeritis while Opal and Ben conclude that you do not have the disease. Here are the arguments from each of them:

Percy Pessimist

Bad things happen in this world, and if you have the symptom associated with a disease, and the test (which is highly accurate) suggests that you have the disease, then this is a clear verdict. I am sorry but you have the disease.

Percy’s mind concludes the worst possible outcome if there is any chance of that happening. Percy, however, argues that the conclusion is not unreasonable because the test is highly accurate and you had at least one related symptom. Percy does not make detailed use of data and relies on intuitions mostly.

Opal Optimist

You only had one symptom, and no other symptoms. And there is a chance that the test could be wrong; even if it is only 1%, it does happen! Taken together, I do not believe that you have the disease.

Like Percy, Opal does not make detailed use of data and relies on intuitions mostly. Unlike Percy though, Opal tends to infer the most favourable outcome if there is any chance of that happening.

Fin Frequentist

Look, I am a data person and I have made use of probability theory and worked out that you most likely have the disease. The chances are very high, 99%! I am sorry but that is life…

Fin is a data person and he makes use of data to draw conclusions. Fin concludes that you have the disease, unfortunately. He uses a probabilistic argument to support his conclusion. He says:

Look, there are two possible situations. Either you have the disease, with probability P(disease), or you do not have the disease, with probability P(no disease).

In addition, you can get a positive test result, with probability P(positive test), or a negative test result, with probability P(negative test).

Fin argues that we have already seen the data (your test returned a positive result, suggesting that you have the disease). It is now a question of figuring out the “likelihood” of observing this data given that you have the disease, and the “likelihood” of observing this data given that you do not have the disease. Fin computes the likelihood under both scenarios and then chooses the scenario with the bigger likelihood to draw his conclusion. This approach is called maximising the likelihood (maximum likelihood).

Likelihood refers to the probability of observing the data that has been observed assuming that the data came from a specific scenario.

(i) What is the likelihood of observing a positive test result given that you have the disease? This is P(positive test | disease).

(ii) What is the likelihood of observing a positive test result given that you do not have the disease? This is P(positive test | no disease).

Fin computes the two likelihoods and then checks which one is bigger to draw his conclusion. To work out the two likelihoods, Fin uses the test accuracy information provided. We have been told that the accuracy of the test is 99%, which means that the test makes the correct decision 99% of the time. This information is sufficient to work out the likelihoods for both scenarios (that you have the disease, and that you do not have the disease).

To express this in probability terms, let us think of four scenarios in a two-by-two grid (this is called a confusion matrix, see my article on ROC for a more gentle introduction). This is a straightforward case: the cells where no mistake is made have a probability of 99% (shaded green in the figure below) and, consequently, the cells where a mistake is made have a probability of 1% (shaded red in the figure below).

We have observed a positive result, and we just need to find the likelihood of observing it in two steps: first working out the probability of this outcome assuming that you have the disease, and then working out the probability of this outcome assuming that you do not have the disease. When Fin works out the two likelihoods, he sees that a positive result has a probability of only 0.01 if you do not have the disease and 0.99 if you do. He reads this as a 99% chance that you have the disease and consequently concludes that you, very likely, have it.

Likelihoods of test results given disease status for all four situations for a test with 99% accuracy
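Fin's reasoning can be sketched in a few lines of Python. This is only an illustration, not the R code mentioned in this article; the names `likelihood` and `maximum_likelihood_status` are made up for the example, and it assumes, as the case study does, that the test is 99% accurate for people with and without the disease alike.

```python
# Assumption from the case study: "99% accurate" means the test is right
# 99% of the time for sick and healthy people alike.
ACCURACY = 0.99

# likelihood[status][result] = P(result | status), i.e. the confusion matrix.
likelihood = {
    "disease": {"positive": ACCURACY, "negative": 1 - ACCURACY},
    "no disease": {"positive": 1 - ACCURACY, "negative": ACCURACY},
}


def maximum_likelihood_status(result):
    """Return the disease status under which `result` is most probable."""
    return max(likelihood, key=lambda status: likelihood[status][result])


for status in likelihood:
    print(f"P(positive test | {status}) = {likelihood[status]['positive']:.2f}")

print("Fin's verdict:", maximum_likelihood_status("positive"))
```

Note that the disease prevalence appears nowhere in this sketch, which is exactly the piece of information Fin leaves out.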

Ben Bayesian

Look, I am a data person and there are two crucial pieces of information that I will take account of. One is the accuracy of the test, and the other is the disease prevalence. I made use of both pieces of information in a systematic way using Bayes rule and concluded that the chance of you having the disease is only 9% and the chance that you do not have the disease is 91%. I, therefore, believe that you do not have the disease!

Like Fin, Ben also uses the data and works out the likelihood. However, he does not stop at that. He also makes use of previous knowledge on the subject (or “prior” information) that the disease is very rare and combines that with the likelihood information to draw his conclusion. What is the appropriate way to combine these two pieces of information? It turns out that this is the most well-known rule in probability called the “Bayes Rule”. Effectively, Ben is not seeking to calculate the likelihood or the prior probability. Ben is focussed on calculating the posterior probability.

Ben argues that the question you are asking is not: what is the probability of observing the test result that you did given that you had the disease (likelihood). It is in fact: what is the probability of you having the disease given that we observed that the test is positive (called posterior in Bayesian language).

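Bayes formula helps us calculate posterior probability using likelihood and prior information together. The formula in plain English is:

Posterior = (Likelihood × Prior) / Evidence

Bayes formula in our specific case study is:

P(disease | positive test) = P(positive test | disease) × P(disease) / P(positive test)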

Prior is the probability of the disease before having seen any test result (our prior understanding/beliefs modelled in a single probability value). Evidence is also called the marginal likelihood and it acts like a normalizing constant and is independent of disease status (the evidence is the same whether calculating posterior for having the disease or not having the disease given a test result). We have already explained the likelihood in detail above. Posterior is the probability that takes both prior knowledge we have about the disease, and new data (the test result) into account.

When Ben uses the information given, the posterior probability that you have the disease given that the test is positive is only 9%. And the posterior probability that you do not have the disease given that the test is positive is 91%. Consequently, Ben concludes that you are highly unlikely to have the disease.
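For readers who want to check the 9% figure, here is the arithmetic written out. It uses only numbers from the case study (0.1% prevalence, 99% accuracy) and assumes the test is equally accurate for people with and without the disease; the evidence term is expanded over the two possible disease states.

```latex
\begin{align*}
P(\text{disease} \mid \text{positive})
  &= \frac{P(\text{positive} \mid \text{disease})\, P(\text{disease})}
          {P(\text{positive} \mid \text{disease})\, P(\text{disease})
           + P(\text{positive} \mid \text{no disease})\, P(\text{no disease})} \\
  &= \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}
   = \frac{0.00099}{0.01098} \approx 0.09
\end{align*}
```

The denominator is dominated by the false positives (0.00999) because healthy people vastly outnumber sick ones, which is why the posterior stays low.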

So there you have it. Two of your friends made no use of data, one used only the data at hand without any prior knowledge of the subject, and one used both the data and the prior to draw a conclusion.

How would the posterior probability change for different scenarios of test accuracy, and disease prevalence?

The figure below shows how the posterior probability of you having the disease given that you got a positive test result changes with disease prevalence (for a fixed test accuracy). Notice how the posterior probability is below 50% for a disease prevalence less than ~2% despite a very high test accuracy!

Posterior probability with disease prevalence for a fixed test accuracy (Image by author)
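A rough Python sketch of this kind of sweep is shown below. It is not the R code behind the figure. The function name `posterior_disease_given_positive` is made up for the example, and the accuracy of 0.98 is an assumption: the text does not state the value used for the figure, and 0.98 is chosen here because it puts the 50% crossover at a prevalence of 2%.

```python
def posterior_disease_given_positive(prevalence, accuracy):
    """P(disease | positive test) via Bayes rule.

    Assumes the test is equally accurate for sick and healthy people.
    """
    true_positive = accuracy * prevalence
    false_positive = (1 - accuracy) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


ACCURACY = 0.98  # assumed value; the figure's exact setting is not stated

for prevalence in [0.001, 0.005, 0.01, 0.02, 0.05, 0.10, 0.50]:
    posterior = posterior_disease_given_positive(prevalence, ACCURACY)
    print(f"prevalence {prevalence:6.1%} -> posterior {posterior:6.1%}")
```

Changing `ACCURACY` to 0.99 and looking at the 0.001 row reproduces Ben's 9% result for dangeritis.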

The figure below shows how the posterior probability of you having the disease given that you got a positive test result changes with test accuracy (for a fixed disease prevalence). Notice how you really need a test that is almost 100% accurate if you want to reduce the number of errors that you would get in situations where the disease prevalence is low.

This should also be a lesson: the next time anyone claims that a given test is 98% accurate, remember that this does not by itself mean the test is good enough, and that the whole context needs to be looked at before drawing any conclusions. There are going to be situations where, despite being 98% accurate, a given test would lead to too many errors!

Posterior probability with test accuracy for a fixed disease prevalence (Image by author)
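One way to see where the errors come from is to count expected true and false positives in a hypothetical screened population. The sketch below is an illustration only, not the R code behind the figure; the function name `expected_positives` and the population of 100,000 are made up for the example, and the test is assumed to be equally accurate for people with and without the disease.

```python
def expected_positives(population, prevalence, accuracy):
    """Expected (true positives, false positives) when everyone is tested."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * accuracy            # sick people correctly flagged
    false_positives = healthy * (1 - accuracy)  # healthy people wrongly flagged
    return true_positives, false_positives


POPULATION = 100_000  # hypothetical screening population
PREVALENCE = 0.001    # dangeritis: 1 in 1000

for accuracy in [0.90, 0.98, 0.99, 0.999, 0.9999]:
    tp, fp = expected_positives(POPULATION, PREVALENCE, accuracy)
    share_sick = tp / (tp + fp)  # same quantity as the posterior
    print(f"accuracy {accuracy:.4f}: {tp:6.1f} true vs {fp:7.1f} false positives "
          f"-> {share_sick:.1%} of positives are sick")
```

Under these assumptions, true and false positives only balance when the accuracy equals one minus the prevalence, which is 99.9% for dangeritis.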

You can visit my GitHub page to get the R code, where you can generate the above figures and play with the parameters. I have commented the code extensively but please do ask if you have any questions 🙂

In this article, I have used a simple case study to explain the Bayesian approach to calculating the probability of having a rare disease when a highly accurate test returns a positive result. The Bayesian framework offers a principled way to use both the accuracy of the test and the prior knowledge we have about the disease to draw conclusions. For cases where the prior information is uninformative, the Bayesian approach leads to the same conclusion as the maximum likelihood (the frequentist) approach.
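To make the last point concrete, suppose (purely as an illustration, not part of the case study) that nothing were known about the prevalence, so that having and not having the disease were given equal prior probability of 0.5. The prior then cancels out of Bayes formula and the posterior equals Fin's 99%:

```latex
\begin{align*}
P(\text{disease} \mid \text{positive})
  &= \frac{0.99 \times 0.5}{0.99 \times 0.5 + 0.01 \times 0.5}
   = \frac{0.495}{0.5} = 0.99
\end{align*}
```

It is the very small prior for dangeritis (0.001) that pulls Ben's answer down from 99% to 9%.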
