In the last post on Hidden Markov models (HMMs), we never solved the problem of finding the most probable sequence of coins used. If you didn’t read the post on HMMs, I would highly encourage you to do so. For those of you who haven’t, I’ll outline the problem. Let’s say some guru came up to you and told you to pick a coin from a bag (there are only two coins in the bag) and flip it. You’ll either observe a head or a tail. You then put the coin back in the bag and repeat the cycle 3 more times. You observe head, tail, tail, head. The guru says that he’ll give you a million dollars if you correctly guess the sequence of coins you drew. The guru also gives you an HMM. Finding the most probable sequence of coins gives you the best chance of winning, but how can we find it? As you might have guessed from the title, we can use the Viterbi Algorithm.
The Viterbi Algorithm is a dynamic programming solution for finding the most probable hidden state sequence. If we have a set of states Q and a set of observations O, we are trying to find the state sequence that maximizes P(Q|O). By the definition of conditional probability, we can rewrite P(Q|O) as P(Q,O)/P(O), but there is no need to compute P(O): it does not depend on the state sequence, so it cannot change which sequence is best. We can just find the state sequence that maximizes P(Q,O). Let’s dive into the formula for P(Q, O).
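In standard HMM notation (the exact symbols may differ slightly from the figure in the original post), with P(q_t | q_{t-1}) for the transition probabilities and b_q(o_t) for the emission probabilities from the HMM post, the joint probability factors along the chain as

P(Q, O) = P(q_1 \mid q^*)\, b_{q_1}(o_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\, b_{q_t}(o_t)

where P(q_1 | q*) plays the role of the initial state distribution.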
Note: T is the number of observations in the sequence. If you need a refresher on the symbols used in the HMM model, I would recommend reviewing my post on HMMs.
So, if we wanted to find the state sequence that maximizes P(Q, O), you could imagine this would be quite expensive, as we would have to evaluate P(Q, O) for every possible state sequence; with N hidden states and T observations, that is N^T sequences. This is where the Viterbi algorithm comes in. The Viterbi algorithm is an iterative approach to solving for the most likely sequence. Rather than scoring every complete state sequence, we work through the observations one at a time, keeping track of the most probable path that ends in each hidden state at the current time step. We iteratively extend these paths until we are done with the sequence of observations, then read off the best one.
There are some symbols we should define for the Viterbi algorithm before we dive deeper into the steps.
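In the usual notation, which I’ll use below:

- δ_t(i): the probability of the most likely state sequence that ends in state q_i at time t and accounts for the first t observations
- ψ_t(i): the state at time t-1 that achieves that maximum (the backpointer we follow when recovering the sequence)
- P(q_j | q_i): the transition probability from state q_i to state q_j
- b_i(o_t): the probability of emitting observation o_t from state q_i
- π_i: the initial state distribution, i.e. the probability that the first state is q_i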
The Viterbi Algorithm is composed of three steps.
Step 1: Initialization
We first create a start state q*. We then find the probabilities of the initial states and of the observations given the initial states. In this case, P(qi|q*) is the probability that the first state is qi, given that we begin at the start state q*.
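Concretely, for each state q_i the initialization computes

\delta_1(i) = P(q_i \mid q^*)\, b_i(o_1)

the probability of starting in state q_i and emitting the first observation o_1.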
Step 2: Induction
We perform the induction step for each time step t with 2 ≤ t ≤ T, where T is the total number of observations; the start state q* sits before time step 1 and is handled by the initialization step.
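In symbols, for each state q_j at time t we compute

\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, P(q_j \mid q_i) \right] b_j(o_t)

\psi_t(j) = \arg\max_i \left[ \delta_{t-1}(i)\, P(q_j \mid q_i) \right]

so δ carries the probability of the best path into q_j, and ψ remembers which previous state that path came from.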
Step 3: Termination
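Once we reach the last observation, the termination step picks the best final state and backtracks through the ψ pointers:

P^* = \max_i \delta_T(i), \qquad q_T^* = \arg\max_i \delta_T(i), \qquad q_{t-1}^* = \psi_t(q_t^*) \ \text{for } t = T, \ldots, 2

which recovers the most probable hidden state sequence q_1^*, …, q_T^*.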
Let’s figure out if we can win the prize. As mentioned, the guru gives us an HMM to give us a better probability of winning the money.
So, let’s go ahead and start finding the most probable hidden state sequence when we observed head, tail, tail, and head. I like to visualize the Viterbi algorithm using a trellis (the diagram found below).
From the definition of the algorithm, we need to first initialize before we perform the induction step.
Initialization Step
Here we are calculating, for time step 1, the probability of being at coin 1 and observing a head, and the probability of being at coin 2 and observing a head. Instead of using P(qi | q*), we use the initial state distribution. Now that we have initialized the delta function at time step 1, we can begin the induction step.
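Written out, with the first observation being a head (H), the two initialization values are

\delta_1(\text{coin 1}) = \pi_{\text{coin 1}}\, b_{\text{coin 1}}(\text{H}), \qquad \delta_1(\text{coin 2}) = \pi_{\text{coin 2}}\, b_{\text{coin 2}}(\text{H})

where the actual numbers come from the guru’s HMM shown in the trellis.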
Induction Steps at time t=2
Remember, in the induction step, we want to discover which state at time t-1 maximizes the function. We do this by taking, for each state in our current time step, the previous state that maximizes the delta function. Here we see that the coin that maximizes the probability at time step 2 is coin 1 if we are in state coin 1 at time step 2, and coin 2 maximizes the probability if we are in state coin 2 at time step 2.
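For example, with the second observation being a tail (T), the value for state coin 1 at t = 2 is

\delta_2(\text{coin 1}) = \max\big[\, \delta_1(\text{coin 1})\, P(\text{coin 1} \mid \text{coin 1}),\ \delta_1(\text{coin 2})\, P(\text{coin 1} \mid \text{coin 2}) \,\big]\, b_{\text{coin 1}}(\text{T})

and ψ_2(coin 1) records which of the two terms won; δ_2(coin 2) is computed the same way, with the specific numbers coming from the guru’s HMM in the trellis.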
I don’t want to bog down the main part of the blog with the calculations, so I’ll add them to the bottom of the blog if anyone wants to take a look. After we are done with the calculations, we find that the sequence with the maximum likelihood has a probability of around 1.2%. This leads us to conclude that, for all the effort we have put in, we more than likely will not win this game. As for the most likely sequence, it is Coin 1, Coin 1, Coin 1, Coin 1. Such an anticlimactic way to solve the problem.
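If you would rather let a computer do the arithmetic, here is a minimal Python sketch of the algorithm we just walked through. The initial, transition, and emission probabilities below are placeholders (the real values come from the guru’s HMM in the trellis figure), so plug those in to reproduce the ~1.2% probability and the all-Coin-1 sequence.

```python
# Minimal Viterbi sketch. The probabilities below are placeholders --
# substitute the values from the guru's HMM to reproduce the results above.
states = ["coin1", "coin2"]
observations = ["H", "T", "T", "H"]

# Initial state distribution pi (placeholder values)
start_p = {"coin1": 0.5, "coin2": 0.5}
# Transition probabilities P(q_t = j | q_{t-1} = i) (placeholders)
trans_p = {"coin1": {"coin1": 0.7, "coin2": 0.3},
           "coin2": {"coin1": 0.4, "coin2": 0.6}}
# Emission probabilities b_j(o) = P(o | q = j) (placeholders)
emit_p = {"coin1": {"H": 0.8, "T": 0.2},
          "coin2": {"H": 0.3, "T": 0.7}}

def viterbi(obs, states, start_p, trans_p, emit_p):
    # delta[t][s]: probability of the best path ending in state s at time t
    # psi[t][s]: the state at time t-1 on that best path (backpointer)
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    psi = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        psi.append({})
        for j in states:
            best_prev = max(states, key=lambda i: delta[t - 1][i] * trans_p[i][j])
            delta[t][j] = delta[t - 1][best_prev] * trans_p[best_prev][j] * emit_p[j][obs[t]]
            psi[t][j] = best_prev
    # Termination: pick the best final state, then backtrack through psi
    last = max(states, key=lambda s: delta[-1][s])
    best_prob = delta[-1][last]
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, psi[t][path[0]])
    return best_prob, path

prob, path = viterbi(observations, states, start_p, trans_p, emit_p)
print(f"Most likely sequence: {path} with probability {prob:.4f}")
```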
Hopefully, this was a helpful introduction to the Viterbi Algorithm. I am thinking about changing the format of future blogs to take a data science approach to solving real-world problems, like proving why you’ll never successfully beat the house in blackjack or trying to predict bitcoin prices. If you would like to see more content like this, drop a comment below. As always, if you liked the post, make sure to smash the clap button, and if you like these styles of posts, make sure to follow me.