## Some Quick Terminology

- Agent: The algorithm that we are trying to create — the learner and decision maker
- Environment: The setting the Agent interacts with. It defines the states, the actions available, and the rewards
- Actions: All the possible things an Agent can do in the environment
- Reward: A signal that tells the Agent how good or bad its latest action was
- Policy: How the agent decides which action to take in a given state. Usually, the agent’s goal is to improve this policy so that it collects as much reward as possible.
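To make these terms concrete, here is a minimal sketch of the agent–environment loop in Python. The two-state environment, the transition rule, and the random policy are all made up for illustration — the point is just to show where each term from the list above lives in code.

```python
import random

def step(state, action):
    """Environment: given a state and an action, return (next_state, reward)."""
    next_state = (state + action) % 2        # toy transition rule (illustrative)
    reward = 1 if next_state == 1 else 0     # reward signals the "good" state
    return next_state, reward

def policy(state):
    """Policy: maps a state to an action (here, just picked at random)."""
    return random.choice([0, 1])

# One short episode: the agent acts, the environment responds with a
# new state and a reward, and the reward accumulates over time.
state, total_reward = 0, 0
for _ in range(10):
    action = policy(state)
    state, reward = step(state, action)
    total_reward += reward
```

A real agent would use the rewards it sees to improve `policy` over time; this sketch only shows the interaction loop itself.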

Before I get into the details, I'd like to note that a lot of my information comes from David Silver’s lectures on YouTube. If you are really interested in getting into Reinforcement Learning, you can check those out here!

## What Even Is a Markov?

Markov processes are named after the Russian mathematician Andrey Markov. He was born in 1856 and is best known for his work on stochastic processes, which later gave us Markov processes and Markov chains. A lot of Markov, I know 🙂

On the way to the Markov Decision Process, there are four related Markov concepts to cover.

- The Markov Property
- The Markov Process
- The Markov Reward Process
- The Markov Decision Process

Each of these builds on the previous one, and together they give us a way to model a problem so it can be solved using Reinforcement Learning.

## Let Us Begin With The Markov Property

The Markov property states:

> “The future is independent of the past given the present”

This means that the next state St+1 depends only on the information in the current state St. Once the current state is known, all information from previous states can be thrown away.
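In symbols, the statement above is usually written as a conditional probability (this is the standard notation from David Silver's lectures): a state St is Markov if and only if

$$
\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, S_2, \dots, S_t]
$$

In words: conditioning on the full history gives you nothing beyond what the current state already tells you.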

## The Markov Process

A Markov Process (also called a Markov chain) is basically just a set of states, each satisfying the Markov property, linked together. Between the states there are transition probabilities, which describe how likely you are to move from one state to another.
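A tiny Markov process can be written down as a transition-probability matrix and sampled step by step. The two weather states and their probabilities below are made up for illustration:

```python
import random

states = ["Sunny", "Rainy"]

# transition[i][j] = probability of moving from state i to state j;
# each row sums to 1 (these numbers are invented for the example)
transition = [
    [0.8, 0.2],   # from Sunny
    [0.4, 0.6],   # from Rainy
]

def next_state(current):
    """Sample the next state using only the current state's row —
    this is exactly the Markov property in action."""
    row = transition[states.index(current)]
    return random.choices(states, weights=row)[0]

# Walk the chain for a few steps; no step ever looks at the history
chain = ["Sunny"]
for _ in range(5):
    chain.append(next_state(chain[-1]))
```

Notice that `next_state` takes only the current state as input — the rest of `chain` is never consulted, which is what makes this a Markov process.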