![](https://neoshare.net/wp-content/uploads/2021/02/1BSQxe2Z1gJOgrlKky0nZ1g-750x420.png)
The main idea is that the model is not trained on data prepared in advance, but in an environment that teaches it what is right and what is wrong with the help of symbolic carrots and sticks.
A good example of reinforcement learning is dog training.
By the way, do you remember this moment from The Big Bang Theory? :) It looks a lot like reinforcement learning.
It is important to understand that you need an environment that can simulate your system's real-world behavior. Without the ability to run experiments, reinforcement learning cannot be used.
You also need to be able to run many experiments in that environment, which can be quite expensive.
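The carrot-and-stick loop above can be sketched with tabular Q-learning on a toy environment. Everything here is invented for illustration: `ChainEnv` (a five-state line where walking right to the end earns a reward and walking left ends the episode with nothing) and the hyperparameter values are assumptions, not part of any real system.

```python
import random

class ChainEnv:
    """Toy simulated environment: states 0..4 on a line, agent starts at 0.
    Reaching the rightmost state earns a reward (the carrot); stepping
    left ends the episode with nothing (the stick)."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # 0 = left, 1 = right
        if action == 1:
            self.state += 1
            if self.state == self.length - 1:
                return self.state, 1.0, True   # goal reached: carrot
            return self.state, 0.0, False
        return self.state, 0.0, True           # walked away: episode over

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.length)]  # Q-table: one row per state

    def greedy(values):  # argmax with random tie-breaking
        best = max(values)
        return random.choice([a for a, v in enumerate(values) if v == best])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit the table, sometimes explore
            a = random.randrange(2) if random.random() < epsilon else greedy(q[s])
            s2, r, done = env.step(a)
            # Q-learning update: move the estimate toward reward + discounted future value
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# After training, "right" should score higher than "left" in every
# non-terminal state: the agent has learned to walk toward the reward.
```

Note how many episodes even this tiny problem takes; this is exactly why the environment must be cheap to query, and why a simulator is usually mandatory.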
Generally, there are many methods for solving the control problem: STRIPS, decision trees, HTN, utility systems, MCTS, etc. All of them are still in use today. When to use them and when to use RL is worth a separate article.
To make a long story short:
1. RL should be used when the number of options is too large even for algorithms based on directed enumeration.
2. When there is not enough expert knowledge to develop rules.
In 1997, Deep Blue defeated world champion Garry Kasparov. Deep Blue didn't use ML; algorithms were enough. But there was still one game left where people could win: Go. This didn't change until 2016, when AlphaGo appeared. AlphaGo already used ML.
Go is a good example of the two rules above.
3. When a decision needs to be made in real time.
For example, an autopilot: there is no time to enumerate the options, and the number of possible situations is too large to develop a system of rules for every case.
One example of where this might be needed is an autopilot for an aircraft that flies just above the water surface (a ground-effect vehicle). Such a craft uses energy very efficiently, but it has to keep a certain distance from the water. Developing such an autopilot costs millions of dollars; with the help of reinforcement learning, it could be done more cheaply using a flight simulator.
As a conclusion: just a robot learning how to walk.