Reinforcement learning is a branch of machine learning that can cope with open environments, a problem other learning methods cannot solve. Let’s review basic machine learning and then learn how to apply the reinforcement learning method through the Flappy Bird and Mario games.

**Machine Learning** is a subset of **Artificial Intelligence**. It includes *supervised learning, unsupervised learning, reinforcement learning*, and combinations of them. From the ideas of artificial neural networks, a subset of machine learning called **Deep Learning**, which uses neural networks, was born.

Let’s talk about Deep Learning first. It uses a neural network to learn how “important” each input is to the desired output. A simple fully-connected neural network has 3 layers: an input layer, a hidden layer, and an output layer, all holding numeric values. The input gives the network its “vision”: the important features that affect the output. The output layer can be a single value or a multidimensional vector. Each node has its own weight and bias.

Let’s consider a problem: given a picture, predict the type of animal in it.

We need to decide whether the input picture shows a dog or a cat, so the output should have 2 dimensions, one for dog and one for cat. We will label dogs [1 0] and cats [0 1].
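As a small sketch of the labelling step (the label list here is made up for illustration, not from the article’s dataset), the [1 0] / [0 1] encoding can be built like this:

```python
import numpy as np

# hypothetical labels for illustration
labels = ["dog", "cat", "dog"]
classes = ["dog", "cat"]

# one-hot encode: [1 0] for dog, [0 1] for cat
one_hot = np.array([[1 if c == lab else 0 for c in classes] for lab in labels])
print(one_hot.tolist())  # [[1, 0], [0, 1], [1, 0]]
```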

The network will “learn” through backpropagation and the chain rule. To learn how “important” each input is to the desired output, we update the weights and biases using the error: the difference between the output and the desired value. The feed-forward pass makes the prediction, but the network “learns” backward.

The neural network adjusts the weights and biases of each node, self-learning the effect of the input on the output from our labelled data. Our aim is to minimize the error.
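Minimizing the error can be sketched with a single weight and plain gradient descent (a toy example with made-up numbers, not the game network):

```python
# toy example: learn w so that w * x matches the target y
x, y = 2.0, 6.0   # one sample: input 2, desired output 6
w = 0.0           # initial weight
lr = 0.1          # learning rate

for _ in range(100):
    out = w * x         # feed forward
    error = out - y     # difference between output and desired value
    grad = error * x    # chain rule: d(error^2 / 2) / dw
    w -= lr * grad      # step against the gradient to shrink the error

print(round(w, 3))  # close to 3.0, since 3 * 2 = 6
```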

Returning to machine learning: each type of learning is useful for particular problems. Machine learning uses mathematical algorithms to predict, classify, and learn to minimize the error.

Supervised learning uses labelled data to classify or to predict a trend. On the other side, unsupervised learning isn’t given labels; it learns structure from the data itself. In fact, unsupervised learning is a tool to help supervised learning “understand” the data better.

However, AI must now solve more complex problems. For example, Flappy Bird is a game where we can’t know exactly what the desired output is at any given moment. In addition, the pipes are random, so we can’t fix the positions of the pipes, the bird, etc. across games. For this problem, we must use Reinforcement Learning.

Fortunately, Reinforcement Learning is close to how we learn in real life, so it is easy to understand. Reinforcement Learning is learning from mistakes, without any labelled data. The idea is to let the bot explore the environment: we give it a reward to tell it that it is doing well, and a punishment when it fails. The bot then finds ways to maximize the reward.
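The reward-and-punishment loop can be sketched as a generic agent-environment interaction (a toy environment invented for illustration, not the Flappy Bird setup; all names here are hypothetical):

```python
import random

def step(state, action):
    """Toy environment: reward +1 when the action matches the state's parity."""
    reward = 1 if action == state % 2 else -1   # reward or punishment
    return (state + 1) % 10, reward             # next state cycles 0..9

# a tiny tabular agent that prefers actions with higher accumulated reward
value = {}   # (state, action) -> accumulated reward
state = 0
for _ in range(1000):
    # explore sometimes, otherwise exploit the best-known action
    if random.random() < 0.1:
        action = random.choice([0, 1])
    else:
        action = max([0, 1], key=lambda a: value.get((state, a), 0))
    state, reward = (lambda s, r: (s, r))(*step(state, action))
    # note: step() returns (next_state, reward); unpack and record
    value[(state - 1) % 10, action] = value.get(((state - 1) % 10, action), 0) + reward

# after enough steps, the bot has learned which action each state rewards
best = {s: max([0, 1], key=lambda a: value.get((s, a), 0)) for s in range(10)}
```

The agent never sees labelled data; it only accumulates rewards and punishments, and its greedy choice drifts toward the rewarded action, which is the core idea behind the Flappy Bird bot below.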

Now let’s get to know the algorithms and the neural network better by working through the code.

Thanks to the Flappy Bird Clone Source, we can easily read the game state as input.

## Requirements of the core game:

- python
- numpy, pandas
- pygame

Although we know that a neural network alone can’t learn to play Flappy Bird, let’s try it once anyway to see how the bird behaves with only a fully-connected network.

First, we build the network ourselves, which makes it easier to understand.

**Activation functions:**

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def sigmoidDer(x):
    return sigmoid(x) * (1. - sigmoid(x))

def reluDer(x):
    return 1 * (x > 0)
```

Sigmoid and ReLU are our activation functions. As we see in the network, not every node is activated all the time. In addition, sigmoid keeps the values passed to the next node between 0 and 1.

As mentioned, backpropagation uses the chain rule, so we also need the derivatives of the activation functions.
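A quick way to sanity-check a hand-written derivative (an aside, not part of the original source) is to compare it against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoidDer(x):
    return sigmoid(x) * (1. - sigmoid(x))

x = 0.5
eps = 1e-6
# central finite difference approximates the true derivative at x
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(abs(numeric - sigmoidDer(x)) < 1e-6)  # True: the formula matches
```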

**Net Class:**

```python
class net:
    def __init__(self, w1=None, w2=None):
        self.inputNode = 3
        self.outputNode = 1
        self.hiddenLayerNode = 5
        self.w1 = w1      # input-to-hidden weights, shape (5, 3)
        self.w2 = w2      # hidden-to-output weights, shape (1, 5)
        self.hOut = None
        self.fOut = None

    # encode()/decode(), which flatten the weights to and from a single
    # vector, are not shown in this excerpt

    def forward(self, X):
        # inputs to hidden layer
        self.hOut = sigmoid(np.dot(self.w1, X))
        # hidden layer to final output
        self.fOut = sigmoid(np.dot(self.w2, self.hOut))
        return self.fOut

    def backward(self, X, error):
        # hOut and fOut are already sigmoid outputs, so sigmoid' is out * (1 - out)
        deltaError = error * self.fOut * (1. - self.fOut)
        # propagate the output error back through w2
        hiddenError = np.dot(deltaError, self.w2)
        deltaHidden = hiddenError * self.hOut * (1. - self.hOut)
        self.w1 += np.outer(deltaHidden, X)
        self.w2 += np.outer(deltaError, self.hOut)
```

The neural network runs from input to output through one hidden layer. Each layer’s output is very simple: the dot product of its input with the weights, activated by the Sigmoid function.
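As a quick sanity check (with made-up random weights matching the 3-5-1 shape above, not trained values), the forward pass can be exercised on its own:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# 3 inputs -> 5 hidden nodes -> 1 output, as in the net class
w1 = rng.standard_normal((5, 3))
w2 = rng.standard_normal((1, 5))

X = np.array([0.4, 0.7, 0.2])      # a normalized input vector
hOut = sigmoid(np.dot(w1, X))      # hidden activations, shape (5,)
fOut = sigmoid(np.dot(w2, hOut))   # final output, shape (1,)
print(fOut.shape)                  # (1,) -- a single jump score in (0, 1)
```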

The number of input nodes is 3, so let’s pause a while to look at our Flappy Bird problem.

The candidate inputs are the vertical position of the next pipe, the vertical position of the following pipe, the height of the bird, and the horizontal distance from the bird to the next pipe. We want 1 output (jump or not).

Through observation, we see that the bird only moves up and down, so its horizontal position is constant. There are always **2 columns of pipes** on the screen, moving toward the back of the screen. As soon as the bird scores a point, we immediately update the position of the following pipe. So we only need 3 inputs now:

- The vertical position of next pipe
- The distance
- The height of the bird

We will normalize these values before feeding them to the neural network.
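The normalization scheme (the same max-abs division the bot’s `think()` method uses below; the raw numbers here are illustrative) divides every input by the largest absolute value:

```python
import numpy as np

inputLayer = [240.0, -120.0, 60.0]   # example raw inputs (illustrative)
# largest absolute value among the inputs
maxValue = abs(max(max(inputLayer), min(inputLayer), key=abs))
normalized = np.divide(inputLayer, maxValue)
print(normalized)  # [ 1.   -0.5   0.25]
```

This keeps every input in [-1, 1], which suits the sigmoid-activated network.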

To help the network update its weights, we use the derivative of the sigmoid with the chain rule. We don’t have a labelled error, so we define the error as the difference between the bird’s position and the middle vertical position of the next “hole”.

```python
import json

class bot:
    def __init__(self):
        self.bestSOFar = 0
        self.score = 0
        self.gen = 0
        self.position = None
        self.brain = net()
        self.distance = None
        self.expectedPosition = None
        self.nextHole = None
        self.weight = np.random.rand(20, 1)   # 5*3 + 1*5 weights, flattened

    def initialize(self, position, distance, nextHole, expectedPosition):
        self.position = position
        self.distance = distance
        self.nextHole = nextHole
        self.expectedPosition = expectedPosition

    def move(self):
        self.score += 1/30
        jump = self.think()
        # jump only when the network output is confident enough
        if jump >= 0.705:
            return True
        else:
            return False

    def think(self):
        self.brain.decode(self.weight)
        inputLayer = [self.position, self.distance, self.nextHole]
        # divide by the largest absolute value to normalize into [-1, 1]
        maxValue = abs(max(max(inputLayer), min(inputLayer), key=abs))
        inputLayer = np.divide(inputLayer, maxValue)
        jump = self.brain.forward(inputLayer)
        self.learn()
        return jump

    def learn(self):
        # error: how far the bird is from the middle of the next hole
        self.brain.backward([self.position, self.distance, self.nextHole],
                            self.position - self.expectedPosition)
        self.weight = self.brain.encode()

    def printBotStat(self):
        print('stat:\n{}\n{}\n{}'.format(self.distance, self.speed, self.upperPipe))

    def dump(self):
        # numpy arrays aren't JSON serializable, so convert to a list first
        with open('weight.json', 'w') as fil:
            json.dump(np.asarray(self.weight).tolist(), fil)
        print('saved')

    def increaseGen(self):
        self.gen += 1
```

**Result with the neural network only**