In this post, we’ll walk through a small neural network and see how it classifies data. We won’t spend much time on formulae. Instead, we’ll focus on visualizing the functions in the network’s nodes. At the end, you’ll have an intuitive sense of what happens at every step of a three-layered neural network.
To read this post, you should be familiar with neural networks and perceptrons. You should know about concepts like “sigmoid”, “activation function” and “bias node”. If you’re reading Programming Machine Learning, or another ML introduction, then you know everything you need already. Let’s jump in!
Things are always simpler when you ignore the details. Take neural networks: from a distance, they’re quite straightforward. A neural network is a machine that implements a function. For example, imagine a network that takes the cloud coverage as input and outputs the chance of rain. Once you’ve trained it on historical data, this neural network is a function that takes a number and returns another number. You could plot that function on a piece of paper, and the result would be as good as the neural network itself. Easy-peasy!
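To make that concrete, here’s a minimal sketch of what such a trained network boils down to: one function from a number to a number. The weight and bias values below are made up for illustration, not the result of any actual training.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical parameters of a trained one-input network.
# These numbers are invented for illustration only.
W, B = 8.0, -4.0

def chance_of_rain(cloud_coverage):
    """Map cloud coverage (a number from 0 to 1) to a chance of rain."""
    return sigmoid(W * cloud_coverage + B)
```

With these made-up parameters, clear skies give a chance of rain near 0, and full cloud cover gives a chance near 1. That whole network is just `chance_of_rain`, a function you could plot on paper.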
It’s when you look closer that things become confusing. There are so many parts in a neural network, and it’s hard to see how they come together to compose that final function. What kind of intermediate functions do the hidden nodes compute? What’s the role of the activation functions? Even jaded researchers have a hard time imagining what each neuron in a large network is doing.
Thankfully, smaller networks are easier to grok. If you walk through a three-layered neural network, you can understand all of it. That’s what we’re going to do here. We’ll step through a neural network, and see how its parts contribute to the end-to-end function.
Let’s start straight away, with a small data set:
This data set contains two classes: green triangles and blue squares. From now on, we’ll encode green triangles with the value 1 and blue squares with 0.
Let’s say that we have a classifier (for example, a neural network) trained on these data. We don’t care about the training part here. What we care about is this: after the training, the classifier implements a function. That function takes a point, and returns one of the two classes:
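In code, such a classifier might look like the sketch below: a function from a 2D point to a class, using our encoding of 1 for green triangles and 0 for blue squares. The weights are invented for illustration; a real classifier would learn them from the data.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical parameters of a trained classifier over 2D points.
# These values are made up for illustration only.
W1, W2, B = 1.5, -2.0, 0.25

def classify(x, y):
    """Return 1 (green triangle) or 0 (blue square) for the point (x, y)."""
    probability = sigmoid(W1 * x + W2 * y + B)
    return 1 if probability > 0.5 else 0
```

However the classifier works inside, from the outside it’s just this: point in, class out.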