What are [artificial] neural networks?
The status quo for explaining neural networks, or artificial neural networks (as they are formally referred to in the scientific literature), is to start by showing how they resemble the neurons in the biological brain. Since I like that explanation and believe it's inspiring, I will stick to it.
To begin with, the biological brain is made up of billions of neurons (roughly 86 billion in humans). Everything we do in our daily lives, the way we see, the way we interact with the world, the way we feel, the way we move, fundamentally comes down to the way neurons in the brain wire and fire. You can think of a neuron as the simplest [biological] processing unit. It is made up of dendrites (the receivers of information), a nucleus (the processing unit), an axon, and axon terminals (the senders of information).
Neurons are connected to each other through synapses, junctions where information (electric signals) is shared among neurons. The stronger the synaptic connection, the stronger the signal transmitted.
In 1958, Frank Rosenblatt came up with a really clever way of loosely representing these concepts with simple mathematical formulas: the perceptron. The intuition behind it is that there are a bunch of neurons that communicate with each other through weights and biases, trying to map input A → output B (wow, what just happened?). Those two, weights and biases, are the fundamental components of neural networks.
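In symbols, one such neuron can be sketched roughly like this (where the $x_i$ are the inputs, the $w_i$ the weights, $b$ the bias, and $f$ a simple thresholding or activation function):

$$y = f\Big(\sum_i w_i x_i + b\Big)$$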
A neuron, as depicted above, has some inputs [10, 7, 3]. These inputs are connected to the neuron with weights [0.5, 1.0, 0.1] that represent the strength of each input. The neuron multiplies each input by its corresponding weight, sums the results, and checks whether the sum is greater than 10 (the threshold here, though it can be any number); if it is, the neuron outputs 1, otherwise 0. A neuron can also output more than two values (0|1). That is done using an activation function, which projects the output of the neuron into a certain range.
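Here's a minimal Python sketch of that neuron (the threshold of 10 and the step output are just this example's choices, not a fixed rule):

```python
def neuron(inputs, weights, threshold=10):
    """A single thresholded neuron: weighted sum of inputs, then a step function."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# The example from the text: 10*0.5 + 7*1.0 + 3*0.1 = 12.3 > 10, so the neuron fires.
print(neuron([10, 7, 3], [0.5, 1.0, 0.1]))  # -> 1
```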
Weights tell the neuron which inputs to respond to more strongly. They are what gets updated during training, and fundamentally this is how networks learn.
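To make "updated during training" concrete, here is a sketch of Rosenblatt's classic perceptron update rule (the learning rate value and the threshold-at-zero convention are illustrative choices):

```python
def train_step(weights, bias, inputs, target, lr=0.1):
    """One perceptron update: nudge the weights toward reducing the error."""
    # Forward pass: weighted sum plus bias, then a step activation.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    prediction = 1 if total > 0 else 0
    # Error-driven update: if the prediction is wrong, shift each weight
    # in proportion to its input, scaled by the learning rate.
    error = target - prediction
    weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    bias = bias + lr * error
    return weights, bias
```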
Neurons are organized in layers: each neuron is connected to the neurons of the previous and next layers, but not to neurons within its own layer. Information in these networks flows strictly forward, from the input (layer 0) to the output (layer L, where L is the index of the last layer). The layers in between the input and output layers are called hidden layers. They're called hidden because, unlike the input and output, they're not directly observable (we will talk more about visualizing neural networks in another article).
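As a rough sketch of what that forward flow looks like in code (the layer sizes and random weights here are placeholders, and ReLU is just one common activation choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One fully connected layer: a linear map followed by a ReLU activation."""
    return np.maximum(0, W @ x + b)

# A tiny network: 3 inputs -> 4 hidden neurons -> 1 output.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([10.0, 7.0, 3.0])   # input (layer 0)
h = layer(x, W1, b1)             # hidden layer
y = layer(h, W2, b2)             # output layer
print(y)
```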
If you put enough neurons in the hidden layers, enough hidden layers in the network, and enough data through it, the network often starts to learn better than classical algorithms.
If you'd like to dive deeper into the details of neural networks and how they learn, I highly suggest you check this out: here's a great explanation of neural networks.
Why are they getting all the recognition?
The main reason for all the recognition they are getting is that this method of learning generally works better than classical machine learning algorithms. One distinguishing characteristic is that classical methods perform relatively well up to a decent amount of data, after which their performance plateaus. Neural-network-based learning, on the other hand, has repeatedly been shown to keep improving as more and more data is added.
Side note: this is why big companies like Google, Facebook, Apple, Amazon, and Netflix prefer algorithms like these: they have practically ‘infinite’ amounts of data, which is why their products keep improving.
Down below, I have added a link where you can tinker with a neural network and all of the concepts explained above.
Let's try to do binary (two-class) classification on data of varying difficulty to build an intuition for how neural nets work, and tinker with different parameters to see how they affect performance!
Important note: the learning rate is the variable that controls how quickly the network learns, by scaling the size of each weight update.
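To see what that means in practice, here is a toy gradient-descent sketch (the function and values are made up for illustration): a small learning rate creeps steadily toward the minimum, while a too-large one overshoots and diverges.

```python
def minimize(lr, steps=20):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    w = 5.0
    for _ in range(steps):
        w -= lr * 2 * w  # each step is scaled by the learning rate
    return w

print(minimize(lr=0.1))  # converges smoothly toward 0
print(minimize(lr=1.1))  # too large: the updates overshoot and blow up
```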
Things to tinker with:
- Data (left-hand side)
- Features (the inputs that get fed to the network)
- The minimum number of neurons and hidden layers needed for a Test Loss below 0.05
- Learning rate
- Activation
Here's a challenge for you: can you get the Test Loss (shown just above the data graph on the right) below 0.05? If you do, what parameters did you use?
For a more systematic exploration of the parameters, check this out.
One thing that has helped me immensely in understanding the core concepts of neural networks and exploring their practical side is tinkering with them. Below you can find a list of the most useful links I have found (feel free to suggest more in the comments section below):