Who said so? The math says so
What I’m going to explain here is how to answer questions about neurobiology with an 80% probability that you will give the same answer as a neurobiologist. So let’s go.
So here’s a neuron. It’s got a cell body. And there is a nucleus.
Neurons, like other cells, have a cell body (called the soma). The nucleus of the neuron is found in the soma. The neuron also has many short, branching processes, known as dendrites, and a separate process that is typically longer than the dendrites, known as the axon.
Dendrites
The two neuronal functions, receiving and processing incoming information, generally take place in the dendrites and cell body. Incoming signals can be either excitatory – which means they tend to make the neuron fire (generate an electrical impulse) – or inhibitory – which means that they tend to keep the neuron from firing.
Most neurons receive many input signals throughout their dendritic trees. A single neuron may have more than one set of dendrites and may receive many thousands of input signals. Whether or not a neuron is excited into firing an impulse depends on the sum of all of the excitatory and inhibitory signals it receives. If the neuron does end up firing, the nerve impulse, or action potential, is conducted down the axon.
Axons differ from dendrites in several ways.
- The dendrites tend to taper and are often covered with little bumps called spines. In contrast, the axon tends to stay the same diameter for most of its length and doesn’t have spines.
- The axon arises from the cell body at a specialized area called the axon hillock.
- Finally, many axons are covered with a special insulating substance called myelin, which helps them convey the nerve impulse rapidly. Myelin is never found on dendrites.
Towards its end, the axon splits up into many branches and develops bulbous swellings known as axon terminals (or nerve terminals). These axon terminals make connections on target cells.
Neuron-to-neuron connections are made onto the dendrites and cell bodies of other neurons. These connections, known as synapses, are the sites at which information is carried from the first neuron, the presynaptic neuron, to the target neuron (the postsynaptic neuron).
At most synapses and junctions, information is transmitted in the form of chemical messengers called neurotransmitters. When an action potential travels down an axon and reaches the axon terminal, it triggers the release of neurotransmitter from the presynaptic cell. Neurotransmitter molecules cross the synapse and bind to membrane receptors on the postsynaptic cell, conveying an excitatory or inhibitory signal.
Thus, the basic neuronal function — communicating information to target cells — is carried out by the axon and the axon terminals. Just as a single neuron may receive inputs from many presynaptic neurons, it may also make synaptic connections on numerous postsynaptic neurons via different axon terminals.
So there it is. How can we model that sort of thing? Well, here’s what’s usually done.
First of all, we’ve got some kind of binary input, because these things either fire or they don’t fire. So it’s an all-or-none kind of situation.
We have some kind of input value. We’ll call it x1. And it is either a 0 or a 1. And then it gets multiplied by some kind of weight. We’ll call it w1. So this part here is modelling this synaptic connection. It may be more or less strong. And if it’s more strong, the weight goes up. And if it’s less strong, the weight goes down. So that reflects the influence of the synapse on whether or not the whole neuron decides it’s stimulated. Also, we’ve got other inputs down here, xn, also 0 or 1. Each of these is also multiplied by a weight. We’ll call that one wn. And now we have to somehow represent the way in which these inputs are collected together, how they have collective force. We’ll run them through a summer, like so.
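Just to make the summing step concrete, here’s a minimal sketch in Python; the variable names and the particular input and weight values are made up for illustration, not taken from anything above.

```python
# Binary inputs x1..xn and their synaptic weights w1..wn (example values, chosen arbitrarily).
x = [1, 0, 1]        # each input is either 0 or 1
w = [0.5, 0.9, 0.2]  # one weight per input; a stronger synapse means a larger weight

# The summer: collect the weighted inputs into a single number.
total = sum(wi * xi for wi, xi in zip(w, x))
print(total)  # 0.5*1 + 0.9*0 + 0.2*1 = 0.7
```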
But then we have to decide if the collective influence of all those inputs is sufficient to make the neuron fire. So we’re going to do that by running it through a threshold box like so.
Here is what the box looks like in terms of the relationship between input and output. And what you can see here is that nothing happens until the input exceeds some threshold t. If that happens, then the output z is a 1. Otherwise, it’s a 0. So binary in, binary out: we model the synaptic weights by these multipliers. We model the cumulative effect of all that input to the neuron by a summer. We decide if it’s going to be an all-or-none 1 by running it through this threshold box and seeing if the sum of the products adds up to more than the threshold. If so, we get a 1.
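Putting the summer and the threshold box together, a sketch of the whole model neuron might look like the following; the function name, the weights, and the threshold value are assumptions for illustration.

```python
def model_neuron(x, w, t):
    """Return 1 if the weighted sum of the binary inputs exceeds the threshold t, else 0."""
    total = sum(wi * xi for wi, xi in zip(w, x))  # the summer
    return 1 if total > t else 0                  # the threshold box

# Example: with these made-up weights and threshold, the neuron fires.
z = model_neuron(x=[1, 0, 1], w=[0.5, 0.9, 0.2], t=0.6)
print(z)  # 1, because 0.7 > 0.6
```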
So what in the end are we in fact modelling?
Well, with this model,
- All or none
- Cumulative influence
- Synaptic weight
But that’s not all that there might be to model in a real neuron. We might want to deal with the
- Refractory period
- Axonal bifurcation
- Time patterns
So we’ve got this model of what a neuron does.
What about what a collection of these neurons does? Well, we can think of your skull like a big box full of neurons. Maybe a better way to think of this is that your head is full of neurons. And they in turn are full of weights and thresholds, like so. So into this box come a variety of inputs x1 through xm. And these find their way to the inside of this gaggle of neurons. And out here come a bunch of outputs z1 through zm. And there are a whole bunch of these, maybe, like so. And there are a lot of inputs, like so.
And somehow these inputs, through the influence of the weights and the thresholds, come out as a set of outputs.
So we can write that down a little fancier by just saying that z is a vector, which is a function of, certainly, the input vector, but also the weight vector and the threshold vector. So that’s all a neural net is. And when we train a neural net, all we’re going to be able to do is adjust those weights and thresholds so that what we get out is what we want.
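As a sketch of that idea, here’s a toy version of the box in Python: one layer of threshold units, so the output vector z really is just a function of the input vector, the weights, and the thresholds. The layer structure and every number below are assumptions for illustration, not anything stated above.

```python
def threshold_unit(x, w, t):
    """One model neuron: fire (1) if the weighted sum of the inputs exceeds the threshold t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > t else 0

def neural_net(x, weights, thresholds):
    """The whole box: the output vector z as a function of x, the weights, and the thresholds."""
    return [threshold_unit(x, w, t) for w, t in zip(weights, thresholds)]

# Two inputs in, three outputs out, with made-up weights and thresholds.
z = neural_net(x=[1, 0],
               weights=[[0.4, 0.6], [0.9, 0.1], [0.2, 0.2]],
               thresholds=[0.5, 0.5, 0.5])
print(z)  # [0, 1, 0]
```

Training would then mean adjusting the numbers inside weights and thresholds until the z that comes out is the z we want.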
So a neural net is a function approximator. It’s good to think about that. So maybe we’ve got some sample data that gives us an output vector that’s desired, as another function of the input, forgetting about what the weights and the thresholds are. That’s what we want to get out. And so how well we’re doing can be figured out by comparing the desired value (d) with the actual value (z).
The question is what should that function be? How should we measure performance given that we have what we want out here and what we actually got out here?
Well, one simple thing to do is just to measure the magnitude of the difference. That makes sense. But of course, that would give us a performance function that, as a function of the distance between those vectors, would look like this.
But this turns out to be mathematically inconvenient in the end. So how do you think we’re going to dress it up a little bit? How about we just square it? And that way we’re going to go from this little sharp point down there to something that looks more like that.
So it’s best when the difference is 0, of course. And it gets worse as you move away from 0. But what we’re trying to do here is we’re trying to get to a minimum value. I like to think in terms of improvement as going uphill instead of downhill. So I’m going to dress this up one more step: put a minus sign out there. And then our performance function looks like this. It’s always negative. And the best value it can possibly be is zero.
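Here’s that progression as a small sketch in Python; the desired vector d and the actual output z are made-up example values. The plain distance has a sharp corner at zero, which is the mathematical inconvenience; squaring smooths it out, and the minus sign makes the best possible performance exactly zero.

```python
import math

d = [1.0, 0.0, 1.0]   # desired output vector (example values)
z = [0.8, 0.1, 0.6]   # actual output vector (example values)

# Plain magnitude of the difference: a perfectly sensible measure, but with a kink at zero.
distance = math.sqrt(sum((di - zi) ** 2 for di, zi in zip(d, z)))

# Squared and negated: smooth everywhere, always <= 0, and the best value it can reach is 0.
performance = -sum((di - zi) ** 2 for di, zi in zip(d, z))

print(distance, performance)  # about 0.458 and -0.21
```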