The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is based on a slightly different artificial neuron called a threshold logic unit (TLU), or sometimes a linear threshold unit (LTU).
A single TLU can be used for simple linear binary classification. It computes a linear combination of the inputs, and if the result exceeds a threshold, it outputs the positive class; otherwise it outputs the negative class.
For instance, you could use a single TLU to classify iris flowers based on the petal length and width. Training a TLU in this case means finding the right values for w0, w1, and w2.
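To make this concrete, here is a minimal sketch of a TLU in NumPy, assuming a step function that fires at threshold 0 and hand-picked (untrained) weights for petal length and petal width; the weight values are purely illustrative.

import numpy as np

def tlu(x, w, b):
    # Weighted sum of the inputs plus the bias term
    z = np.dot(w, x) + b
    # Step function: positive class if z >= 0, negative class otherwise
    return 1 if z >= 0 else 0

# Illustrative (untrained) weights for petal length and petal width
w = np.array([0.5, 1.0])
b = -2.0
print(tlu(np.array([2.0, 0.5]), w, b))  # 0 (negative class) with these toy values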
A Perceptron is simply composed of a single layer of TLUs, with each TLU connected to all the inputs. When all the neurons in a layer are connected to every neuron in the previous layer, it is called a fully connected layer, or a dense layer.
To represent the fact that each input is sent to every TLU, it is common to draw special passthrough neurons called input neurons: they just output whatever input they are fed. All the input neurons form the input layer. Moreover, an extra bias feature is generally added; it is typically represented using a special type of neuron called a bias neuron, which just outputs 1 all the time.
Thanks to the magic of linear algebra, it is possible to efficiently compute the outputs of a layer of artificial neurons for several instances at once, using the following equation (a small NumPy sketch of this computation appears after the notation list below).
Computing the outputs of a fully connected layer
h_{W,b}(X) = ϕ(XW + b)
- As always, X represents the matrix of input features. It has one row per instance, one column per feature.
- The weight matrix W contains all the connection weights except for the ones from the bias neuron. It has one row per input neuron and one column per artificial neuron in the layer.
- The bias vector b contains all the connection weights between the bias neuron and the artificial neurons. It has one bias term per artificial neuron.
- The function ϕ is called the activation function: when the artificial neurons are TLUs, it is a step function.
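Here is a minimal NumPy sketch of this batch computation, assuming a layer of three TLUs, two input features, and a step activation; the weights are random and purely illustrative.

import numpy as np

def step(z):
    # Heaviside step function, applied elementwise
    return (z >= 0).astype(int)

X = np.array([[2.0, 0.5],          # one row per instance,
              [4.5, 1.5]])         # one column per feature
W = np.random.randn(2, 3)          # one row per input, one column per neuron
b = np.zeros(3)                    # one bias term per neuron

outputs = step(X @ W + b)          # shape: (2 instances, 3 neurons)
print(outputs)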
So how is a Perceptron trained? The Perceptron training algorithm proposed by Frank Rosenblatt was largely inspired by Hebb’s rule; a small sketch of the resulting weight-update loop appears after the notation list below.
Perceptron learning rule
w_{i,j}^{(next step)} = w_{i,j} + η (y_j − ŷ_j) x_i
- w_{i,j} is the connection weight between the ith input neuron and the jth output neuron.
- x_i is the ith input value of the current training instance.
- ŷ_j is the output of the jth output neuron for the current training instance.
- y_j is the target output of the jth output neuron for the current training instance.
- η is the learning rate.
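As a sketch of how this rule plays out in practice, the loop below trains a single TLU on a toy, linearly separable dataset (an AND-style problem); the learning rate, epoch count, and data are illustrative assumptions, not values from the text.

import numpy as np

def step(z):
    return 1 if z >= 0 else 0

# Toy linearly separable dataset (AND-style targets), for illustration only
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])

eta = 0.1          # learning rate
w = np.zeros(2)    # one weight per input
b = 0.0            # bias term

for epoch in range(10):
    for x_i, y_j in zip(X, y):
        y_hat = step(np.dot(w, x_i) + b)
        # Perceptron learning rule: w_i += eta * (y_j - y_hat_j) * x_i
        w += eta * (y_j - y_hat) * x_i
        b += eta * (y_j - y_hat)

print(w, b)  # weights and bias that separate the toy classes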
The decision boundary of each output neuron is linear, so Perceptrons are incapable of learning complex patterns. However, if the training instances are linearly separable, Rosenblatt demonstrated that this algorithm would converge to a solution. This is called the Perceptron convergence theorem.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]               # petal length, petal width
y = (iris.target == 0).astype(int)     # Iris setosa?

per_clf = Perceptron()
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])