Activation functions are the main computational core behind artificial intelligence, especially neural networks. In this article we will overview some of them, giving each a short introduction and a clear example of its usual use case.
Binary step function
The binary step function, also called the "Heaviside step function", represents a signal that switches on once the input crosses a specific threshold. It is mostly used in single-perceptron neural networks to separate two linearly separable classes. There is a caveat to using the binary step function in a neural network, however: its derivative is 0 everywhere (and undefined at the threshold itself), so gradient descent sees no rate of change and cannot update the weights.
Below is a Python implementation of the binary step function.
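Since the original snippet is not shown here, the following is a minimal NumPy sketch of the binary step function; the `threshold` parameter is an assumption added for illustration.

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Return 1 where x >= threshold, else 0 (elementwise)."""
    return np.where(x >= threshold, 1, 0)

# The output only ever takes the two values {0, 1}.
print(binary_step(np.array([-3.0, -0.5, 0.0, 2.0])))
```

To visualize the step shape, one could plot `binary_step(x)` over `np.linspace(-5, 5, 200)` with matplotlib.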
Linear Activation Function
The linear activation function takes an input from the range (-inf, +inf) and produces an output in (-inf, +inf), which is a little better than the binary step function, which is stuck at {0, 1}. It shares the same issue as all linear functions, though: its derivative is a constant, which makes backward propagation useless for updating weights in any input-dependent way. One more problem with the linear activation is that it makes stacking layers pointless; a composition of linear layers is still a single linear function, so the last layer behaves just like the first.
Here is a snippet of code for the linear activation and its derivative.
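As the original snippet is missing, here is a minimal NumPy sketch; the slope parameter `a` is an illustrative assumption. The derivative function makes the text's point concrete: the gradient is the same constant for every input.

```python
import numpy as np

def linear(x, a=1.0):
    """Linear activation: f(x) = a * x."""
    return a * x

def linear_derivative(x, a=1.0):
    """The derivative is the constant a, regardless of x."""
    return np.full_like(np.asarray(x, dtype=float), a)

print(linear(np.array([-2.0, 0.0, 3.0])))
print(linear_derivative(np.array([-2.0, 0.0, 3.0])))
```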
Sigmoid activation function
Coming to our first non-linear activation function, and one of the most commonly used for several reasons: the sigmoid has the same general shape as the Heaviside step function, but its smoothness prevents the output from jumping abruptly between 0 and 1. This makes the sigmoid a good fit for binary classification, since its output can be read as a clear prediction between 0 and 1.
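A minimal NumPy sketch of the sigmoid, with its derivative included to contrast with the step function: the gradient is smooth and non-zero everywhere, so backpropagation can use it.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The midpoint of the curve is at x = 0, where the output is exactly 0.5.
print(sigmoid(0.0))
```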
Tanh activation function
The hyperbolic tangent is closely related to the sigmoid function and shares all of its advantages for gradient descent. But the tanh activation function has a secret weapon against strongly negative values: its zero-centered shape, which maps inputs to (-1, 1) so that negative inputs produce genuinely negative outputs instead of being squashed toward zero.
Rectified Linear Unit (ReLU)
The rectified linear unit, or ReLU for short, is an activation function that lets a neural network converge much more quickly than the sigmoid or tanh. Although it looks like a linear function, it has a trick for the negative range: it zeroes out negative inputs, which gives the function a useful non-linearity while keeping a cheap derivative. The ReLU can suffer from the "dying ReLU" problem, however, where a neuron gets stuck outputting zero for all inputs in the zero-or-negative range.
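A minimal NumPy sketch of ReLU; the leaky variant is included as one commonly used mitigation for the dying-ReLU problem mentioned above (the `alpha` slope is an illustrative assumption).

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small slope alpha on the negative side,
    so the gradient never vanishes entirely."""
    return np.where(x >= 0, x, alpha * x)

print(relu(np.array([-2.0, 0.0, 3.0])))
print(leaky_relu(np.array([-2.0, 0.0, 3.0])))
```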
Softmax activation function
The softmax activation function is the best fit for the output layer, thanks to its ability to perform one-vs-all classification between the output classes in a multi-class problem: it turns the raw scores into a probability distribution that sums to 1.