
Most Common Deep Learning Questions

If you are a junior data scientist like me, you probably find deep learning to be an extremely vast topic. However, there are some key concepts that come up very often in entry-level interviews and that you should understand very clearly.
In this article I will give you a quick guide to some of the most important concepts in deep learning; you can use it as a refresher every time you need to look something up.
Machine learning deals with algorithms that can learn or improve from experience, and deep learning is a particular technique of machine learning involving the use of artificial neural networks (ANN). Deep learning is used to solve complex problems involving unstructured data that cannot be solved accurately with traditional machine learning algorithms. Very often they are perceptual problems like computer vision or speech recognition.
Artificial Neural Networks (ANN) are computing systems made of connected nodes that allow information to flow through the network, resembling the neurons and synapses in the brain; in deep learning it is common to have many layers of such nodes. Nodes of one layer are connected only to nodes of the immediately preceding and immediately following layers. External information arrives at the input layer and the final result comes out of the output layer. In the middle, we can have a number of hidden layers.
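To make this concrete, here is a minimal sketch of such a network in Keras; the 10-feature input and the layer sizes are just illustrative, and it assumes TensorFlow is installed:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Input layer -> one hidden layer -> output layer
model = keras.Sequential([
    keras.Input(shape=(10,)),               # input layer: 10 features arrive here
    layers.Dense(32, activation="relu"),    # hidden layer: 32 connected nodes
    layers.Dense(1, activation="sigmoid"),  # output layer: the final result
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```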
Connections between layers can form many patterns, and different layer architectures form different types of networks. The most popular ones are: perceptrons, Feedforward Neural Networks (FNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory networks (LSTM), Autoencoders, and Generative Adversarial Networks (GAN).
Convolutional Neural Networks (CNN) are a type of neural network inspired by the visual cortex and used to work with images, because they are able to recognize visual objects by giving different weights, or levels of importance, to structures in the picture like edges, lines, curves, vertices, and circles. They have a hierarchical architecture made of a funnel of convolutional and pooling layers and a final fully connected layer.
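For example, a minimal CNN sketch in Keras could look like this; it assumes 28x28 grayscale images and 10 classes, and the filter counts are just illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                       # 28x28 grayscale image
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # detects local structures (edges, curves)
    layers.MaxPooling2D(pool_size=2),                     # shrinks the feature maps (the "funnel")
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # final fully connected layer
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```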
Recurrent Neural Networks (RNN) are a type of neural network structured as a directed graph in which the connections between nodes can form cycles, so the output of a node can feed back into the network. They are used in temporal problems because they can store in memory how a series of inputs (for example, data from 5 days ago in a daily time series) affects an output. That makes them suitable to “remember patterns” and therefore to be used in time series, speech recognition, or segmentation problems.
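As an illustration, here is a minimal recurrent network in Keras for the daily time series example above; the 5-day window and the layer size are just illustrative:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(5, 1)),  # 5 time steps (the last 5 days), 1 feature per day
    layers.LSTM(16),            # the hidden state carries memory across time steps
    layers.Dense(1),            # prediction for the next day
])
model.compile(optimizer="adam", loss="mse")
```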
The activation function defines what a node will output depending on its input. Neurons in the brain have to decide whether to fire or not; similarly, activation functions in neural networks decide whether, and how strongly, a node fires given the weighted inputs arriving at it. The most common types of activation functions are the Sigmoid Function, Hyperbolic Tangent Function (Tanh), Softmax Function, Softsign Function, Rectified Linear Unit (ReLU), and Exponential Linear Units (ELU).
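Most of these functions are one-liners; here is a NumPy sketch of a few of them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes the input into (0, 1)

def tanh(x):
    return np.tanh(x)                # squashes the input into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # passes positives through, zeroes negatives

def softmax(x):
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()               # outputs sum to 1, like a probability distribution
```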
The sigmoid and tanh functions are very similar; the main difference is that tanh is symmetric around the origin, with a range of values from -1 to 1. Thus the inputs to the next layers will not always be of the same sign.
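A quick numerical check of this difference:

```python
import numpy as np

x = np.array([-2.0, 0.0, 2.0])
print(1.0 / (1.0 + np.exp(-x)))  # [0.119 0.5 0.881] -> sigmoid: always positive
print(np.tanh(x))                # [-0.964 0. 0.964] -> tanh: both signs, symmetric around 0
```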
The ReLU function (Rectified Linear Unit) is a non-linear activation function that has become the most popular in the deep learning domain because it doesn’t activate all the neurons at the same time: a neuron is deactivated only when the output of the linear transformation is less than 0.
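In other words, ReLU keeps only the neurons whose pre-activation is positive:

```python
import numpy as np

z = np.array([-1.5, -0.2, 0.0, 0.7, 2.3])  # pre-activations of five neurons
print(np.maximum(0.0, z))                  # [0. 0. 0. 0.7 2.3] -> only two neurons fire
```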
The propagation function computes the weighted sum of the inputs to a neuron from the outputs of its predecessor neurons and the weights of their connections. Subsequently, we can add a bias to the result.
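In code, this is just a dot product plus a bias; the numbers below are made up for illustration:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])  # outputs of the predecessor neurons
w = np.array([0.1, 0.4, -0.3])  # weights of their connections
b = 0.2                         # bias

z = np.dot(w, x) + b            # weighted sum plus bias
print(z)                        # -0.75; this value is then passed to the activation function
```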
Hyperparameters are constant values that we need to set before we run our algorithm, and we need to fine-tune them in order to achieve the most accurate results. Some of the most popular hyperparameters in deep learning are the number of hidden layers, the learning rate, the number of epochs, and the batch size. Sometimes some hyperparameters depend on others (like the size of some layers versus the number of layers). Grid search is the technique of automatically scanning a list of possible values in order to find the best ones for our model.
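A bare-bones grid search can be written with a couple of loops; the sketch below assumes a hypothetical build_and_score(lr, batch_size) helper that trains a model with those values and returns its validation accuracy:

```python
import itertools

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]

best_score, best_params = -1.0, None
for lr, bs in itertools.product(learning_rates, batch_sizes):
    score = build_and_score(lr, bs)  # hypothetical helper: train and evaluate one configuration
    if score > best_score:
        best_score, best_params = score, (lr, bs)

print(best_params, best_score)
```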
The learning rate controls the size of the step taken at each iteration while moving in the direction of the negative gradient toward the minimum of the loss function. The batch size is the number of samples the algorithm processes before updating the model. It can go from 1 (every observation is its own batch, known as stochastic gradient descent) to the whole length of the training set (we take the entire training set as a single batch, known as batch gradient descent). When the batch size lies between these two extremes we call the learning algorithm mini-batch gradient descent. The number of epochs is the number of complete passes through the training dataset. It can go from one upwards; we usually increase it for as long as the validation error keeps improving and our computing time allows it.
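In Keras, for example, these hyperparameters show up directly in the training call; `model`, `X_train`, and `y_train` are assumed to exist already:

```python
history = model.fit(
    X_train, y_train,
    batch_size=32,         # 32 samples per gradient update (mini-batch gradient descent)
    epochs=10,             # 10 complete passes through the training set
    validation_split=0.2,  # hold out 20% of the data to monitor progress
)
```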
Gradient descent is a machine learning algorithm that minimizes the loss function. At each iteration it moves in the negative direction of the gradient (the direction in which the error decreases). In machine learning, we use gradient descent to update the parameters of our model. The loss function (or cost function) measures the error or cost of the outputs of our algorithm, compared to the real supervised values, given some values of the independent variables used to train it.
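Here is a bare-bones gradient descent sketch for a one-parameter linear model y ≈ w·x minimizing the mean squared error; the data and the learning rate are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                                # true relationship: w = 2

w, lr = 0.0, 0.05                          # initial parameter and learning rate
for _ in range(100):
    grad = 2.0 * np.mean((w * x - y) * x)  # dLoss/dw for the mean squared error
    w -= lr * grad                         # step in the negative gradient direction

print(w)                                   # converges close to 2.0
```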
These questions covered the most fundamental topics. Of course, there is much more to know about deep learning, but this is a good starting point for a beginner.
In future articles we will dive deeper into this fascinating world. I hope you find this guide interesting and useful.