

When (not) to use neural networks
If you have read the last five articles, you should have a good idea of what neural networks are, what their parts are, how to train them and how to find the best model. What I haven’t talked about yet is when you actually want to go through all this hassle and construct such a model.
Contrary to what the media often suggests, neural networks are not a magic potion that can be added to anything to make it clever. Instead, neural networks are nothing more than a tool, and just as a hammer won’t help you saw through a piece of wood, one first has to consider whether neural networks are the right tool for the problem at hand. Here I want to outline a few conditions and properties of problems that can be solved with the help of neural networks.
To understand for which kinds of problems we can use neural networks, we have to remember what a neural network does in principle. Suppose we have two data vectors, x and y. They do not have to be of the same dimension (i.e. length). The neural network aims to express the mapping between them, such that m(x) = y. An example would be x, a set of photos, and y, a set of labels such as “cat” or “dog”. A trained network could then label new photos correctly. Labelling photos is a typical application of neural networks: they can efficiently deal with the high dimensionality of image data and are very successful at finding the right labels.
But there is a caveat. Imagine we train a neural network with photos of cats and dogs (almost the “Hello World” example for neural networks). What happens if we show it a photo of a boat? It will tell you whether the boat looks more like a cat or a dog, and nothing more. Your neural network has no concept of a boat if boats were not present during training. Even if you add a “boat” category, you still need to train the network with pictures of boats. It cannot adapt to new points in the target space added during evaluation time; at that point, the network is static. Therefore we have
Condition 1: Neural networks perform a mapping between two finite sets. They cannot adapt during evaluation time, so all elements of the target space have to be present during training.
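To make the boat example concrete, here is a minimal sketch of m(x) = y as a classifier with a fixed label set. The "network" is a hypothetical single linear layer followed by a softmax, and the weights and the boat feature vector are made-up numbers, not a trained model. Whatever vector goes in, the softmax distributes all probability over "cat" and "dog", so there is no way for the model to answer "boat".

```python
import math

# Hypothetical toy "network": one linear layer followed by a softmax.
# The weights are invented for illustration; a real model would learn them.
LABELS = ["cat", "dog"]  # the target set is fixed at training time
WEIGHTS = [
    [0.9, -0.2, 0.1],   # weights producing the "cat" score
    [-0.3, 0.8, 0.4],   # weights producing the "dog" score
]


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x):
    """Map an input vector x to one of LABELS, i.e. m(x) = y."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    probs = softmax(scores)  # probabilities over LABELS only, summing to 1
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# A hypothetical feature vector extracted from a photo of a boat.
boat = [0.1, 0.5, 0.9]
label, confidence = classify(boat)
print(label, round(confidence, 3))  # the boat is confidently called a "dog"
```

The reported confidence is a probability relative to the two known classes only; it says nothing about whether the input belongs to either of them.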
The boat example also leads to the second condition. Now imagine we knew in advance that there would also be pictures of boats, but our training set of 1 million dog and 1 million cat pictures contained only 10 pictures of boats. Clearly, this will lead to unsatisfying results. The same would be the case if a significant subset of the pictures were labelled wrongly, which can disturb the training process massively. Therefore we have
Condition 2: An extensive, high-quality training set is essential for successfully solving the problem.
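A quick back-of-the-envelope calculation shows why 10 boats among 2 million images are not enough. The sketch below assumes a hypothetical model that classifies every cat and dog perfectly but never predicts "boat"; measured by overall accuracy, that model looks almost flawless, so an objective averaged over all images has very little incentive to learn boats at all.

```python
# Class counts from the example in the text.
counts = {"dog": 1_000_000, "cat": 1_000_000, "boat": 10}
total = sum(counts.values())

# Hypothetical model: perfect on cats and dogs, never predicts "boat".
accuracy_without_boats = (counts["dog"] + counts["cat"]) / total

# Fraction of the training set that carries any information about boats.
share_of_boats = counts["boat"] / total

print(f"accuracy while ignoring boats: {accuracy_without_boats:.6%}")
print(f"share of boat images:          {share_of_boats:.6%}")
```

The model that ignores boats entirely is still correct on about 99.9995% of the training images, which is why a rare class needs far more examples than this.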
In practice, big tech companies collect massive amounts of data for a reason, and people rarely label things wrongly, as they have a genuine interest in their friend being mentioned in an Instagram story with a group selfie, or in their pseudo-arty pictures being found under the right hashtag. You might notice that group selfies are not necessarily high-quality data; after all, they are not biometric pictures like the ones used for passports. Datasets of poor quality can often be refined so that they can be used for machine learning. Facebook does not learn your face by throwing a few dozen photos of you into a neural network, as they all have different brightness, angles and sizes. Instead, features are extracted and learned which contain meta-information about your face (such as the distance between the eyes relative to their size). There are plenty of features that can be extracted, and together they make it possible to identify every individual.
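The article does not say how such features are computed. As a minimal sketch, assuming some detector has already delivered eye landmark coordinates (the pixel values below are hypothetical), the ratio of eye distance to eye size mentioned above stays the same when the same face appears at a different size in the photo. It does not, however, correct for different angles or brightness.

```python
import math


def eye_ratio(left_eye, right_eye, eye_width):
    """Ratio of eye distance to eye size; unaffected by uniform scaling."""
    return math.dist(left_eye, right_eye) / eye_width


# Hypothetical landmarks (in pixels) of the same face in two photos;
# the second photo was taken from further away, so everything is halved.
close_up = {"left": (120.0, 200.0), "right": (240.0, 200.0), "width": 40.0}
far_away = {"left": (60.0, 100.0), "right": (120.0, 100.0), "width": 20.0}

r1 = eye_ratio(close_up["left"], close_up["right"], close_up["width"])
r2 = eye_ratio(far_away["left"], far_away["right"], far_away["width"])
print(r1, r2)  # 3.0 3.0
```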
But now imagine humans consisted only of two eyes at a certain distance, and every individual had a unique ratio of eye distance to eye size (maybe do not imagine this example in too much detail if you are easily scared by the thought of detached eyes). Imagine also that all eyes were alike, e.g. had the same colour, so that no other feature exists. Would Facebook then still use machine learning to connect photos of people with their accounts?
Probably not. You might have noticed it already, but if not: this problem is linear. The latent space is one-dimensional, and every individual can be identified by binning the value of the feature. Neural networks could certainly solve this problem, but at an unnecessarily high cost. There are many examples of problems which can theoretically be solved with machine learning, but for which much better conventional algorithms exist. After all, the precision of neural networks is hard to control and training is very expensive. Another example is the interpolation of one-dimensional data, which can be done much more efficiently with polynomial interpolation than with neural networks. The traditional approach also needs far less training data. Therefore we have:
Condition 3: Either the problem is of such high dimension that conventional algorithms are not preferable, or no computationally cheaper alternative algorithm exists.
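Both conventional alternatives mentioned above fit in a few lines. The sketch below is illustrative, not a production method: `identify` does the binning for the eye-ratio world with a binary search over a hypothetical registry of people (the names and ratios are invented), assigning a measurement to the nearest stored ratio, which is the same as using bins whose edges lie halfway between neighbouring ratios. `lagrange` evaluates the polynomial through a handful of 1D samples. For many points, piecewise polynomials such as splines are usually preferred over a single high-degree polynomial.

```python
import bisect

# Hypothetical registry: each person's unique eye-distance/eye-size ratio,
# sorted so that a lookup is a binary search.
registry = sorted([(2.71, "Alice"), (2.95, "Bob"), (3.20, "Carol")])
ratios = [r for r, _ in registry]


def identify(ratio):
    """Return the person whose stored ratio is closest to the measured one."""
    i = bisect.bisect_left(ratios, ratio)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(ratios)]
    best = min(candidates, key=lambda j: abs(ratios[j] - ratio))
    return registry[best][1]


def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


print(identify(2.97))                        # Bob
print(lagrange([0, 1, 2], [1, 2, 5], 1.5))   # samples of x**2 + 1, gives 3.25
```

Three samples are enough to recover the quadratic exactly, and the lookup needs no training at all, which is the kind of cost difference the condition is about.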
Another problem with neural networks is that they do not provide an inherent error estimate, and it is often unclear how confident one can be about the quality of the result. Many conventional algorithms have a way of predicting the error based on the input data and the chosen parameters of the algorithm. This is not the case for neural networks. Although one can estimate the quality of the model on the validation set, there is no way to give error bounds for arbitrary data fed to it in production. This should be taken into account when deciding whether to use neural networks.
Condition 4: A definite error estimate is not necessary, or the validation set is very large and at least as diverse as the data seen in production.
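The contrast can be sketched with the composite trapezoidal rule, chosen here as an example of a conventional algorithm (the article does not name one). Given the number of sub-intervals n and a bound on |f''|, the standard a priori bound (b - a) h² max|f''| / 12 holds for any such input. For a neural network, the closest analogue is the standard error of the validation accuracy; the function below assumes i.i.d. validation samples and a hypothetical result of 9,500 correct out of 10,000, and it only describes data drawn from the same distribution as the validation set.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal sub-intervals."""
    h = (b - a) / n
    inner = sum(f(a + k * h) for k in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


def trapezoid_error_bound(a, b, n, max_abs_f2):
    """A priori bound: |error| <= (b - a) * h**2 * max|f''| / 12."""
    h = (b - a) / n
    return (b - a) * h**2 * max_abs_f2 / 12


def accuracy_std_error(correct, n):
    """Standard error of a validation accuracy, assuming i.i.d. samples."""
    p = correct / n
    return math.sqrt(p * (1 - p) / n)


# Conventional algorithm: integrate sin on [0, pi] (exact value 2); |sin''| <= 1.
approx = trapezoid(math.sin, 0.0, math.pi, 100)
bound = trapezoid_error_bound(0.0, math.pi, 100, 1.0)
print(abs(approx - 2.0) <= bound)  # True: the bound holds for every such input

# Neural network: only a statistical estimate from a hypothetical validation run.
print(accuracy_std_error(9_500, 10_000))  # about 0.0022
```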
But often it is not just the error estimate that is of interest, but also the way the model is constructed. In many cases, the parameters with which two sets of data are mapped onto each other carry information of interest. Imagine you want to extract from your face-recognition model which relative feature values distinguish a female from a male face. Due to the large number of parameters and their non-trivial interpretation in a neural network, this would be a virtually impossible task.
Condition 5: One is not interested in extracting any additional information from the model itself, i.e. a black-box model is sufficient for the project.
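To give a rough sense of the scale difference, the sketch below counts the weights and biases of a hypothetical small fully connected network (layer sizes 784, 128, 64 and 10, as one might use for 28×28 images and 10 classes) and compares it with an ordinary least-squares line fit, whose two parameters can be read off directly. Neither model is taken from the article; they only illustrate why interpreting one is easy and the other is not.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical small network: over a hundred thousand numbers, none meaningful alone.
print(mlp_parameter_count([784, 128, 64, 10]))  # 109386

# Linear model: two numbers, each with a direct meaning.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```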
One last condition is often overlooked: artificial intelligence is not true intelligence. Let’s get back to the cat and dog example and add the category “muffin”. Most muffins don’t look like animals, and yet a very motivated baker could try to make the dessert look extra cute. Humans are very good at spotting the difference between a muffin that looks like a dog and a dog that looks like a muffin. Neural networks, not so much. When they will fail at such a problem and when not is hard to tell, as we do not know which part of the image data the network uses to identify a dog. If it identifies a dog as an object with three black dots at certain positions, then three chocolate chips could be enough to trick it. Humans use their incredible capability of interpreting all the background and foreground information in the picture, as well as a lifetime of experience (i.e. data), to know what is a dog and what is a cake. Neural networks can’t do this.
(Source: arXiv:1801.09573.)
Condition 6: The task does not require semantic or contextual information and can be reduced to simple data points.
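The chocolate-chip scenario can be caricatured in a few lines. This is a deliberately crude, hypothetical classifier on invented 5×5 grayscale "images" (0 is dark, 1 is light), not a model of how real networks work. Like the one imagined above, it calls something a dog whenever three fixed spots are dark, so a muffin with three chips in the right places passes as a dog. The point is that nothing in the rule itself reveals this weakness until such an input shows up.

```python
# Toy 5x5 grayscale "images": 0.0 = dark, 1.0 = light. Entirely hypothetical.
# Positions where this made-up detector expects two eyes and a nose.
SPOTS = [(1, 1), (1, 3), (3, 2)]


def looks_like_dog(image, threshold=0.3):
    """Shortcut classifier: 'dog' if all three expected spots are dark."""
    return all(image[r][c] < threshold for r, c in SPOTS)


def blank(value=1.0):
    """A uniform 5x5 image."""
    return [[value] * 5 for _ in range(5)]


dog_face = blank()
for r, c in SPOTS:
    dog_face[r][c] = 0.0  # dark eyes and nose

muffin = blank(0.8)  # light-brown dough
for r, c in SPOTS:
    muffin[r][c] = 0.1  # three chocolate chips at exactly those spots

print(looks_like_dog(dog_face), looks_like_dog(muffin))  # True True
```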
These conditions serve only as orientation; whether neural networks are a reasonable approach to a problem has to be decided for each individual project. Nevertheless, this article should have given you a hint as to when this approach is worth considering.