A Critical Appraisal of Deep Learning

The field has made strong progress, but the more we learn, the more we realize how little we know.

Everyone and their grandparents are talking about it: Artificial Intelligence, Deep Learning (DL), Machine Learning, Robotics, etc. Sometimes all of these terms appear in the same sentence, sometimes they are used as synonyms. One thing is certain: these subjects have become so popular that the general public now has growing expectations. Major progress in facial recognition, image classification, AI-vs.-human gaming (Go, Dota 2) and self-driving cars has created understandable hype. That hype was also fueled by overzealous promises from CEOs and leading researchers.

But hype only survives when concrete applications materialize, and these are still lagging. As time goes by, a slow disenchantment is setting in in some areas. Chatbots have not lived up to their hype. Self-driving cars, and neural networks as a whole, could be next. In this article, I review Gary Marcus’s critical appraisal of deep learning¹ and complement it with personal commentary and some resources for going further.

In his 2018 paper, Marcus discusses the open challenges facing deep learning.

Deep learning is data-hungry

Humans, by contrast, need only a few trials or examples to turn them into actionable knowledge.

Marcus takes the example of a made-up word, ‘schmister’, defined as ‘a sister over the age of 10 but under the age of 21’. With this single definition, you can easily tell whether you have a schmister, or whether your friends do. Better still, you can deduce that your parents probably don’t have one, since their sisters are almost certainly older than 21.

Here Marcus underlines deep learning’s current lack of a mechanism for learning abstractions through explicit, verbal definition. Instead, it works best when there are thousands, millions or even billions of training examples, as in DeepMind’s work on games.

In short, deep learning is not yet an ideal solution for problems where data are scarce. Yet the emergence of small-data approaches in deep learning might bring change.
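
To make the contrast concrete, here is a toy sketch in Python (not taken from Marcus’s paper). The verbal definition of a schmister becomes a one-line rule, while a stand-in “learner” has to recover the same age boundaries from labelled examples alone. The learner is a deliberately simple interval learner, not a neural network, and the function names and the 0 to 80 age range are hypothetical.

```python
import random


def is_schmister(sister_age: float) -> bool:
    """The explicit verbal definition: a sister over 10 but under 21."""
    return 10 < sister_age < 21


def learn_interval(examples):
    """Hypothetical stand-in for a data-driven learner (not a neural network):
    return the tightest age interval covering every positive example seen."""
    positives = [age for age, label in examples if label]
    if not positives:
        return None  # no positive example seen yet, so nothing has been learned
    return min(positives), max(positives)


def sample_examples(n, rng):
    """Draw n sister ages (hypothetical range 0 to 80), labelled by the rule."""
    ages = [rng.uniform(0, 80) for _ in range(n)]
    return [(age, is_schmister(age)) for age in ages]


rng = random.Random(1)
learned = {n: learn_interval(sample_examples(n, rng)) for n in (5, 50, 5000)}
for n, interval in learned.items():
    print(f"{n:>5} examples -> {interval}")
```

The rule is correct from the first use. The learned interval, by contrast, may be missing entirely with only five examples (there is a fair chance none of them falls between 10 and 21) and only closes in on (10, 21) once thousands of examples are available.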

Deep learning is shallow and has a limited capacity for transfer

Let’s return to games, with Atari’s Breakout. If you aren’t familiar with it, the idea is to use a paddle to direct the ball and break bricks. A perfect technique is to dig a tunnel through the wall so the ball gets behind it, then let the ball bounce back and forth destroying bricks while you go enjoy a coffee.

Atari Breakout — Source: Wikipedia

Marcus explains that a model might master the game the same way you just did, by building a tunnel, yet have no understanding of what a tunnel, a ball or a wall is. It has only learned the contingencies of a given scenario, and its solutions are often superficial: minor changes, such as moving the paddle or the wall, are enough to render the system useless and expose that superficiality.

In short, patterns extracted by deep learning are more superficial than they initially appear.

This holds for the lack of understanding of what a tunnel or a wall is, but the claims about a lack of adaptability (changes to the in-game setup) were mostly disproved by OpenAI Five, OpenAI’s Dota 2 bot, in the summer of 2018.²

Deep learning is not sufficiently transparent (a.k.a. a black box)

This is a common criticism of deep learning. But how much does it matter? I have come across various opinions on the subject. As the author notes, it really depends on the sector. One might want to understand how a decision was reached and to what extent it is biased, especially when it feeds into life-or-death decisions. Understanding how a model works might lay bare those biases and help us improve or debug it.

Deep learning is not well integrated with prior knowledge

Marcus argues that the dominant approach is self-contained, isolated from potentially useful knowledge. There is little interest in integrating prior, well-established knowledge into deep learning systems, such as how towers fall or how the laws of physics work. Another concern is how such knowledge could be integrated at all. In DL systems, knowledge typically takes the form of correlations between features. As the author notes, those correlations are often opaque, and this kind of knowledge is quite unlike quantified statements such as ‘all men are mortal’.

Furthermore, problems that require common sense are not yet within reach of deep learning, and an entirely different set of tools might be required. The author cites research conducted with Ernie Davis³ (2015) and questions such as:

Who is taller, Prince William or his baby son Prince George?
Can you make a salad out of a polyester shirt?
If you stick a pin into a carrot, does it make a hole in the carrot or the pin?

To answer these, humans draw on knowledge integrated from a vast number of disparate sources, an approach far removed from deep learning’s.

Deep learning cannot inherently distinguish causation from correlation

Deep learning systems learn complex correlations but not causality. I wouldn’t be too hard on DL on this one, as many fellow humans struggle with the difference between correlation and causality.

As an example, Marcus describes a deep learning system that finds that children get taller as they learn more words. That does not mean that growing taller causes them to learn more words, nor that learning new words causes them to grow: both stem from growth and development. Yet, as the author notes, DL is not geared towards telling such cases apart.
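
A small simulation can show how such a correlation arises from a common cause. In this hypothetical sketch (all numbers are made up), age drives both height and vocabulary, and neither affects the other. Across all ages the two are strongly correlated; among children of roughly the same age the correlation largely disappears, a distinction that raw correlations alone do not reveal.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_children(n=5000, seed=0):
    """Hypothetical data: age is the common cause of height and vocabulary."""
    rng = random.Random(seed)
    children = []
    for _ in range(n):
        age = rng.uniform(2, 12)                  # years
        height = 80 + 6 * age + rng.gauss(0, 5)   # cm: depends on age only
        words = 500 * age + rng.gauss(0, 400)     # vocabulary: depends on age only
        children.append((age, height, words))
    return children


children = simulate_children()
overall_r = pearson([h for _, h, _ in children], [w for _, _, w in children])

# Condition on the common cause: keep only children aged 7 to 7.5.
same_age = [(h, w) for a, h, w in children if 7 <= a < 7.5]
within_r = pearson([h for h, _ in same_age], [w for _, w in same_age])

print(f"all ages:      r = {overall_r:.2f}")
print(f"ages 7 to 7.5: r = {within_r:.2f}")
```

Holding age fixed is the simplest way to condition on a confounder, and it only works here because we know, by construction, which variable is the cause.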

Deep learning works well as an approximation, but its answers cannot be fully trusted

Image recognition is still improving, but it must still account for random errors and deliberate adversarial attacks. The author cites sand dunes mistaken for nudes, a yellow-and-black square mistaken for a school bus and, worse, a defaced stop sign mistaken for a speed limit sign.
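
To see why small, deliberate changes can fool a model, here is a sketch of the fast gradient sign method (FGSM, Goodfellow et al., 2014), one well-known way to build adversarial examples, applied to a hypothetical linear classifier rather than to the image models Marcus discusses. For a linear score w·x + b, the gradient with respect to the input is simply w, so nudging every pixel by a small ε along sign(w), against the current decision, shifts the score by ε·Σ|wᵢ|, a quantity that grows with the number of pixels. The weights, image size and ε are made up.

```python
import random


def score(w, b, x):
    """Raw score of a linear classifier: class 1 if positive, else class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_linear(w, x, current_score, eps):
    """One FGSM step for a linear model: the input gradient of the score is w,
    so move each feature by eps along sign(w), against the current decision.
    Clipping to the valid pixel range is skipped for simplicity."""
    direction = -sign(current_score)
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


rng = random.Random(42)
dim = 10_000                                # hypothetical 100x100 grayscale image
w = [rng.gauss(0, 1) for _ in range(dim)]   # made-up classifier weights
b = 0.0
x = [rng.random() for _ in range(dim)]      # made-up image, pixels in [0, 1]

s = score(w, b, x)
x_adv = fgsm_linear(w, x, s, eps=0.03)
s_adv = score(w, b, x_adv)
max_change = max(abs(xa - xi) for xa, xi in zip(x_adv, x))

print(f"original score {s:+.1f}, adversarial score {s_adv:+.1f}")
print(f"largest change to any pixel: {max_change:.3f}")
```

No pixel moves by more than 0.03 on a 0 to 1 scale, yet because all the tiny changes push the score in the same direction, their sum is typically large enough to flip the decision. Attacks on real image classifiers obtain the gradient by backpropagation instead of reading it off w.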

Deep learning is difficult to engineer with

Machine learning has been called the high-interest credit card of technical debt. A deep learning system is easy to set up for short-term gains, but it is another story to guarantee that it will work in different circumstances, with novel data.
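
A minimal illustration of the “novel data” problem, assuming a made-up process y = x² that the model only ever sees on the range 0 to 1: a straight line fitted by least squares looks accurate on that range and fails badly on inputs between 2 and 3. The linear fit stands in for a deep network here; the point is only that good performance on the training distribution says little about performance outside it.

```python
import random


def true_process(x):
    """Hypothetical ground truth, unknown to the model."""
    return x * x


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def mean_abs_error(model, xs, ys):
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train_x = [rng.uniform(0, 1) for _ in range(1000)]   # data seen during development
model = fit_line(train_x, [true_process(x) for x in train_x])

novel_x = [rng.uniform(2, 3) for _ in range(1000)]   # "alternative circumstances"
in_range_error = mean_abs_error(model, train_x, [true_process(x) for x in train_x])
shifted_error = mean_abs_error(model, novel_x, [true_process(x) for x in novel_x])

print(f"mean error on the training range: {in_range_error:.3f}")
print(f"mean error on novel inputs:       {shifted_error:.3f}")
```

Nothing in the training error warns of this; only evaluating on data from the new conditions reveals it.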

Technical debt is an exciting and crucial topic. If you’re curious, I recommend this article (I may be a bit biased):

Are we done with chatbots, neural networks and self-driving cars?

Does this mean we should ignore those areas? No, but we might have to revise our expectations for now. Chatbots can be disappointing for many reasons, one of them being their lack of short-term memory or context. Self-driving cars are another topic on which the hype is slowly dying. In 2015, The Guardian predicted that, by 2020, we would each be a “permanent backseat driver.” In a 2016 article, Business Insider claimed that “10 million self-driving cars will be on the road by 2020.” Major car manufacturers made similarly bold claims, and Elon Musk had forecast that Tesla would get there by 2018. It turns out that self-driving cars are more complicated to bring to production than initially thought. Hopefully, once the hype dies down, work will continue and breakthroughs will follow. There has been progress in many areas, and, as they stand today, these technologies are great tools to support and enhance our work and our lives.

In his paper, Marcus also discusses how deep learning deals with hierarchical structure and open-ended inference. He argues that DL still struggles with both, particularly within the scope of Natural Language Processing (NLP).
Judea Pearl, a Turing Award winner and pioneer in machine learning, discusses in his book The Book of Why: The New Science of Cause and Effect how understanding causality will revolutionize artificial intelligence.
Will deep learning hit computational limits? Some MIT researchers think so; see here.