After posting some notes last week on a workshop on deep learning meets logic, I came across this excellent article by Subbarao Kambhampati. In broad strokes, he makes the following point (my interpretation): much of the recent emphasis in machine learning research has been on (big) data-driven learning, with little regard for explicit knowledge, say in the form of symbols or constraints, sometimes even treating that endeavour with scorn. He argues convincingly that this is ultimately a futile exercise, because there are many cases where leveraging available knowledge is effective.
There are four points I think are worth making in this context on logic versus learning; they are not necessarily entailed by the thrust of the article, but are nonetheless relevant.
The first point is whether this notion of effectiveness can be formalised: for example, can we rigorously show that learning a logical theory capturing a complex piece of knowledge requires exponentially many examples, that is, that the network has to parse an exponential number of samples before arriving at the theory? If this were the case, then we could legitimately ask why bother learning the theory if it can simply be provided.
For example, there is a classic thought experiment from the philosopher John Searle, the Chinese Room, about how it is possible to achieve Turing-test-level success without understanding anything. A combinatorial argument against this thought experiment was given recently. Perhaps we might arrive at something similar for the balance between tacit knowledge and explicit knowledge. (Indeed, in the context of PAC learning, there is reason to believe that learning arbitrary propositional formulas is intractable.)
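To make the sample-complexity worry concrete, here is a rough back-of-the-envelope illustration using the standard PAC bound for a consistent learner over a finite hypothesis class; the choice of hypothesis class below is mine, purely for illustration:

```latex
% With probability at least 1 - \delta, a consistent learner over a finite
% hypothesis class H reaches error at most \epsilon after seeing this many examples:
\[
  m \;\geq\; \frac{1}{\epsilon}\left(\ln|H| + \ln\tfrac{1}{\delta}\right)
\]
% If H is taken to be all Boolean functions over n propositional variables,
% then |H| = 2^{2^n}, so \ln|H| = 2^n \ln 2 and the bound becomes
\[
  m \;\geq\; \frac{1}{\epsilon}\left(2^n \ln 2 + \ln\tfrac{1}{\delta}\right),
\]
% i.e. exponentially many examples in the number of variables.
```

This only bounds how many examples suffice for a statistically consistent learner; the PAC-hardness results alluded to above are of a computational flavour (for instance, properly learning 3-term DNF is believed intractable), but the moral is similar: for rich enough logical theories, “just learn it from the data” is not cheap.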
The second point is this: even if the logical theory can be learnt, and even if one were prepared to accept parsing exponentially many examples, what if the deductions we get from this theory are unreliable? That is, they may not be consistent when queried repeatedly, or it may be hard to analyse the axiomatic basis of the entailed sentences, and so on. (In contrast, proposals like robust logics define a semantics that captures the quality possessed by the output of learning from noisy examples when formulated in a logic. Real-valued logics are another way out of this issue.)
The third point, as already noted in the article above, is that the tasks deep learning approaches are tackling are largely ones for which a codification of explicit knowledge was not immediate, or is infeasible given current understanding. But by the same measure, when it comes to the codification of high-level reasoning and commonsense theories, there has been very little progress in tackling these as pure classification/prediction problems. Often what is attempted is a kind of hybrid approach that combines some sort of symbolic reasoning or search with statistical co-occurrences gleaned from large corpora. This is where research that combines ontologies and graphs with learning is becoming prominent.
Then there is a fourth point, which gets to the heart of the “religious” wars between the various communities, if they can be called that. The question we might ask ourselves is this: why don’t we accept any mishmash of approaches so long as it solves the problem? If we did, we would not be having any discussion on where and what can be accomplished by logic versus learning, and would simply choose the best tool for the task without tears. (Indeed, this is what many “pragmatists”, such as those in the automated planning and robotics communities, are doing anyway; more below.)
I believe there is a deep, almost aesthetic, reason that scientists attempt to neatly tie up solutions to problems, and I think this is why there is some tension over what can be captured by learning alone, without relying on explicit knowledge. Likewise, many of the symbolic approaches of the 80s attempted to completely specify things declaratively, without wanting to use adaptive learning. (Perhaps an analogy could be drawn to the mathematical foundations of physics, i.e. the search for a theory of everything.)
From an aesthetic point of view, there is elegance in the following possibility. Suppose it were the case that once we conceptualise the right architecture for a neural network, not only would it perform object detection and classification as networks do nowadays, but reasoning might also “emerge” as a direct or indirect consequence. I can see why people might be working towards this, as the human brain seems to have such capabilities, although we do not really understand how. As far as I know, there is no evidence that, on training a neural network, any semblance of general concepts and abstractions for time, space and causality naturally emerges (i.e. without explicit architectural or training-regime tinkering), along with the axioms to reason deductively about them. Likewise, from an aesthetic point of view, there is elegance in the possibility that all the cognitive functions needed for artificial autonomous agency could be fully accomplished in something like first-order logic. Progress on this front has not been promising where low-level perception is concerned, and has perhaps been virtually abandoned. There is still reason to believe, however, that the linguistic and visual constructs that allow us to imagine new and unforeseen concepts need symbolic manipulation (cf. discussions on Winograd schemas, e.g. resolving what “it” refers to in “the trophy does not fit in the suitcase because it is too big”).
All of this seems to suggest that there are tasks that are not immediately amenable to any one model (logic, deep learning, SVMs, whatever). Indeed, as AI continues to be deployed in more complex scenarios, hybrid solutions are likely needed, and so many researchers are looking to combine different technologies. But from the point of view of the fundamental limits of what each approach can accomplish: are hybrid approaches the only possible way to realise artificial intelligence?
Let me finally note that even in hybrid land, the matter of aesthetic quality comes into play: for example, is the system loosely coupled or tightly coupled? Is there an overall semantics that binds the learner and the reasoner? Are we looking at a robust-logic-like approach, which captures within the logic the semantics of learning from noisy examples; a de-coupled approach where we have a classical or probabilistic logical theory with “neural” predicates; a real-valued / vector-embedding logical approach; or something else entirely? The approaches I mention explicitly here are not equivalent, and the trade-offs of opting for one over another might be application-, context- or objective-dependent; a toy sketch of the real-valued flavour follows below.
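To make that last distinction a little more concrete, here is a minimal sketch, entirely my own illustration rather than any particular system: “neural” predicates return confidences in [0, 1], and a rule is read in a real-valued logic, with the product t-norm for conjunction and the Lukasiewicz residuum for implication.

```python
# Toy illustration (not any particular system): a real-valued reading of the rule
#   on(x, y) AND heavier(x, y)  ->  unstable(x)
# where the predicate truth values are confidences produced by learned classifiers.

def t_and(a: float, b: float) -> float:
    """Product t-norm: real-valued conjunction."""
    return a * b

def t_not(a: float) -> float:
    """Real-valued negation."""
    return 1.0 - a

def t_implies(a: float, b: float) -> float:
    """Lukasiewicz implication: degree to which the body supports the head."""
    return min(1.0, 1.0 - a + b)

# Hypothetical confidences from learned ("neural") predicates for one scene.
on_x_y = 0.9        # classifier is fairly sure x is on y
heavier_x_y = 0.8   # classifier is fairly sure x is heavier than y
unstable_x = 0.3    # classifier currently thinks x is probably stable

body = t_and(on_x_y, heavier_x_y)                 # 0.72
rule_satisfaction = t_implies(body, unstable_x)   # 0.58, so the rule is violated to degree 0.42

print(f"body={body:.2f}, rule satisfaction={rule_satisfaction:.2f}")
```

In a tightly coupled system, a degree of satisfaction like 0.58 would typically feed back into training as a loss term nudging the predicate networks towards satisfying the rule; in a de-coupled system one would instead threshold the confidences and hand crisp facts to a conventional reasoner. Neither choice is free, which is exactly the kind of trade-off alluded to above.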
PS It’s worth noting that autonomous agency is now sometimes referred to as artificial general intelligence. I understand that the qualifier “general” here is meant to indicate its unrestricted nature, but I’m of the impression that artificial intelligence, as considered by Alan Turing and John McCarthy, was unrestricted in this sense anyway.