Meaning is core to language because the meaning of a sentence determines the forms of words and phrases that are selected and vice versa. Or as I say: Form follows meaning®. But what is meaning?
In language, the word forms that we use to communicate with others follow the meaning of what we want to say and, just as importantly, the meaning of what we say is far deeper than the words we can use to say it. Therefore, meaning needs to be at the core of our language understanding systems, not word forms.
What is missing from data science today is meaning. As soon as one data project completes, another can begin, because the slice of meaning captured in the data is always too narrow; meaning allows ongoing generalization because it is rich content, not just labeled content. A meaning layer can hold many thousands of relations to a referent, while data annotations may capture only a single feature for a particular purpose, and unsupervised data is limited to the content of the source files. In contrast, meaning can continually add to the richness of the representation. That difference between data and meaning is a key distinction I intend to explore in this article.
It is common to hear the question: “what do you mean by meaning” from the AI community. And the answer is only difficult because describing the part of linguistics that explains ‘linguistic meaning’ (the meaning of our language) needs to cover the meaning of “words, phrases, grammatical forms and sentences but not … the meanings of actions or phenomena[i].” It’s a long answer, but a basic part of the cognitive sciences that is well documented.
The research area is called semantics, and it covers many levels of meaning: from the ambiguous meaning of words and phrases taken in isolation, through the meaning of an expression, to the meaning communicated in a given context of utterance. In Role and Reference Grammar[ii] (RRG), these three areas of meaning correspond roughly to syntax, semantics and discourse pragmatics (which covers context). The RRG linking algorithm maps vocabulary in sequence (phrases) to semantics in context and back again.
Today, I’ll look into how we can use semantics to store knowledge as a language-independent representation inside of context. Another time, I will show how the model is learnable from standard building blocks. This isn’t theoretical: it is based on my experience with the Pat Inc. (Pat) system, which in 2017 was used to benchmark the meaning-only representation against the Facebook conversational tasks[iii] with 100% accuracy on both the tests and the training data (and which found errors in the training data!).
Humans represent knowledge differently to today’s systems.
For example, humans continuously learn their language, while today’s systems seek to store all possible words (signs/forms) in advance.
We also rely on what is called the context of utterance. The science of semantics tells us what meaning people store. J. R. Firth called it the “context of situation”.[iv] People store not only the information itself (facts/opinions), but the full context: who said what, who heard it, when and where it was said, and the other facts known at the time, known as immediate and general common ground.
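To make that concrete, here is a minimal sketch (my own illustration, not Pat’s internal format) of what storing an utterance with its context of situation might look like; all the values are invented:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Utterance:
    """One stored assertion plus its context of situation (illustrative only)."""
    content: str                  # the information itself (fact/opinion)
    speaker: str                  # who said it
    hearers: List[str]            # who heard it
    time: str                     # when it was said
    place: str                    # where it was said
    common_ground: List[str] = field(default_factory=list)  # facts known at the time

# A hypothetical example: the claim is stored with its context, never as a bare fact.
claim = Utterance(
    content="The island ferry leaves at noon.",
    speaker="Alex",
    hearers=["Sam"],
    time="Tuesday morning",
    place="the harbour",
    common_ground=["Sam has never visited the island"],
)
```

Nothing in the record asserts that the content is true; that judgment is left to whoever reads it later, which is the point of the hypotheticals below.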
Why is context so complex? Fundamentally, a brain only receives sensory information, and it recognizes known things. Patom theory (my brain theory from the 1990s) explains the phenomenon as “the specific defines the general.”[xii] We see the effects in our language in the distinction between proper names (specific) and general referents. Languages are small in terms of the general words we use, and immense in terms of the specific ones. English includes all the names of people in the world, companies, streets, suburbs and other named things: billions of words and phrases. But the language used to manipulate those referents is much smaller, something on the order of 10,000 uninflected signs (word senses).
A few hypotheticals make the point about how strongly context matters.
First hypothetical: “Donald Trump said ‘Joe Biden is demented.’” Here a fact is asserted. We can store it: “Joe Biden is demented.” Is this a fact? Why or why not? Some will say to never believe Trump. Some will say Biden is observably a genius. A knowledge representation needs to include the information that people use to determine its validity.
Let’s try another hypothetical. A. Dufus wrote: “COVID-19 is cured by injecting mosquitoes into your bloodstream.” A machine could accept this as a fact, especially if many others quote it. Now if we relied on AI using a (bad) knowledge repository — instead of doctors — for pandemic diagnosis, a machine could tell us this cure for the virus with high certainty. But what’s more credible? A machine with millions of consistent facts or a single article by the Centers for Disease Control and Prevention in the US? Some will say both are wrong. Some will say both are right. Some will select one as fact and the other as a conspiracy theory.
The point is that human context includes its source, and as we move to storing the world’s knowledge (context) at scale, we need source information to independently decide its validity now and in the future.
Context can be exposed dynamically. E.g. How do you know a Porsche is a fast car? You saw it, read about it, drove it, …? In order to emulate human knowledge, we need to store source information such as direct and indirect experiences — we need real context.
In summary: People can tell you the source of their knowledge and also the full context as claimed by linguists.
Patom theory explains brains as pattern-matching machines that recognize context — the set of active elements a brain matches at a point in time and their change over time (i.e. just sets and sequences). Context is the set of specific patterns, but as the brain’s representation is hierarchical in Patom theory, the components that make up specific objects can also be recognized as (sub)patterns.
That’s why we learn a language easily. We learn the general patterns underlying the specific ones, provided we receive comprehensible input.
In other words, recognition of general patterns follows directly from matching specific patterns once there is an underlying common pattern. “The specific defines the general.” It is this brain feature of finding patterns in other patterns in a hierarchy that enables a brain to learn a language. It explains why children’s brains learn words ahead of phrases.
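As a rough illustration (my sketch, not a formal statement of Patom theory), context at a point in time can be treated as the set of patterns currently matched, with each specific pattern built from sub-patterns, so a shared sub-pattern is recognized across many specific ones:

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Pattern:
    """A pattern is a name plus the sub-patterns it is built from (hierarchical)."""
    name: str
    parts: Tuple["Pattern", ...] = ()

def subpatterns(p: Pattern) -> FrozenSet[str]:
    """The pattern and every (sub)pattern reachable from it."""
    found = {p.name}
    for part in p.parts:
        found |= subpatterns(part)
    return frozenset(found)

# Two specific patterns that share a general sub-pattern (invented examples).
wheel = Pattern("wheel")
my_car = Pattern("my car", (wheel, Pattern("my number plate")))
taxi = Pattern("taxi", (wheel, Pattern("taxi sign")))

# Context is just a set of active patterns at a point in time,
# and a sequence of such sets over time.
print(subpatterns(my_car) & subpatterns(taxi))   # frozenset({'wheel'})
```

The shared ‘wheel’ is the general pattern that the two specifics define.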
Semiotics explains language as the connection of (a) signs (form[v]) to (b) meaning (interpretation) to (c) objects. C.S. Peirce’s model includes an interpretant, unlike Saussure, which comes in handy for languages because it incorporates the potential ambiguity of a word’s meaning.
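A minimal way to picture that triad (my simplification, not Peirce’s own formalism): one sign can map to several interpretants, each tied to a different object, which is exactly where a word’s ambiguity lives. The senses below are invented for illustration:

```python
# One sign (form) carries several interpretants (senses), each tied to an object.
sign_to_interpretants = {
    "bank": [
        {"sense": "financial institution", "object": "bank#institution"},
        {"sense": "side of a river", "object": "bank#riverside"},
    ],
}

for reading in sign_to_interpretants["bank"]:
    print(f"bank -> {reading['sense']} -> {reading['object']}")
```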
Language learning relies on comprehensible input. You need to be able to connect the sign (the word sound) to its meaning (the referent or predicate). That’s why children don’t learn languages by scanning foreign books — the meaning is needed before learning by reading can begin.
How the model shown today can learn end-to-end will be explained later in this series based on my experience of building a working system that is aligned with documented linguistic models.
Language theory based on syntax alone is complex because the building blocks of such systems tend to be highly ambiguous. Meaning-based systems are far simpler, built only on the two semantic elements available — referents (things) and predicates (relations).
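A sketch of those two building blocks, assuming nothing beyond what the paragraph says (a referent names a thing, a predicate relates referents by role); the field names are my own placeholders:

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Referent:
    """A thing that can be referred to."""
    name: str

@dataclass
class Predicate:
    """A relation holding between referents, keyed by semantic role."""
    relation: str
    arguments: Dict[str, Referent]

# "The city was destroyed" and "the destruction of the city" share
# this meaning-level representation, whatever the surface syntax.
city = Referent("the city")
destroyed = Predicate("destroy", {"patient": city})
```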
The syntactic model takes the meaningful building blocks and combines them with their function in the sentence. This is ambiguous, since a predicate can surface as a noun (“the destruction of the city”) or a verb (“the city was destroyed”), and a referent as a noun (“the spatula”) or a verb (“Can you spatula me some pancakes?” — i.e. use a spatula to give me pancakes). The syntax-first model (typically called parsing) then compounds the problem by excluding meaning and context from the recognition of the sentence, taking the myriad of out-of-context phrases that could be valid and checking all the other out-of-context possibilities in combination. This results in an unsolvable combinatorial explosion and has led to statistical techniques, as the combinations rapidly exceed the capacity of humans to write rules (the push-in pop-out problem). Today, as a result, statistical approximations dominate NLU.
The reason a brain does better grounding language in semantics rather than syntax comes back to learning with comprehensible input. Brains can match known patterns, and there is an observable difference between a referent (thing) and an activity or state (predicate/relation). Different parts of the brain match them, in fact, with a separate, common area in the temporal lobe recognizing the sound of the words.
To determine the noun/verb distinction, which has been central to many linguistic models since the 1930s, both the word and the phrase would need to be learned together. Ockham’s razor rejects this: there is no need to combine entities (meaning plus word order) when meaning alone is enough. And children’s brains learn single words first, not word sequences.
By recognizing the difference between things and relationships, this semantic basis for language lets us emulate it in a knowledge representation, because most of the details of representing referents and predicates come from well-known linguistic models which, in my experience, need only minor enhancements.
To build up a sentence with the RRG layered model, the behavior of the predicate needs to be understood[vi]. Central to RRG is the predicate’s class, which defines the number of roles, causation, the Aktionsart class, the polarity, and whether it is a base state or activity.
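As a hedged sketch, the lexical entry for a predicate might carry something like the fields below; the Aktionsart labels are the standard RRG ones, while the exact field names are mine, not Pat’s:

```python
from dataclasses import dataclass
from enum import Enum

class Aktionsart(Enum):
    STATE = "state"
    ACTIVITY = "activity"
    ACHIEVEMENT = "achievement"
    SEMELFACTIVE = "semelfactive"
    ACCOMPLISHMENT = "accomplishment"
    ACTIVE_ACCOMPLISHMENT = "active accomplishment"

@dataclass
class PredicateClass:
    """Lexical information that drives the RRG layered structure (illustrative)."""
    predicate: str
    aktionsart: Aktionsart
    num_roles: int          # how many arguments the predicate takes
    causative: bool         # is there a causing sub-event?
    positive: bool = True   # polarity
    base: str = "state"     # whether the decomposition bottoms out in a state or an activity

# 'give' as a causative accomplishment whose base is the state have'.
give = PredicateClass("give", Aktionsart.ACCOMPLISHMENT,
                      num_roles=3, causative=True, base="state")
```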
The expression: “predicates determine their arguments” comes from linguistics. It refers to the fact that a predicate (noun/verb) constrains the meaning of its arguments (referents). Again, in “the destruction of the city” the predicate ‘destroy’ constrains ‘the city’ to the type of thing that can be destroyed. The same constraint is in “the city was destroyed.”
This concept of semantic constraints is well known as a selectional restriction[vii], and the concept of referent categories equally so, with associations like WordNet’s hypernyms (is-a) and meronyms (has-a). Qualia structure, the ability to use what a referent is for, comes from the reverse of the restrictions: what the predicate associates. We eat food and fish can be food. Therefore, we can eat fish, and therefore fish is for eating. If John begins a fish, qualia structure says he begins to eat a fish, as shown in the meaning matcher output below. There are other interpretations, but this shows the principle in a meaning-based system.
Re-using categories as both restrictions and qualia structure is a simple way to combine the two ideas. I have used them in Pat to limit the results of semantic set validation and to generalize understanding.
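Here is a minimal sketch of that re-use, assuming a toy is-a hierarchy and a single restriction on ‘eat’; it is my illustration of the principle, not Pat’s meaning matcher:

```python
# Toy is-a hierarchy (hypernyms) and one selectional restriction.
ISA = {"fish": "food", "food": "thing"}
RESTRICTIONS = {"eat": "food"}   # eat(x, y) requires y to be food

def categories(referent: str):
    """Yield the referent and all of its hypernyms."""
    while referent is not None:
        yield referent
        referent = ISA.get(referent)

def allowed(predicate: str, referent: str) -> bool:
    """Selectional restriction: does the referent fit the predicate's argument?"""
    return RESTRICTIONS[predicate] in categories(referent)

def telic_role(referent: str):
    """Qualia (telic) role by reversing the restriction: what is the referent for?"""
    return [pred for pred, cat in RESTRICTIONS.items() if cat in categories(referent)]

print(allowed("eat", "fish"))   # True: we can eat fish
print(telic_role("fish"))       # ['eat']: fish is for eating
# So "John begins a fish" can be completed as "John begins to eat a fish".
```

The same two tables answer both questions: whether ‘eat fish’ is semantically valid (restriction) and what a fish is for (qualia).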
Generalizing things is well documented. The types of things and their possessions can be seen readily, but the generalization of predicates seems poorly explored. Let’s see how representation constrains meaning and enables generalization.
Referent Generalization
A thing’s possessions generalize easily in language. When you “park out front” you mean you park your car (or bike or whatever) out front. Your car is a possession that can, at times, stand in for you interchangeably. When “Pat Inc. hires ten linguists” it is people in the company who do the hiring (i.e. companies have’ staff), not the company entity itself. These associations are known as ‘qualia structure’ in models like Pustejovsky’s Lexical Semantics.
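A tiny sketch of that substitution, assuming a has-a (possession) table with invented entries:

```python
# has-a / possession associations (illustrative values).
HAS = {
    "you": ["your car", "your bike"],
    "Pat Inc.": ["its staff"],
}

def candidates(owner: str) -> list:
    """An owner or one of its possessions can fill the role in context:
    'you park out front' -> your car is parked; 'Pat Inc. hires' -> its staff hire."""
    return [owner] + HAS.get(owner, [])

print(candidates("you"))       # ['you', 'your car', 'your bike']
print(candidates("Pat Inc."))  # ['Pat Inc.', 'its staff']
```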
We dynamically learn language throughout our lives. Speakers help listeners by associating the things to enable understanding. A book for a three-year-old, for example, starts: “Have you sailed to the island of Bum Bum Ba Loo?[viii]” which means even a three-year-old must be able to recognize a new phrase “Bum Bum Ba Loo” and associate it with an island, something that must also be known already. Language exposes the category of referent with the simple question: “What is Bum Bum Ba Loo?” “An island” would be the minimum correct response where context is stored: i.e. “Bum Bum Ba Loo is an island,” and an island is a thing (referent).
Without a mechanism to learn new words and phrases on the fly, like three-year-old human children, systems will remain artificial.
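A sketch of that on-the-fly step, assuming only an is-a table: an unknown phrase is stored under the category the sentence supplies and is immediately usable:

```python
# Minimal on-the-fly learning of a new referent from its stated category.
ISA = {"island": "place"}

def learn_referent(phrase: str, category: str) -> None:
    """Store 'phrase is-a category' the moment it is heard."""
    ISA[phrase] = category

learn_referent("Bum Bum Ba Loo", "island")

# "What is Bum Bum Ba Loo?" -> walk the is-a chain.
node = "Bum Bum Ba Loo"
while node in ISA:
    node = ISA[node]
    print(node)   # island, then place
```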
Predicate Generalization
Predicates define their arguments: referents. Predicates aren’t simple categories like things (vehicles, people, animals) but complex relationships that can carry many meanings. ‘Eating’, for example, can mean to ‘bite’, ‘remove a circular piece’, ‘chew’ and ‘swallow’. ‘Giving’ can mean an exchange of possession between parties.
You can see the set of meanings through generalization. “John ate the tuna” means he “swallowed the tuna” even though swallow isn’t mentioned.
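One way to picture that set of meanings (my sketch, not Pat’s representation) is to store the sub-events a predicate entails, so a question about ‘swallow’ can be answered from a sentence that only mentioned ‘eat’:

```python
# Sub-events entailed by a predicate (illustrative decomposition).
ENTAILS = {
    "eat": ["bite", "chew", "swallow"],
}

def entails(predicate: str, other: str) -> bool:
    """Does asserting `predicate` also assert `other`?"""
    return other == predicate or other in ENTAILS.get(predicate, [])

# "John ate the tuna" -> did John swallow the tuna?
print(entails("eat", "swallow"))   # True
```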
The Facebook tests we benchmarked against in 2017 included some generalization examples, predicates like ‘carrying’, ‘giving’, ‘taking’, ‘having’, ‘picking up’ and so on. The predicates have different relations as seen in their semantic relationship (also called a logical structure), but there is a common meaning element, the language-independent meaning represented as have’. have’ is written in a metalanguage. It isn’t an English word, but the representation of a state predicate that requires two roles (a possessor and the thing it possesses). It could be any kind of unique token, but that would make it harder for linguists to debug the system.
This extract from Table 3 from my arXiv article[iii] shows the subtle differences in meaning between receive (not causative) and take (causative). It also explains why you can get something from someone (3-roles) but not have something from someone (2-roles).
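The table itself isn’t reproduced here, but the textbook RRG logical structures for these predicates show the same pattern; a hedged sketch of how they might be stored:

```python
# RRG-style logical structures in Van Valin & LaPolla's textbook notation,
# written as plain strings. The shared element is the state predicate have'.
LOGICAL_STRUCTURES = {
    "have":    "have'(x, y)",
    "receive": "BECOME have'(x, y)",
    "take":    "[do'(x, Ø)] CAUSE [BECOME have'(x, y)]",
    "give":    "[do'(x, Ø)] CAUSE [BECOME have'(y, z)]",
}

# 'take' and 'give' are causative (they contain do' ... CAUSE ...); 'receive' is not.
# Every entry decomposes to have', which is why a question about possession can be
# answered after any of these events is understood.
```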