A lot of (misguided) excitement has been generated around so-called GANs: Generative Adversarial Networks. The brilliant AI-knowledgeable media, including ‘social’ media, is awash with articles and posts on deepfakes and on how AI is already “creating” novel videos, novel images, and even novel music and text. In fact, the famously touted GPT-3 is one such example of AI that is touted as getting closer to Artificial General Intelligence (AGI). It has even led the ‘respectable’ Guardian to publish an article proclaiming that AI can already write news stories. (I expect more ridiculous claims, by the way.)
Be assured, and sleep well. Generative Neural Networks do not really generate anything new. They simply deliver meaningful combinations of the millions of examples they have memorized. How many combinations can they generate (combinations that look like novel creations to the brilliant journalists in the mainstream media)? Well, let’s take a look at the formula for the number of combinations C that we can obtain from n objects taking r objects at a time:

C(n, r) = n! / (r! (n − r)!)
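To get a feel for how fast this number grows, here is a quick check using Python’s standard-library `math.comb` (the values of n below are illustrative stand-ins for training-set sizes, not GPT-3’s actual figures):

```python
import math

# C(n, r) = n! / (r! (n - r)!) -- the number of ways to pick r items from n
for n in (1_000, 100_000):
    for r in (2, 5):
        print(f"C({n:>7}, {r}) = {math.comb(n, r):,}")
```

Even choosing just pairs (r = 2) from a hundred thousand training examples yields billions of possible combinations, which is why the output space *looks* endless without a single genuinely new object in it.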
Considering how large the training sample is in most NN models, n is a very large number. In GPT-3, it is in fact in the millions. Even in reasonable models of image recognition, n is at least on the order of hundreds of thousands. Now imagine what n! is. It is a very, very, very large number. Thus the number of combinations that can be generated is very large. Of course, a lot of these combinations will not be (will not look) intelligible (or comprehensible). And here is where the creative GAN architecture comes in. What a GAN does is use two networks: one to generate objects from noisy (or even random) data (images, text, video frames, … depending on what is being ‘generated’), and one that ‘modulates’ what is being generated and makes sure it is a reasonable object consistent with the sample (training data). Appropriately enough, one of the networks is called the generator while the other is called the discriminator. Both networks are important, but the discriminator is key here. It is the network that keeps “checking in on how the generator is doing” so as not to generate meaningless combinations. It does this by treating any generated object that is not consistent with the sample data as an adversarial example (thus the name of the network). Below is a high-level schematic of what a GAN looks like:
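To make the generator/discriminator division of labor concrete, here is a deliberately tiny sketch of the adversarial loop in plain Python: a toy one-dimensional “GAN” with a linear generator and a logistic discriminator, hand-derived gradients, and made-up learning rates. Real GANs use deep networks and a framework such as PyTorch, but the alternating update is the same idea.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

# "Real" data: scalars clustered around 4.0 (a stand-in for the training sample)
def real_sample():
    return 4.0 + random.gauss(0.0, 0.5)

a, b = 1.0, 0.0   # generator g(z) = a*z + b, maps noise to a candidate object
w, c = 0.1, 0.0   # discriminator d(x) = sigmoid(w*x + c), outputs P(x is real)

lr = 0.02
for step in range(2000):
    z = random.gauss(0.0, 1.0)
    x_fake = a * z + b
    x_real = real_sample()

    # Discriminator step: minimize -[log d(real) + log(1 - d(fake))]
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = -((1 - d_real) * x_real - d_fake * x_fake)
    grad_c = -((1 - d_real) - d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: minimize -log d(fake), i.e. try to fool the discriminator
    d_fake = sigmoid(w * x_fake + c)
    grad_a = -(1 - d_fake) * w * z   # chain rule through x_fake = a*z + b
    grad_b = -(1 - d_fake) * w
    a -= lr * grad_a
    b -= lr * grad_b

# Draw from the trained generator: samples drift toward the data cluster
samples = [a * random.gauss(0.0, 1.0) + b for _ in range(500)]
```

Note the point made in the text: the discriminator’s gradient only ever tells the generator which direction makes a candidate look *more like the training sample*. Nothing in this loop can push the generator outside the distribution of the data it was shown.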
The (rather long) caption in the image above explains what so-called “generative” neural networks do. They do not really generate anything outside the space of the data they saw; in particular, they will not create new classes, only fuzzy combinations of objects within the same class. They simply create some combination, and after m epochs that are modulated by real-life examples, some combination is accepted (when the loss between the combination and a real-life example is small enough!)
Research and experimentation have shown very clearly that NNs simply memorize massive amounts of data and cannot deal with anything new outside of the sample data they processed. Recent research has even shown that you can in fact reverse-engineer a NN to recover text the network ‘saw’ and stored (memorized) verbatim! Our experiments with GPT-3 show the same thing: the massive network simply memorized and stored hundreds of millions of patterns in the billions of parameters (weights) of the network.
However, you have to give it to the brilliant DL’ers out there. These generative models do not spit out just any combination: they are good at knowing what goes with what. So the trillions of possible combinations are pruned in an intelligent manner, and only reasonable combinations are produced. This is smart. But to think the NN is “generating” anything new is, to say the least, to be deceived.
Truly generative models would generate mermaids from images of fish and women, but don’t expect that anytime soon. NNs simply memorize data, and they can be novel only in that space: the space of the data they saw, as well as meaningful combinations of the data objects they saw!
If NNs just memorize data but can also somewhat deal with any combination of the data they processed, then what are their limitations? Well, in many situations they will do a reasonable job, especially in finite domains, or in domains where the majority of real-life examples fall in a cluster that the NN covers well. However, in infinite domains, and especially in language or planning, to name a few, no matter how massive the coverage of the data (and all its meaningful combinations) is, the NN cannot be used in practice. It takes a few questions answered wrong, a few accidents of autonomous vehicles (yes, this is a planning problem!), or a few medical diagnoses going wrong to put an end to the futile attempt at chasing infinity.
Terminology is important in science. Just as “deep” in deep neural networks does not mean “deep thinking” (as some naïve journalists thought, and were surprised to learn it simply means many more layers than the NNs of the 1980s had), the word “generative” in Generative Adversarial Networks simply means (somewhat meaningful) “combinations”.
And, as usual… keep AI’ing!
ONTOLOGIK — Medium