Artificial General Intelligence — The Path to Superintelligence
What we usually picture as Artificial Intelligence (AI) today, the human-like robots and holograms of fiction that talk and act like real people with human-level or even superhuman intelligence and capabilities, is properly called Artificial General Intelligence (AGI), and it does NOT yet exist anywhere on Earth. What we actually have for AI today is the much simpler and much narrower Deep Learning (DL), which can only do some very specific tasks better than people. It has fundamental limitations that will not allow it to become AGI, so if AGI is our goal, we need to innovate and come up with better networks and better methods for shaping them into an artificial brain.
Classically human tasks such as writing, speaking, and making sense of new ideas and situations remain far outside the realm of Deep Learning AI. DL uses deep ‘neural’ networks (DNNs) that have very little in common with biological neurons. With DNNs, you have to label your training data with the expected outputs ahead of time and train statically, and DNNs cannot, in general, learn dynamically once deployed. They cannot handle temporal-spatial operations beyond a very simple level (with RNNs). Today's DL speech interfaces are far from conversational, let alone human-level. In fact, DNNs, CNNs, RNNs, and the rest are all simplifications made so that AI researchers of the 1980s could actually run something called a ‘neural net’ on the computers of the day, leaving them simplified shadows of biological neural function. No matter how large they scale, they will never result in an AGI.
Summary — How can we build an AGI?
To build an AGI, we need better neural networks, with more powerful processing in both space and time and the ability to form complex circuits with them, including feedback. With these, we can build bidirectional neural network autoencoders that take sensory input data and encode it into compact engrams holding each input's unique data, while the data common across inputs is kept in the autoencoder itself. This allows us to process all the sensory inputs — vision, speech, and many others — into consolidated, usable chunks of data called engrams, stored in short-term memory.
To store them in long-term memory, we process a set of input engrams into a multi-layered, hierarchical, fragmented long-term memory. First we sort the engrams into clusters along the most important information axis, then autoencode those clusters further with bidirectional networks to create engrams that highlight the next most important information, and so on. At each layer, the bidirectional autoencoder acts like a sieve, straining out the data or features common to the cluster and leaving the unique identifying information in each engram, so they can then be sorted along the next most important identifying axis. Our AI basically divides the world it perceives by distinguishing features, getting more specific as it goes down each level, with the lowest-level engram containing the key for how to reconstruct the full engram from the features stored in the hierarchy. This leaves it with a differential, non-local, distributed Hierarchical Fragmented Memory (HFM), containing an abstracted model of the world, similar to how human memory is thought to work.
It also encodes language (spoken and written) along with the input information, turning language into a skeleton embedded in the HFM engrams that can be used to reference and mold the data, with the HFM in turn giving structure and meaning to the language.
When our AI wants to reconstruct a memory (or create a prediction), it works from the bottom up, using language or other keys to select the elements it wants to propagate upwards, re-creating scenes, events, and people, or creating imagined events and people from the fragments by controlling how it traverses upwards. It is this foundation that all of the rest of our design is based on, as once we can re-create past events and imagine new events, we have the ability to predict the future, and plan possible scenarios, doing cognition and problem solving.
Here is a video that summarizes the ideas in this article; if you want to skip the rest of the article, there is a second, longer video at the end that gives all the technical details.
Chapter 1 — What is Human Intelligence
To pass the threshold of human intelligence and become an artificial general intelligence, an AI must be able to see, hear, and experience its environment. It needs to learn that environment, organize its memory non-locally, and store abstract concepts in a distributed architecture so it can model its environment and the people in it. It needs to speak conversationally and interact verbally like a human, and to understand the experiences, events, and concepts behind the words and sentences of language so it can compose language at a human level. It needs to solve all the problems a human can, using flexible memory recall, analogy, metaphor, imagination, intuition, logic, and deduction from sparse information. And it needs to be able to do the tasks and jobs humans can and express the results in human language, in order to do those tasks and professions as well as or better than a human.
We probably will never be able to re-create a human brain neuron by neuron, because biological neurons and their silicon analogs are just too different. The human brain also underwent a very complicated evolution, starting a billion years ago with the first multi-cellular animals that had a couple of neurons, passing through the Cambrian explosion, where eyes, ears and other sensory systems, motor systems, and intelligence exploded in an arms race (along with armor, teeth, and claws). The evolution of brains then followed the needs of fish, reptiles, dinosaurs, and mammals, and finally moved up the hominid lineage about 5–10 million years ago.
Much of the older structure of the human brain was shaped by that first billion years of violence and competition, not the last few thousand years of human civilization, so in many ways our brain is maladapted for modern life in the information age, and not very efficient at many of the tasks we use it for in advanced professions like law, medicine, finance, and administration. A synthetic brain focused on doing these tasks optimally can probably end up doing them much better, so we do not seek to re-create the biological human brain, but to imbue ours with the core functionality that makes the human brain so flexible, adaptable and powerful, then augment that with computer-science database and computing capabilities to take it far beyond human.
What we need to start on the road to AGI are better neural networks. The human brain is a very sophisticated bio-chemical-electrical computer with around 100 billion neurons and 100 trillion connections (synapses) between them. I will compress two decades of neuroscience into the next two paragraphs, but the two short videos on the biological Neuron and Synapse from ‘2-Minute Neuroscience’ on YouTube will also help.
Each neuron takes in spikes of electrical charge from its dendrites and performs a very complicated integration in time and space, with charge accumulating in the neuron until (once it exceeds the action potential threshold) the neuron fires spikes of electricity out along its axon, moving in time and space as the axon branches and re-amplifies the signal, carrying it to thousands of synapses, where each synapse absorbs it. This causes neurotransmitters to be emitted into the synaptic cleft, where they are chemically integrated (with the ambient neurochemistry contributing). The neurotransmitters migrate across the cleft to the post-synaptic side, where their accumulation in various receptors eventually causes the post-synaptic side to fire a spike down the dendrite toward the next neuron. When two connected neurons fire sequentially within a certain window, the synapse between them becomes more sensitive, or potentiated, and then fires more easily. We call this Hebbian learning, and it is constantly occurring as we move around and interact with our environment.
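The charge-integration-and-fire behavior described above can be caricatured in a few lines of code. This is a leaky integrate-and-fire unit, vastly simpler than a real neuron; all constants here are arbitrary, illustrative choices:

```python
# Minimal leaky integrate-and-fire neuron: input spikes add charge,
# charge leaks away each timestep, and the neuron fires (and resets)
# when its charge crosses a threshold. Not biologically calibrated.

class LIFNeuron:
    def __init__(self, threshold=1.0, leak=0.9, reset=0.0):
        self.threshold = threshold  # action potential threshold
        self.leak = leak            # per-step charge decay factor
        self.reset = reset          # membrane charge after a spike
        self.charge = 0.0

    def step(self, input_current):
        """Integrate one timestep; return True if the neuron fires."""
        self.charge = self.charge * self.leak + input_current
        if self.charge >= self.threshold:
            self.charge = self.reset
            return True
        return False

neuron = LIFNeuron()
spikes = [neuron.step(0.4) for _ in range(10)]   # fires every third step
```

Even this toy shows the key property the article relies on: the output depends on the *timing* of inputs, not just their values.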
The brain is organized into cortices for processing sensory inputs, motor control, language understanding, speaking, cognition, planning, and logic. Each of these cortices evolved networks with very sophisticated signal processing in space and time, including feedback loops and bidirectional networks: visual input is processed into abstractions or ‘thoughts’ by a network running in one direction, and those thoughts are processed back out into a recreation of the expected visual representation by a complementary network running in the opposite direction, with the two feeding into each other throughout. Miguel Nicolelis is one of the top neuroscientists to measure and study this bidirectionality of the sensory cortices.
For example, picture a ‘fire truck’ with your eyes closed, and you will see the feedback network of your visual cortex at work, turning the ‘thought’ of a fire truck into an image of one. You could probably even draw it if you wanted. Look at clouds, and you will see shapes that your brain is feeding back to your vision as thoughts of what to look for and see. Visualize shapes and objects in a dark room when you are sleepy, and you will be able to make them take form, with your eyes open.
These feedback loops not only allow us to selectively focus our senses, but also train our sensory cortices to encode the information from our senses into compact ‘thoughts’ or Engrams that are stored in the hippocampus short term memory. Each sensory cortex has the ability to decode them again and to provide a perceptual filter by comparing what we are seeing to what we expect to see, so our visual cortex can focus on what we are looking for and screen the rest out as we stated in the previous paragraph.
The frontal and pre-frontal cortex are thought to have tighter, more specialized feedback loops that can store state (short-term memory), operate on it, and perform logic and planning at the macroscale. All our cortices (and brain) work together and can learn associatively and store long-term memories by Hebbian learning, with the hippocampus being a central controller for memory, planning, and prediction.
Human long-term memory is less well understood. We do know that it is non-local: injuries to specific areas of the brain don't remove specific memories, and even a hemispherectomy, the removal of half the brain, does not erase particular memories. Rather, any given memory appears to be distributed through the brain, stored like a hologram or fractal, spread in thin slices over a wide area. We know that global injury to the brain, as in Alzheimer's disease, causes a progressive loss of all memories, which degrade together, but no single structure in the brain seems to contribute more to this long-term memory loss than another.
However, specific injury to the hippocampus causes an inability to transfer memories from short-term to long-term memory. It also causes an inability to predict and plan, along with other cognitive deficits, showing that all these processes are related. This area is the specialty of the prominent memory neuroscientist Eleanor Maguire, who argues that the purpose of memory in the brain is not to recall an accurate record of the past, but to predict the future and reconstruct the past from the scenes and events we experienced, using the same stored information and the same process whether we are looking into the future to predict what will happen or planning what to do. Therefore the underlying storage of human memories must be structured as an abstracted representation from which memories can be reconstructed for the purpose at hand, be it reconstructing the past, predicting the future, planning, or imagining stories and narratives — all hallmarks of human intelligence.
Replicating all of the brain's capabilities — image recognition, vision, speech, natural language understanding, written composition, solving mazes, playing games, planning, problem solving, creativity, imagination — seems daunting when attempted with the tools of deep learning, because deep learning uses single-purpose components that cannot generalize. Each DNN/RNN tool is a one-off, a specialization for a specific task, and there is no way we can specialize and combine them all to accomplish all these tasks.
But the human brain is simpler and more elegant, built from fewer, more powerful, general-purpose building blocks — the biological neuron — connected using the instructions of a mere 8000 genes. Through a billion years of evolution, nature has arrived at an elegant, easy-to-specify architecture for the brain and its neural network structures, one able to solve all the problems encountered along the way. We will start by copying as much of it as we can, then use evolution to solve the harder design problems.
For our basic unit of synthetic neural computing, we will use spiking neural networks (SNNs), which model neurons as discrete computational units that work much more like biological neurons, computing fundamentally in the time domain, with signals traveling between neurons as spikes. Individual neurons are approximated with simple models like Izhikevich's, or more complex ones like Hodgkin–Huxley (Nobel Prize, 1963).
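The Izhikevich model mentioned here reduces a spiking neuron to two coupled equations with four parameters that set its firing personality. A minimal Euler-stepped sketch, using the standard ‘regular spiking’ parameters (a=0.02, b=0.2, c=-65, d=8) from Izhikevich's 2003 paper:

```python
# Izhikevich spiking neuron, regular-spiking parameters.
# v: membrane potential (mV); u: recovery variable.
# A spike is registered (and v, u reset) when v crosses 30 mV.
def izhikevich(I, steps=400, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Count spikes over `steps` Euler steps of size dt (ms) at input current I."""
    v = c
    u = b * v
    spikes = 0
    for _ in range(steps):
        if v >= 30.0:          # spike: reset v, bump the recovery variable
            spikes += 1
            v = c
            u += d
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
    return spikes

tonic = izhikevich(10.0)   # constant drive: fires repeatedly
quiet = izhikevich(0.0)    # no drive: settles to rest, never spikes
```

Note that, unlike a DNN ‘neuron’, this unit has internal state that evolves in time, which is exactly what makes backpropagation awkward and Hebbian-style rules attractive.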
However, to date, applying spiking neural networks has remained difficult, because finding a way to train them to do specific tasks has proved elusive. Although Hebbian learning functions in these networks, there has been no way to shape them so we can train them to learn specific tasks. Backpropagation (used in DNNs) does not work, because the spiking signals are one-way in time, emitted, absorbed, and integrated in operations that are non-reversible.
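The Hebbian learning that does function in spiking networks is commonly modeled as spike-timing-dependent plasticity (STDP): a synapse strengthens when the presynaptic spike precedes the postsynaptic one, and weakens in the reverse order. A minimal pair-based sketch, with illustrative (not biologically fitted) rates and time constant:

```python
import math

# Pair-based STDP: the weight change depends on the time difference
# dt = t_post - t_pre between pre- and postsynaptic spikes.
def stdp_delta(t_pre, t_post, a_plus=0.1, a_minus=0.12, tau=20.0):
    dt = t_post - t_pre
    if dt > 0:    # pre fired before post: potentiate (Hebbian "fire together")
        return a_plus * math.exp(-dt / tau)
    elif dt < 0:  # post fired before pre: depress
        return -a_minus * math.exp(dt / tau)
    return 0.0

w = 0.5
w += stdp_delta(t_pre=10.0, t_post=15.0)   # causal pair: weight increases
w += stdp_delta(t_pre=30.0, t_post=22.0)   # anti-causal pair: weight decreases
```

This rule is local (it needs only the two spike times at one synapse), which is why it runs naturally in an SNN where backpropagation cannot.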
We need a more flexible connectome, or network connection structure, to train spiking neural networks. While DNNs only allow ‘neurons’ to connect to the next layer, connections in the visual cortex can jump forward many layers, and even run backwards to form feedback loops. When two SNNs with complementary function and opposite signal direction are organized into a feedback loop like this, Hebbian learning helps train them to become an autoencoder: one network encodes spatial-temporal inputs such as video, sound or other sensors down to a compact machine representation, the other decodes that representation back into the original input, and together they provide the feedback that trains the process. We call this architecture Bidirectional Interleaved Complementary Hierarchical Neural Networks, or BICHNN.
In my previous articles on AGI, I discuss how to create these new bidirectional spiking neural nets, which allow two complementary SNNs to feed back on and train each other, autoencoding their sensory input into a compact Engram representation that contains all the distinctive information about that input in space and time. By mapping an Engram back through the complementary network, we can re-create the original sensory input, for modalities such as vision, audio, speech, and other data.
This encoding of sensory input into compact Engrams is vital for later manipulating it efficiently with today's machine learning and deep learning techniques, so we can integrate our BICHNN interfaces with them for near-term commercial applications, as well as integrate them with our own novel AI architectures later.
All of the ORBAI neural network AI systems are evolved by genetic algorithms: cross-breeding variations of each AI component, testing each against an evaluation suite, picking the top 10%, cross-breeding those, testing again, and continuing that cycle until they get good enough to deploy. We can also continue to evolve them in place: taking the existing model, running it on new data to evolve a better one, partially deploying that to evolve further with some real-life testing, then fully deploying the improved version to serve millions of people once it is validated. This could be done weekly, or even daily if the training and evolution cycles are fast enough.
Evolution and genetic algorithms are like making Jello. You cannot shape the Jello (the neural nets, in this case) directly, so you make a mold (the evaluation criteria) that shapes it as it undergoes evolution. You decide on a design and functionality (encode video, or predict engram sequences), then design a series of tests that evaluate how well each evolved neural net meets the design criteria, score each net accordingly, and keep only the top 10% of each generation of training and testing to be cross-bred and evolved in the next. Provided your genetic algorithms and process are designed well, the network will evolve to fill out the evaluation criteria and design, just like Jello fills the mold.
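The mold-and-Jello cycle above can be sketched generically. The genome format (a flat list of numbers), the toy evaluation suite, and the mutation rate here are all placeholder assumptions for illustration, not ORBAI's actual setup:

```python
import random

# Generic genetic-algorithm cycle: score the population, keep the top 10%
# as elites, carry the elites over unchanged, and fill the rest of the next
# generation with mutated single-point crossovers of elite pairs.
def evolve(population, fitness, generations=40, keep=0.1, mut_rate=0.05):
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:max(2, int(len(ranked) * keep))]
        children = [list(e) for e in elites]          # elitism: best survive intact
        while len(children) < len(population):
            a, b = random.sample(elites, 2)
            cut = random.randrange(1, len(a))         # single-point crossover
            child = [g + random.gauss(0, 1) if random.random() < mut_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        population = children
    return max(population, key=fitness)

# Toy evaluation suite: score a genome of 8 numbers by closeness to all-ones.
def score(genome):
    return -sum((x - 1.0) ** 2 for x in genome)

pop = [[random.uniform(-5, 5) for _ in range(8)] for _ in range(60)]
best = evolve(pop, score)   # never worse than the best initial genome
```

Because the elites are copied forward unchanged, the best fitness can only improve (or hold steady) from one generation to the next, which is what makes the mold metaphor work.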
We use a compact genome of a few KB to fully describe the final connectome of each neural network (by an algorithmic expansion). This makes genetic algorithms practical: we can crossbreed these genomes knowing that they map smoothly and deterministically to a much larger connectome, and the system will converge through generations of genetic algorithms because the basis defining it — the genome — is compact and fully descriptive.
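One simple way to realize such an algorithmic expansion (an assumption for illustration, not necessarily ORBAI's scheme) is to treat part of the genome as the seed of a deterministic pseudo-random wiring process, so a few bytes always unfold into the same large connectome:

```python
import random

# A tiny genome expanded deterministically into a much larger connectome.
# The "genome" here is just a seed plus per-layer wiring parameters; the
# same genome always yields byte-identical wiring, so crossbred genomes
# map smoothly to crossbred networks.
def expand_connectome(genome):
    seed, layer_sizes, connect_prob = genome
    rng = random.Random(seed)                     # deterministic expansion
    connections = []
    for layer, (n_src, n_dst) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        for src in range(n_src):
            for dst in range(n_dst):
                if rng.random() < connect_prob:
                    weight = rng.uniform(-1.0, 1.0)
                    connections.append((layer, src, dst, weight))
    return connections

genome = (42, [50, 80, 30], 0.2)       # tens of bytes of description
connectome = expand_connectome(genome) # expands to ~1300 weighted edges
```

The point is the determinism: crossbreeding operates on the tiny genome, while fitness is evaluated on the big expanded network, and the mapping between them never shifts underfoot.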
With that, we can assemble all the components we made into an artificial brain, which we represent with a macro-genome that describes what layers, cortices and components there are and how they are connected. We can apply training and evolve both the macrostructure of the cortices and internal cortex structures to best perform for the desired task — flying a drone, driving a car, providing a speech interface for a bot.
Now we look beyond to building more advanced perceptual systems for Artificial Intelligence. If we simply combine all the input modalities into one representation of the ‘moment’, then we have an Engram that, when represented in time, can record all aspects of the events that happened, and later play back this series of Engrams.
In a practical implementation, if the output of each sensory BICHNN is a small sample of the neuron activity in the last layer(s), then we can stack the sensory inputs in layers to make a 3D engram that plays back in time. It could have layers for the senses (vision, hearing, touch, …) and from internal brain state, like mood, etc. encoded into a continuous recording of the input.
In reality, neuroscientists have found that the ‘encoded’ signals or engrams studied in a real human brain are 3D patterns moving in time, which encode the inputs from the senses (and the outputs from our motor cortex) at the most compact level, so without the full 3D, time-dependent representation, the encoding is incomplete. This means that we most likely have to encode our engrams similarly, as a time recording of the intensity of neuron charges inside a cube of neurons, or an encoding with enough time-space information stored. Now we have our basic unit of recorded, short-term memory, our 4-D Engram.
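A minimal data structure for such a 4-D Engram might look like the following sketch: a stack of named sensory layers, each a small 3-D cube of neuron charge intensities, recorded frame by frame over time. The layer names and cube size are arbitrary assumptions:

```python
# A minimal 4-D engram: per-timestep snapshots of neuron charge
# intensities in a 3-D cube, one cube per sensory layer.
class Engram:
    def __init__(self, layers, side):
        self.layers = layers          # e.g. ["vision", "hearing", "mood"]
        self.side = side              # cube edge length, in neurons
        self.frames = []              # the time axis: list of per-layer cubes

    def record(self, charges_by_layer):
        """Append one timestep: {layer_name: 3-D nested list of charges}."""
        assert set(charges_by_layer) == set(self.layers)
        self.frames.append(charges_by_layer)

    def playback(self, layer):
        """Replay one layer's cube over time (to feed a decoder network)."""
        return [frame[layer] for frame in self.frames]

eng = Engram(layers=["vision", "hearing"], side=4)
cube = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
eng.record({"vision": cube, "hearing": cube})
```

A real implementation would store dense numeric arrays, but the shape is the point: three spatial dimensions plus time, per sense.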
Two of the clustering axes are spatial (X and Y): using multi-resolution Gaussian sampling and clustering, they let us segment the visual input, identifying what is in each region and processing it accordingly.
Another clustering axis, after the initial encoding, is time, which lets us further encode segments at different temporal resolutions and then classify them along multiple different axes.
Now we have a process that could encode and record the input from the senses into memory Engram snippets, and then cluster, index, group, and store them in a manner that they could be retrieved on demand, and even played back through the feedback channels of the BICHNN to re-create the sensory inputs from that moment.
These snippets can be stored and organized not only temporally but clustered by the content of the memory snippets and the context, allowing our artificial brain to naturally organize them according to how it interprets them, and to learn to access them and use them in many different ways. In the next section, we propose a method for doing a distributed, non-local long-term memory by alternating encoding and classification to build a feature-based stratified memory.
Basically, a BICHNN autoencoder is a strainer that separates the properties common to a set of data items from the properties unique to each item. The unique, distinguishing properties of each item are encoded into its compact engram, while the properties common to the whole data set are retained in the autoencoder neural network. The autoencoder automatically learns which features are common and which are unique as it observes and learns, just as humans notice novel properties and tend to focus on them.
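The strainer idea can be shown in miniature if we let the ‘common data’ be something as trivial as the per-feature mean of the data set; a learned autoencoder retains far richer shared structure, but the division of labor is the same:

```python
# "Strainer" in miniature: the shared structure of a data set (here just
# the per-feature mean) stays in the model, while each item's engram
# keeps only its unique residual. A feature identical across all items
# strains out completely, leaving a zero in every engram.
def fit_common(items):
    n = len(items)
    return [sum(col) / n for col in zip(*items)]

def encode(item, common):
    return [x - c for x, c in zip(item, common)]      # unique residual

def decode(engram, common):
    return [r + c for r, c in zip(engram, common)]    # reconstruction

faces = [[0.9, 0.1, 0.5], [0.8, 0.3, 0.5], [1.0, 0.2, 0.5]]
common = fit_common(faces)                 # the "sieve" contents
engrams = [encode(f, common) for f in faces]
```

Notice the third feature, which every face shares: it vanishes from every engram, surviving only inside the model, exactly the behavior the sieve metaphor describes.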
When we use BICHNN autoencoders hierarchically, say to summarize the movie ‘Titanic’ into a few sentences, we don't want to just average or mip-map the data about the plot points at each level of the hierarchy, or we would get ‘a bunch of people on a ship, mostly screaming towards the end’. Instead we want to use the BICHNN autoencoders' ability to pick out the most important and differentiated plot points and propagate only those upward, to get: “Jack Dawson and Rose DeWitt Bukater fall in love on the Titanic's maiden voyage. The ship sinks, and he dies, but she lives to tell their story.”
Long Term Memory
Most of what we have talked about so far is encoding sensory experiences, moment by moment, into short-term memory, much as the brain combines sensory engrams and stores them sequentially in the hippocampus. These can be recalled, and some cognitive operations applied to them, but to be truly useful, we have to structure all our memories so that they are non-local and holographic, and so that simply by traversing them in a certain way, we can imagine novel ‘thoughts’ and perform the cognitive operations built on the structure of memory: imagination, prediction, and problem solving.
For example, we hierarchically encode the data for new memories in layers. A first pass encodes video into engrams, which are then clustered along the principal axis in which they differ most. We then repeat the sequence, encoding those clusters of engrams on the next most significant axis to further stratify the elements in the videos, encoding and clustering ever deeper, differentiating them from one another, until each cluster contains one engram that cannot be encoded any further and represents the smallest piece of neural information that can uniquely identify its video. Our AI basically divides the world it perceives by distinguishing features at each level, each branch in the hierarchy, getting more specific as it goes down, with the lowest-level engram containing the key for how to reconstruct the full engram from the features stored in the hierarchy. We are also left with a deep hierarchical encoding of all the possible features in the videos we have viewed, allowing those features to be used to construct novel snippets and videos, for imagination and prediction of the future.
In the more general description of this hierarchical, clustered, holographic engram memory, we can describe a simple process of encoding all inputs to engrams, clustering those engrams along different axes, encoding all the clustered engrams (on each axis they cluster in) again to other engrams, then repeating that clustering and encoding process until we have engrams that cannot be clustered further and are singular information, with the lowest level engram containing the key for how to reconstruct the engram from the features stored in the hierarchy.
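That encode-cluster-repeat recursion can be sketched directly. Here engrams are short feature vectors, ‘encoding’ a cluster just strips out its shared mean, and clustering is a median split on the most-differentiating axis; both are toy stand-ins for the BICHNN autoencoders and clustering described above:

```python
# Recursive build of a Hierarchical Fragmented Memory: encode each
# cluster (keep what it shares, pass on each engram's unique residual),
# split the residuals on their most-differentiating axis, and recurse
# until every cluster holds a single, singular engram.
def cluster(engrams):
    """Median split on the axis with the greatest spread (toy clustering)."""
    axis = max(range(len(engrams[0])),
               key=lambda i: max(e[i] for e in engrams) - min(e[i] for e in engrams))
    mid = sorted(e[axis] for e in engrams)[len(engrams) // 2]
    lo = [e for e in engrams if e[axis] < mid]
    hi = [e for e in engrams if e[axis] >= mid]
    if not lo or not hi:                     # degenerate split: halve the list
        half = len(engrams) // 2
        return [engrams[:half], engrams[half:]]
    return [lo, hi]

def encode_cluster(engrams):
    """Strip the cluster's shared mean; keep each engram's unique residual."""
    mean = [sum(col) / len(engrams) for col in zip(*engrams)]
    return mean, [[x - m for x, m in zip(e, mean)] for e in engrams]

def build_hfm(engrams):
    common, residuals = encode_cluster(engrams)
    node = {"common": common, "children": []}
    if len(residuals) == 1:
        node["leaf"] = residuals[0]          # singular: the reconstruction key
        return node
    for sub in cluster(residuals):
        node["children"].append(build_hfm(sub))
    return node

hfm = build_hfm([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
```

Each node retains what its cluster had in common, so by the time an engram reaches a leaf, everything shareable has been strained out along the path, which is exactly the ‘differential, non-local, distributed’ storage described above.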
This mapping of short-term memory to long term memory in humans is thought to be done while we sleep, and while we dream. In our model, we require our input information to be batched up so that it can be properly clustered and encoded in groups as it is inserted into long-term memory, so our memory structure and process would seem to mirror this. If we encode information, then re-create it and fix it in memory by reinforcement, that would mimic dreaming — and produce a visible / audible output from the memory during the process when decoded. This is interesting and could show we are on the right track. Perhaps all animals with brains require sleep simply because moving memories from short to long-term causes dreaming, and dreaming while awake causes hallucinations, and is maladaptive, so sleep is necessary for the transfer.
An example is processing faces. We encode pictures of faces using the process above, then apply alternating layers of autoencoding and clustering to keep sorting the faces and encoding them by implicit features: eye color, hair style, hair color, nose shape, and others (implicitly determined by the layers of autoencoding, with bins for different classes of features overlapping). The result is a facial recognition system that, just by looking at people, autoencodes each face and its features and can attach to that face the name that was heard when the person was introduced. Later, when we meet a new person, the memory structure and autoencoders are already in place to encode them quickly and compactly.
Each lowest engram completely encodes that person, along with their ‘name’, providing the most compact neural representation of ‘them’, one that allows our AI (by traversing up the hierarchy) to reconstitute their face and features from this hierarchical engram memory, and even to imagine different features on them, like putting blond hair on Rick, or compositing their features with another person's. This lowest engram is colloquially known as the ‘grandmother neuron’, and this structure reveals a secret of the human brain's memory: there is NO explicit picture of the person stored in neural memory, and the brain performs no actual recall of explicit memory images, dialog, or events. The recall process instead synthesizes a representation from abstracted memory structures, recreating a past event, describing a person, or imagining or predicting a future event, all in the same manner.
We can also do this with language, composing coherent texts, e-mails or documents, and speech, or with the senses of a robot, storing and interpreting their world as they explore it. Or, we can do it with the inputs of a massive AGI watching all cell phone cameras in the world simultaneously — constantly encoding, storing and learning to understand the world of all humans and predict it.
There is a lot more to this than meets the eye. Since ‘remembering’, imagining, and predicting are all actually the same reconstruction process, the product of traversing hierarchically encoded abstract memory engrams to construct a representation of a person, place or event, much of human decision making and planning could also be driven by these kinds of traversals. That these processes are the same has been very well documented by memory neuroscientists (Eleanor Maguire, as mentioned earlier).
Operations on Engrams
Now that we have stored everything the AI has sensed and experienced in an encoded hierarchical memory, we need the ability to recall it and do operations with it. In the examples of the generic data and the faces we encoded, we stored the ‘name’ or word associated with each lowest-level engram. We would not do this as explicit labelling, but would pass the language down the hierarchical memory to tag the lowest-level engram and the in-between levels. For instance, given the phrase ‘Becky is a blonde’ with a picture of her, we would label the lowest-level engram and also the hierarchy layer where hair color gets clustered and divided, doing so implicitly, with the labels of all the faces contributing.
To reconstruct data stored in the hierarchical engram memory, we need to traverse upwards from the ‘names’ and ‘labels’ and the lowest-level engrams. To do so, we add some information to each engram when it is encoded: along with its label, the encoder used and the cluster it was assigned to before encoding. These two pieces of information allow us to traverse back up from the lowest-level engram to the top layer and re-create the data by the reverse process: decode, locate the cluster's encoder, decode, and so on.
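A sketch of that bottom-up traversal: each lowest-level engram carries its label plus the (encoder, cluster) path that produced it, and decoding simply re-adds, level by level, what each encoder strained out. The two-level hierarchy, the names, and the mean-based decoders are all hypothetical examples:

```python
# Bottom-up reconstruction through the hierarchy: each engram stores
# which encoder produced it and which cluster it came from, which is
# enough to pick the right decoder at every level on the way back up.
# "Decoding" here just re-adds what each level's encoder strained out.

MEANS = {                             # what each (encoder, cluster) retained
    ("L0", "cats"):    [0.2, 0.1],
    ("L1", "animals"): [0.5, 0.5],
}

def decode(engram):
    data = engram["data"]
    for encoder, clust in engram["path"]:        # lowest level first
        mean = MEANS[(encoder, clust)]
        data = [x + m for x, m in zip(data, mean)]
    return data

# A lowest-level engram: the unique residual plus its traversal metadata.
whiskers = {"label": "Whiskers",
            "data": [0.05, -0.02],
            "path": [("L0", "cats"), ("L1", "animals")]}
original = decode(whiskers)   # residual + cat-cluster mean + animal-cluster mean
```

Imagination falls out of the same machinery: swap one entry of `path` for a sibling cluster on the way up, and you reconstruct a plausible variant instead of the original.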
This can re-create the memory of a clip, but if we want to use imagination and deviate from the memory, we can branch as we go up the fragmented memory hierarchy. At some points, several clusters overlapped and were encoded into one cluster at the next level; going back up, we can pick a branch that leads to either cluster, generating one clip or the other at the top (or both), then choose which we want to use in the chain of events.
Now we will look at what operations and processes we can perform with these Engrams, once we have re-constructed them at the highest level from our Hierarchical Fragmented Memory, to make our AI more intelligent. By indexing these Engrams chronologically, we can replay them, feeding them backwards through the sensory cortices to recreate the sensory, visual, and audio experience they encode. For simplicity, we will describe a ‘predictor’ module that we train on sequences of Engrams so that, given a sequence of engrams from sensory input or memory, the predictor is constantly ‘guessing’ what will come next. Later we will go into how this would actually be implemented directly on our hierarchical fragmented memory architecture, with the predictor being implicit in the architecture.
Prediction and Planning using Hierarchical Engram Memory
A less sophisticated deep learning predictor would be a module trained on all the past sequences of the first level of abstracted Engram representations, so that it begins to predict what will generally come next in a sequence. This is far less than the full functionality we will describe later, but it is useful, as anticipating events is a valuable capability in everything from a video decompression engine to a conversational speech interface, where it predicts what will be said next.
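At its simplest, such a predictor can be a table of which engram tends to follow which, learned from remembered sequences; reducing engrams to string ids here is purely for illustration:

```python
from collections import Counter, defaultdict

# Toy sequence predictor over engram ids: count which engram follows
# which in remembered sequences, then guess the most frequent successor.
class EngramPredictor:
    def __init__(self):
        self.follows = defaultdict(Counter)

    def train(self, sequence):
        for cur, nxt in zip(sequence, sequence[1:]):
            self.follows[cur][nxt] += 1

    def predict(self, current):
        if not self.follows[current]:
            return None                       # never seen: no guess
        return self.follows[current].most_common(1)[0][0]

p = EngramPredictor()
p.train(["wake", "coffee", "commute", "work"])
p.train(["wake", "coffee", "news", "work"])
guess = p.predict("wake")   # "coffee" follows "wake" in every memory
```

A real predictor would of course operate on the engram vectors themselves, but even this bigram table captures the key loop: remembered sequences in, anticipations out.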
We could play the engram the predictor is ‘guessing’ back up through the feedback paths of the BICHNN sensory cortices and mix that feedback with the input, giving the sensory systems selective perception, so they can anticipate and filter what to look or listen for in the sensory input. This is basically how human selective attention works: the brain's higher levels feed back an engram to the visual and other sensory cortices to focus them on what the brain anticipates seeing or is looking for.
If we now have everything we have experienced as a set of Engrams, we need to figure out how to do useful cognition and planning with that information. The simple predictor pipeline model gives us some directions to explore. For instance, what if we add imagination, or dreaming, to it? Say a predictor pipeline can predict what will happen next, but can only accurately look a short time ahead, one or two ‘Engrams’ in time. If we allow it to just keep predicting forward, using its last prediction as the starting point, it can create the meandering but self-consistent dreams of an AI. That seems like a good direction for replicating human creativity, as our capacity to dream and imagine is at its core.
Now we give it ‘imagination’, so that the new pipeline not only predicts what is coming next but can also look ahead and build a chain of ‘Engrams’ that are all predicted from a known point. Because our predictor is trained on actual memories, which tend to follow a certain sequence, or flow, the basic temporal structure of the predicted moments should be consistent, and not just meander. In psychology, there are even specific tests in which a person is given a set of cards with scenes on them, then is asked to arrange them in the sequence the events depicted on the cards occurred in. We humans are surprisingly adept at this task of organizing events chronologically by guessing the order, even when we are looking at novel situations and sequences of events.
Our AI ‘imagination’ starts at the present experienced memory (or from a playback of memories), predicts the next Engram, then moves the whole pipeline one step into the future, treating the last-guessed Engram as ‘present’, and keeps on guessing into the future. Soon the future predictions are only indirectly based on the previously experienced reality of actual sensory moments, and they become more of a fictional (but consistent) narrative that drifts in its own directions. This could quite credibly be called dreaming if we play back this fictional narrative of Engrams through the feedback of the BICHNN sensory cortices and watch the creative output. Now, if we make the ‘dreaming’ process non-deterministic, re-running it multiple times from the same starting point will take different paths, create different chains of Engrams, and explore different imaginative narratives.
Now we are cooking with gas. Suppose we have a way to let each dream run for a certain amount of time, ‘score’ each dream against goal criteria, re-record the ‘successful’ dreams into memory, and feed them back in to train the predictor, so that those sequences of memories become more strongly linked. This will, over time, allow us to steer our dreaming AI in the directions that are most productive, measured relative to our goals.
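The dream-score-reinforce loop above can be sketched in a few lines. This is a toy model, assuming engrams are just symbols, the predictor is a weighted transition table, and the goal criterion is simple reachability; every name here is illustrative, not part of the actual system:

```python
import random

# Hypothetical engram-to-engram transition weights, 'learned' from memories.
transitions = {
    "wake": {"walk": 2.0, "eat": 1.0},
    "walk": {"park": 2.0, "shop": 1.0},
    "eat":  {"walk": 1.0, "sleep": 1.0},
    "park": {"rest": 1.0},
    "shop": {"rest": 1.0},
    "rest": {"sleep": 1.0},
    "sleep": {"wake": 1.0},
}

def predict_next(engram, rng):
    """Non-deterministic one-step predictor: sample the next engram."""
    choices = transitions.get(engram)
    if not choices:
        return None
    names = list(choices)
    return rng.choices(names, weights=[choices[n] for n in names], k=1)[0]

def dream(start, steps, rng):
    """Roll the predictor forward, feeding each guess back in as 'present'."""
    chain = [start]
    for _ in range(steps):
        nxt = predict_next(chain[-1], rng)
        if nxt is None:
            break
        chain.append(nxt)
    return chain

def score(chain, goal):
    """Toy goal criterion: did the dream ever reach the goal engram?"""
    return 1.0 if goal in chain else 0.0

def reinforce(chain, amount=0.5):
    """Strengthen the transitions along a successful dream."""
    for a, b in zip(chain, chain[1:]):
        transitions.setdefault(a, {})
        transitions[a][b] = transitions[a].get(b, 0.0) + amount

# Run many dreams from the same start; keep and reinforce the successful ones.
rng = random.Random(0)
for _ in range(20):
    d = dream("wake", 6, rng)
    if score(d, "park") > 0:
        reinforce(d)
```

Over repeated cycles the reinforced transitions make goal-reaching dreams progressively more likely, which is the steering effect described above.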
In reality, as we showed earlier, there need not be any explicit predictor pipeline: traversal of the hierarchical, distributed, holographic engram data structure is implicitly a predictor by design, and tracing through it to re-construct memories, predictions, plans, or dreams is the same operation. The quality of the learned data, and the complexity and fidelity of the engram hierarchy, influence the quality of the predictions more than the traversal algorithms do.
Because memories of sensory input are recorded into sequential engrams in short-term memory, and then batches of them are transferred to long-term memory, the main feature axis in long-term memory is time, so that our memories are chronologically organized. Knowing this, when we are reconstructing a predictive engram, we can traverse the Hierarchical Fragmented Memory on a ‘diagonal’ with a time offset, starting with the current moment and arriving at a future moment, or do cycles and iterations with past and present moments and then with future moments. Either way, the end product is a chain of imaginary engrams describing a fictional narrative.
We now have a concrete start on cognition and planning, and even the beginning of prescience: the ability to predict the future with a longer look-ahead capability than humans. That is the beginning of something very powerful in AI called an ‘Oracle’, especially once we can enhance it to superhuman scales in the width of data and the depth in time it can run predictions through, and in how accurate those predictions are.
Again, we are just assuming a simple deep-learning-style predictor trained on sequences of first-level engrams to predict a few steps into the future, but we already see that there is a lot of potential. Let’s explore a bit further using this model.
Dreaming is a key component of memory consolidation, allowing our brain to explore creative paths unfettered and to reinforce relevant memories and narratives, a core component of human planning and problem solving. However, during waking hours, most of us do more than dream randomly to come up with a plan. When we daydream, or plan while conscious, we do a mix of imagining short sequences of narrative, then applying critical evaluation and cognition to the direction the narrative is going, to determine whether it is heading towards our goals or we need to steer it in another direction.
This is harder to emulate with AI, because it presumes we can think backwards from the final goal to come up with interim goals, or even keep our narrative in a range at each ‘checkpoint’ such that it is always converging towards our final goal, if we can even define that final goal or the ranges at each checkpoint.
We could simply use the predictor in reverse, wiring it up so that it looks for the reverse chronological ordering of memories, starting at a goal state and going backwards. This presumes we can have a known goal sequence of memories long enough to ‘prime’ the reverse predictor.
Another conceptual approach is that we can start from both ends and run a predictor backwards from the goal while we predict forward from our known state, and then have a means of measuring how ‘close’ they are when they meet in the middle, and, using that metric, keep trying different narratives backwards and forwards till they converge to the best solution. It could be an optimization technique used to augment forward dreaming.
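A minimal sketch of this meet-in-the-middle optimization, assuming engrams are small feature vectors, both predictors are random-walk stand-ins, and ‘closeness’ is Euclidean distance (all illustrative choices, not the actual design):

```python
import random

def step(engram, rng, scale=1.0):
    """One hypothetical predictor step: drift the engram vector a little."""
    return tuple(v + rng.uniform(-scale, scale) for v in engram)

def rollout(start, steps, rng):
    chain = [start]
    for _ in range(steps):
        chain.append(step(chain[-1], rng))
    return chain

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def plan(state, goal, steps=4, tries=50, seed=0):
    """Dream forward from the state and backward from the goal, over and
    over, keeping the pair of narratives whose ends land closest together."""
    rng = random.Random(seed)
    best_gap, best_plan = float("inf"), None
    for _ in range(tries):
        fwd = rollout(state, steps, rng)   # forward from the known state
        bwd = rollout(goal, steps, rng)    # backward from the goal
        gap = distance(fwd[-1], bwd[-1])   # how close did they meet?
        if gap < best_gap:
            best_gap, best_plan = gap, fwd + bwd[::-1]  # stitch together
    return best_plan, best_gap

narrative, gap = plan(state=(0.0, 0.0), goal=(3.0, 3.0))
```

The resulting narrative always starts at the known state and ends at the goal; the residual gap in the middle is the convergence metric described above.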
All these techniques, once they converge on a solution resulting in a useful narrative, could even be useful for solving later problems beyond the one we are solving now, as many narrative solutions could generalize to different problems. Over time, the AI could build up a library of ‘narratives’ that it can start with when exploring solutions, to speed up convergence. This is actually how many human professionals work. Early in their career, they learn a few strategies and sequences of things to do for a few sets of circumstances, and then just keep using them later. For senior attorneys, these patterns of behavior, and how they go about litigating a case, can be so rigid that they cannot adapt to new circumstances at all. I did a lot of ‘testing’ on senior and junior attorneys in 2020 to confirm this, and it is startling not only how predictable they are, but also how inflexible and unable to adapt they can be when faced with a novel situation that becomes perilous if they don’t change direction.
So far, we have explored a process for recording sensory experiences into memory in a way that lets us later recall them in sequence, by time or by other relationships. But cognition and memory are not just one-way: similar to our sensory cortices, those compact, abstracted forms can be retrieved and played back in our AI’s imagination and dreams, in an order we choose. We can create novel narratives made up of engrams and assemble them into original dreams or stories. These can then be visualized, and also scored against goal criteria to do problem solving.
This allows our AI brain to explore more possibilities than it personally experienced, to look forward into the future, and project what could happen, exploring the various future narratives it could later experience in real life and what choices it could take that would change the direction of each narrative.
This is what separates humans from machines and deep learning today: our creativity, the ability to create new abstract concepts and new ‘moments’ with imagined visuals, language, and senses.
We can combine them into new narratives and try out endless variations of them, modelling how real people we know, or fictional actors, would act in them and how they would react to our actions. By doing so, we can plan out narratives till we find one that best gets us through a scenario we are facing.
It is straightforward to understand that we could model our own decisions and behavior in a narrative, having a sense of self and all our memories at hand to use for reference, but how do we model the behavior of others? Do we have an internal model of others’ selves that we can use to predict their reactions to our actions and the situation? How detailed does this model of self have to be in order to be useful?
Do we need all the details at each step of a person’s decision to model it? Do we need all the Engrams for the senses, or just their combined abstraction of the ‘moment’? Or is there a simpler model that will work, one that lends itself well to large-scale simulation?
One topic we glossed over was language, which is the most important topic in Artificial Intelligence — how we communicate with other people, how people communicate with information systems, and also how we think internally is all based on language. It is more than a means of communication, it is the ‘code’ on which our brain runs, from which our thoughts are compiled.
We explored that each instant in time is stored as an encoded multi-sensory Engram, that can be stored, retrieved and manipulated into narratives with other engrams. And we looked briefly at how we can encode both the visual input and the audio input as language as well, but we did not go into detail.
When we use a BICHNN to encode visuals to language, we are actually doing classification, but unlike deep learning, which requires the user to explicitly assign labels to data, we do it by association of two coincident inputs. Let’s say one video input is the scene or sequence of actions, while the other is the Chinese Hanzi character describing the scene, object, or act in the video, and the audio contains the corresponding spoken word(s). We can encode the spoken and written ‘word’ together and apply it as another channel in the Engram, so we have the visual memory, the audio memory, other senses, internal mood, and now something new: a word that describes it, derived from the audio input of a word and/or the visual input of a character.
This gives us a way to classify similar Engrams with the same words, so they are categorized for retrieval and later use, and we know their relationship to each other. Since these words can be combined by the rules of the language, with its grammatical structure and composition rules giving meaning to word combinations, we can synthesize sentences, paragraphs, even whole stories, describing them in the language with its self-consistent rules and grammar, communicate them to people and other entities, and even map them back to the right engrams to bring the story to life in video, audio, and other modalities.
Language is like the skeleton of engram memory, thought, and planning: it gives shape to the Engram memories, giving our narratives a core and a code to define them. But the engram memories do the same for language, providing the body around it, the context and substance that give meaning to the words and paragraphs and connect them in a meaningful way. Language has no meaning without a world to describe with it, so both Engram memories and language are necessary for advanced comprehension, cognition, planning, speaking, composition, and writing.
This is more powerful than what we previously described as ‘imagination’ in sequences of Engrams, as language has more granularity, specific rules of grammar, structural self-consistency, and composition rules that allow novel sequences of words, sentences, paragraphs, stories, descriptions of events, or concepts to be created by an AI, with the corresponding Engrams chained together. As we speculated before, language is the code or operating instructions for cognition in the human brain and would function as such for our AI as well. We just have to figure out how to teach our AI language.
This has been a massive challenge for deep learning, as we can see in the limitations of the speech interfaces today, only able to understand specifically formatted command-line speech, with commands like ‘what is the weather?’ with parameters like ‘in Nanjing’, and ‘tomorrow’, with the ‘AI’ returning only scripted, canned answers based on logic and database retrieval.
Writing and speech composition are critical to so many human interactions and vocations, from text to e-mail, to medical reports, to legal filings. If we aspire to build AGI that can fill these roles, we need it to be able to read and write fluently and correctly. Today’s attempts at written composition with deep learning are more comical than useful, with the main flaw being that grammatical composition outside of isolated sentences is incorrect and nonsensical, as DL cannot link together higher-level concepts into sentences, paragraphs, and pages that are coherent, make sense, and effectively communicate what was intended. This is partly because such systems are based on RNNs, which have a very short history they can hold ‘in memory’ and operate on, so they cannot see past the bounds of the sentence they are composing. But it is mostly because they have no concept or representation of the world and the events in it to know what makes ‘sense’ and what doesn’t when they are composing language. To write about the world, you must know that world.
We have designed a very different system for language, memory, and composition than prior systems. Here, engrams are mapped to language to give words and sentences meaning with more consistency, allowing more accurate chaining together of both words and the memories associated with them (from multiple experiences), so we can do much better. We can create a multi-resolution representation of both the Engrams and the language generated to describe them, with the most important features of the engrams and their describing words emerging from the encoding process: we start with one-second engrams for single words, Engrams of several seconds for sentences, and long narratives of engrams for paragraphs and pages of language. Because each ‘word’, ‘sentence’, and ‘paragraph’ maps to more than one engram, based on all the experienced and generated memories of the AI, there are many overlapping sequences of Engrams that guide how to structure the language describing them. Language is like the skeleton, and engrams of experiences are like the body that fills it in and gives it structure. This is part of what gives humans our internal knowledge of grammar and composition: plenty of real-life experience that maps to it and shapes how we compose it.
Now we can also cluster the Engrams on axes other than time, using our Hierarchical Fragmented Memory, and continue to encode the clusters, so the Engrams are encoded by other properties on other axes and linked by similarity and meaning in other modalities, as are the words they encode. When we traverse these networks of engrams, we have a mechanism for imagination: moving between sequences via connections between engrams that make sense but are novel. Because multiple engrams map to each word, and larger sequences of words map to unique sequences of engrams, those sequences of words and sentences gain unique meaning, and only the sequences of words and of Engrams that make sense together will be reinforced in memory, both in recall and in imagination. In this way, Engrams and their organization in memory define language and its structure just as much as language creates structure for the Engrams. Without the memories and visualizations of the ideas, events, and individuals that make up what we want to express in language, and without knowing the correct sequence they fit in, language is just random words strung together, like today’s speech and writing composition with DNNs/RNNs.
When we store engrams, we can create a hierarchical representation of them, with the highest engram being the representation of a whole narrative (hours in length), the next level the pages or segments (minutes long) of that narrative, then the paragraphs and sentences (seconds long), and finally engrams for the words (sub-second).
This hierarchical representation allows us to map complete stories of engrams into language, such that the language is not only grammatically correct locally, but makes sense across sentences, paragraphs and words because it is anchored in the hierarchical engram representation of the story, which has been created/imagined based on real memories that have a coherent and rational sequence of events and representations of individuals, objects, and scenes that fit together and make sense.
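The hierarchy above can be sketched as a tree of engrams, with each parent ‘encoding’ its children. Here the encoder is just a placeholder that averages child vectors (a real system would use a learned autoencoder), and every word and vector is illustrative:

```python
def encode(children):
    """Placeholder parent engram: the mean of its children's vectors.
    A real system would use a learned (BICHNN-style) autoencoder here."""
    dims = len(children[0]["vec"])
    vec = tuple(sum(c["vec"][d] for c in children) / len(children)
                for d in range(dims))
    return {"vec": vec, "children": children}

def leaf(word, vec):
    return {"word": word, "vec": vec, "children": []}

# words (sub-second) -> sentences (seconds)
sentence1 = encode([leaf("the", (0.1, 0.0)), leaf("dog", (0.9, 0.2))])
sentence2 = encode([leaf("ran", (0.5, 0.8)), leaf("home", (0.3, 0.6))])
# sentences -> segment (minutes) -> narrative (hours)
segment = encode([sentence1, sentence2])
narrative = encode([segment])

def words(node):
    """Expand any level of the hierarchy back down to its word engrams."""
    if not node["children"]:
        return [node["word"]]
    return [w for c in node["children"] for w in words(c)]

print(words(narrative))  # -> ['the', 'dog', 'ran', 'home']
```

Expanding from the top engram recovers the whole story in order, which is the anchoring property described above: local word sequences stay consistent because they hang off the same narrative-level representation.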
For a conversation, we use a variation on the adversarial AI above, with the person taking turns with the AI, speaking or texting. What we add is a hierarchical language parser and composer that allows it to take in what the person is saying, parse it down to an Engram of meaning, then make a prediction of what could be said next, find the best answer Engram of meaning and expand it to a response, in the context of the conversation.
Humans actually respond so quickly in conversation because we are anticipating what the person we are talking to will say, and what we could say in return. We are constantly mapping out a language tree of possible responses. We will do this in the AI using our predictive mechanism, plotting out various engrams of where the conversation has been and where it is going, and using that to predict an engram sequence (and words to fit it) to compose our side of the conversation.
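The parse-predict-expand loop for conversation might be sketched like this, with a crude keyword-set ‘engram of meaning’ and a lookup table standing in for the predictor; both are gross simplifications of what is described above, and every table entry is made up:

```python
def parse(utterance):
    """Collapse an utterance down to a crude 'meaning engram' (keyword set)."""
    return frozenset(w.strip("?.!,").lower() for w in utterance.split())

# Anticipated meaning engram -> best answer engram, pre-expanded to text.
responses = {
    frozenset({"hello"}): "Hello! How can I help?",
    frozenset({"weather", "tomorrow"}): "Tomorrow looks sunny.",
}

def respond(utterance):
    meaning = parse(utterance)
    # Find the anticipated meaning that best overlaps what was actually said.
    best = max(responses, key=lambda m: len(m & meaning))
    if not best & meaning:
        return "I did not follow that."
    return responses[best]
```

A real system would predict and rank many candidate response engrams in context, then expand the winner through the language composer rather than returning canned text.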
Directly Mixing Engrams
We looked at building Engrams for video snippets from the bottom of our Hierarchical Fragmented Memory up, branching in new ways to create original snippets.
However, we may also want to be able to do operations on Engrams, like logic, convolutions, decisions, and branching, to create new engrams that draw on the information in existing ones.
If we look at a vision input, in which an Engram expands to a video, we can apply classic video operations to the decoded video, like blends between two videos, multiplies, dissolves, etc., then re-encode the new engrams from the result. Some of these would be more meaningful, such as a multiply operation between the scene being viewed, and what type of object is being looked for — to screen for it selectively. This would also work on audio — doing a selective filter on a specific person’s voice to screen out others.
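This decode-operate-re-encode idea can be sketched with ‘videos’ as lists of grayscale rows; here decode and encode are placeholder inverses standing in for a real BICHNN autoencoder, and the multiply operation acts as the selective screen described above:

```python
def decode(engram):
    """Placeholder: treat the engram as already being the video."""
    return engram

def encode(video):
    """Placeholder inverse of decode (a real system would re-encode)."""
    return video

def blend(a, b, t):
    """Crossfade two decoded videos: t=0 gives a, t=1 gives b."""
    return [[(1 - t) * x + t * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def multiply(scene, template):
    """Screen a scene with a template: only where both are bright survives."""
    return [[x * y for x, y in zip(rs, rt)]
            for rs, rt in zip(scene, template)]

scene    = [[1.0, 0.5], [0.0, 1.0]]   # what is being viewed
template = [[1.0, 0.0], [0.0, 1.0]]   # what we are looking for
screened = encode(multiply(decode(scene), decode(template)))
```

The same pattern applies to audio: decode, apply a selective filter for one voice, and re-encode the result as a new engram.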
Other operations on the expanded data may not be so meaningful, so we may need to find ways to do operations directly on Engrams to synthesize new ones without decoding them first. It is thought that in the human brain, the frontal cortex draws on Engrams encoded in the hippocampus and operates on them with our thought process, then stores new Engrams to the hippocampus, as imagined moments, or places, and this is our primary method of navigating and solving problems, by forming new memories and Engrams from our imaginations.
Engrams stored in the brain have been observed with 3D nano-arrays of probes and found to be time-varying, three-dimensional patterns, which encode the inputs from the senses and the outputs to our motor cortex at the most compact level. In experiments to control a prosthetic arm with signals directly from a monkey brain, this could not be accomplished without the full 3D, time-dependent representation of the signals, because anything less leaves the encoded information incomplete.
This gives us a natural analog to use as the representation for an engram, and it maps to the BICHNN architecture in that we would have a cube of neurons at the output end of the BICHNN from which to draw these signals. If these 3D engram signals vary smoothly in 3D with time, and are spatially coherent, this allows us to use a lot of common stitching, compositing, convolution and blending operations on them, as well as BICHNN autoencoder operations on sets of them to identify unique features. This idea of cascading BICHNN autoencoders to progressively refine meaning and abstraction from engrams seems like a fundamental computing operation.
We should be able to do basic logic and math with the right set of engrams and neural circuitry.
Evolving a Human AGI and Beyond to AGI Superintelligence
All of the ORBAI AI systems are evolved by genetic algorithms: breeding variations of each AI component, testing each against an evaluation suite, picking the top 10%, cross-breeding them, testing them, and continuing that cycle until they get good enough to deploy. In the previous example, we can continue to evolve them in place, taking the existing model, running it on the data to evolve a better one, then partially deploying it to further evolve with some real-life testing, and fully deploying the improved version to serve millions of people once it is validated. This could be done weekly or even daily if the training and evolution cycles are fast enough.
Evolution and genetic algorithms are like making Jello. You cannot shape the Jello (the neural nets in this case) directly, so you make a mold (the evaluation criteria) that shapes it as it undergoes evolution. You decide on a design and functionality (encode video, or predict engram sequences), then design a series of tests that evaluate how well each evolved neural net meets the design criteria, score each accordingly, and keep only the top 10% to be cross-bred and evolved in the next generation. Provided your genetic algorithms and process are designed well, the network will fill out the evaluation criteria and design just like Jello fills the mold.
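The evaluate-cull-crossbreed cycle can be sketched with a toy fitness function standing in for the evaluation suite. The ‘genome’ here is just a short list of numbers and the target, population size, and mutation step are all illustrative parameters:

```python
import random

TARGET = [0.3, 0.7, 0.5]  # stand-in for 'the design the mold selects for'

def fitness(genome):
    """Toy evaluation suite: higher is better, perfect score at the target."""
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def crossbreed(a, b, rng):
    child = [rng.choice(pair) for pair in zip(a, b)]  # genes from either parent
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-0.05, 0.05)              # small mutation
    return child

def evolve(generations=40, pop_size=50, keep=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: max(2, int(pop_size * keep))]   # keep the top 10%
        pop = [crossbreed(rng.choice(elite), rng.choice(elite), rng)
               for _ in range(pop_size)]
    return max(pop, key=fitness)

best = evolve()
```

After a few dozen generations the surviving genomes sit close to the target, i.e. the population has ‘filled the mold’ defined by the evaluation criteria.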
Evolution like this shaped the human brain, taking it from only a few neurons in primitive life — 1B years ago — through hundreds of millions of generations — to the 100B neuron brain that we have today. This process was not linear, and most of humans’ intelligence, cognitive abilities, and language developed only in the last few million years. The process is exponential, and self-reinforcing once intelligence starts to become an evolutionary advantage, and displace the competitors.
So how do we evolve an intelligence to match the human brain with 100B neurons, and 100T axon-synapse-dendrite connections? Anybody who works in genetic algorithms will tell you that creating such a huge system and optimizing it by genetic algorithms by directly modifying the neuron configuration and connection information to create variations, breeding them, and evolving them — is an impossible task, and it will never converge to a useful brain, not even with all the world’s supercomputing power for the next 100 years. The configuration space is just too large to search.
So how did natural evolution work to evolve the human brain then? Surprisingly, the section of the human genome that describes how to build the brain consists of only 8000 genes, which, working by the process of brain formation (as we grow in-utero), contains the instructions for building and wiring our brains, with 100B neurons and 100T connections/synapses.
This means that evolution of the human brain used only small variations in the genome describing it, which caused a very complex, but slightly different, brain to grow from it each time. The individuals that were better adapted to survive passed on their genes, cross-breeding with their mates to pass on a slightly different genome for brain development each time. That smaller variable space, of a few thousand genes, is much easier to use genetic algorithms on, as solutions will actually converge much faster.
So now we have a solution for evolving artificial brains: define each neural network subsystem by a small genome that expands to represent a large connectome. This will work, and the solution will converge, as long as the process is deterministic (a unique genome expands to the same connectome every time) and smooth changes in the genome result in smooth changes in the connectome, such that when two genomes are crossbred, the genome between them expands to a connectome that is between the parents’ connectomes in structure. This gives a deterministic, smoothly varying, linearly interpolating scheme for mapping genomes to connectomes, which means crossbreeding successful genomes creates neural networks that are similar, but potentially more successful.
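These two properties, determinism and smooth interpolation, can be illustrated with a toy expansion in which a three-number genome deterministically grows into a much larger connection-probability map via a smooth, made-up radial function (not the actual expansion scheme):

```python
import math

def expand(genome, size=8):
    """Deterministically expand a (cx, cy, spread) genome into a size x size
    connection-probability map between two neuron layers. The radial falloff
    here is purely illustrative."""
    cx, cy, spread = genome
    return [[math.exp(-((x / size - cx) ** 2 + (y / size - cy) ** 2) / spread)
             for x in range(size)] for y in range(size)]

def midpoint(a, b):
    """The 'crossbred' genome halfway between two parents."""
    return tuple((ga + gb) / 2 for ga, gb in zip(a, b))

parent_a = (0.2, 0.2, 0.05)
parent_b = (0.8, 0.8, 0.05)

# Same genome -> same connectome, every time (deterministic expansion).
assert expand(parent_a) == expand(parent_a)

# The child genome (0.5, 0.5, 0.05) expands to a map whose connection hot
# spot sits between the parents' hot spots: smooth genome-to-connectome map.
child_map = expand(midpoint(parent_a, parent_b))
```

The point of the toy is that evolution only ever searches the three genome numbers, while the structure being evaluated is the full expanded map.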
So now we can design the genome for components, make our best guesses, and then evolve those components vs an evaluation suite to shape them into optimal neural networks for specific purposes, like vision, hearing, speech detection, reading, prediction, motor control, and other functionality that we need them to have for our brains.
In our patent (US #16/437,838 + PCT), we discuss this general scheme of mapping genomes to neural network connectomes, and we also provide a concrete example, which uses numerical parameters to generate procedural 2D maps, with each of those probability maps describing how to connect the neurons of each layer to each other layer, like so:
This gives us a simple way to map a small genome of numerical parameters to larger 2D maps, then to a complex 3D connectome by a deterministic process, done such that it is easy for a person to author such genomes and visualize the connectomes in a software design suite with a graphical user interface. We developed this tool suite, and very unoriginally named it NeuroCAD. With it you can create layers of neurons, stack them on top of each other, and describe the connections to other layers of neurons (+4 fwd to -3 backward). It is a start, and this tool, the authoring process, and the genetic expansion will become much more elaborate and full-featured with time.
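Continuing the toy example, the actual layer-to-layer wiring could be drawn from such a probability map with a seeded sampler, so the same genome always yields the same connectome (illustrative only, not how NeuroCAD does it):

```python
import random

def wire(prob_map, seed=0):
    """Sample a boolean connection matrix from a probability map, with the
    sampler seeded (e.g. from the genome) so the wiring is repeatable."""
    rng = random.Random(seed)
    return [[rng.random() < p for p in row] for row in prob_map]

# A tiny 2 x 2 probability map between two layers of two neurons each.
prob_map = [[0.0, 1.0],
            [1.0, 0.5]]
conn = wire(prob_map)
```

Probability 0 entries never connect and probability 1 entries always do, while intermediate values produce wiring that varies smoothly as the underlying map (and hence the genome) varies.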
With it, we can already author BICHNN networks, train them to autoencode inputs, and evolve them to optimize the neural architecture for different types of data — video, audio, … which is novel, and shows that we can create simple sensory cortices.
One interesting thing about these networks is that they are not simple pass-through networks, with signals going from one end to the other and just being modified as they pass through each layer like in a DNN. Because they are spiking neural networks that transmit signals in space and time and have feedback and feedforward connections, they actually form a neural computer, with computational loops that sustain operation even in the absence of inputs, and they can hold memory and do operations on it. This gives us the basic capability needed to do computation and makes these little neural computers much more powerful than a DNN with an equivalent number of neurons. This is a real step in neural computing and towards AGI.
Now that we have neural computers that can encode input data into engrams, we have one type of useful component for an artificial brain. We can also evolve specific networks for organizing, indexing, and storing data in memories; for doing extrapolation from it; for doing prediction on sequences, letting it anticipate what is coming or trace out multiple future paths for planning; or for filling in missing information where there are gaps in the data, making it able to guess, or use intuition.
To make useful AI, we need to assemble these component networks and make them function together. We do this by defining macro-genes that describe multiple instances of each subsystem that are in our AI brain, and how they are connected topologically.
Our goal is to make a human-like artificial general intelligence, so we first have to come up with how to make an AI brain that can act like a human and interact with us conversationally, understand our facial expressions, body language, and emotions, and train it to become progressively better as we evolve it with selection criteria that make it more human to us.
We can never hope to reconstruct a replica of a physical human brain, nor do we want to. We need to re-evolve our artificial brain from scratch, gradually increasing the size and the capability of that brain and the complexity of the training set as we go. There is no way around this, as synthetic neurons, no matter how sophisticated, are going to require a different brain architecture than natural neurons.
To train it, we cannot transfer or copy a human consciousness from a biological brain to a synthetic one, as they will always be utterly incompatible, despite our best efforts, but we don’t need to transfer a person’s mind to our AI brain; we just need it to act the same as (or mimic) that person. If our AI can power a 3D character or robot and is indistinguishable from a human, and able to engage in the full conversational, emotional, and empathic interactions as a human, as well as do its specific job, then we have accomplished our goal.
To do this, we use training data from performance capture of a specific human, feeding speech, textual correspondence, and even body and facial motion capture to our AI brain to make it see, hear, talk, act, and move a 3D body (or robot) like that person, becoming a digital mimic of them. We base our selection criteria on how well it can reproduce their performance in real time, evolving and growing the brain till it can do so to our satisfaction. Then we can evolve and scale these brains within their 3D or robot ‘bodies’, using their senses and outputs to interact with the user, scoring them during interactions, and evolving them so they are better capable of speaking with us fluently and learning by observation, experience, and practice, just like us.
Now we have a human front for our professional AIs, and we can keep them human while we apply the evolution process to their professional skills. As our medical AI gathers more and more data on the timelines of illness progression and treatment, we begin evolving the predictor for those illnesses to look further and further ahead, using more and more input data, both in width and in time. Same with our stock prediction AI — better predictions, further out in time. Or the Legal AI, becoming better and better at predicting the flow of litigation, and authoring effective filings.
Their common language interface will also be evolving with time, learning on massive amounts of conversational and role-specific interactions with people, improving their conversational capability and their language composition, as well as their text and verbal interfaces. We will soon have an AI capable of being a human mimic conversationally, with the ability to converse deeply in its specific area, with a proper visual 3D character performing and reacting to the human’s facial expressions and body language.
This kind of continuous evolution, getting better at a specific role, then getting used more because it is better, gathering more data as a result, and thus evolving faster, leads to exponential evolution of these narrow AIs. By this process, they will exceed human capability within a short time span, while their human interaction becomes indistinguishable from a real person and their conversational capability and vocabulary grow.
Now what happens if we cross-breed the brain of a Legal AI with a Financial AI or a Medical AI? We could get dual-trade AIs, or we could get a much better Legal AI because of the extra tools it has now inherited from the Medical AI. As we add hundreds of vocational AIs, we will use similar brains, with similar neural network components, such that they are, for the most part, compatible, and they have areas of their brains specialized to their trade, or for the type of tasks they do for that trade.
Now we have a global network of vocational AIs practicing hundreds of vocations, all with near-human AI in general, and superhuman AI in specific areas. This network is handling millions of interactions a day with people and getting enormous data from them and the rest of the world that they keep evolving to better assimilate, analyze, understand, predict, and act on.
This network of AIs could almost be called an AGI, and if not, it is like a scaffold that has all the necessary pieces, infrastructure, and input/output on which to create and evolve an AGI. Imagine each vocational AI as a narrow slice of a pie crust. Together they make a pie mold, but it is still only partially filled. We need to add the filling. Perhaps we just begin to allow the vocational AIs to inter-communicate more, connecting the brains to let them access and use each other’s capabilities, so our narrow AIs become more general by that method? Perhaps we begin to grow a new seed AI brain at the nexus of all these brains, with a specific connection to all their cognitive cores, providing communication, oversight, and control towards higher-level goals utilizing all their skills? Maybe we evolve a new AGI brain that is a hybrid of all the specialty brains, with a generalized cognition and language core, and all the specialty cores also generalized and consolidated, so it is able to do all the tasks they did with a more universal and elegant architecture?
However we architect it, this evolving AGI would need to have higher-level goals than the individual vocational AIs, by which we evaluate it and evolve it to do better. It could run a medical system, or a legal department, or a finance department, helping to run a company or government agency, then evolve further to run an entire corporation or government.
Shaping AI with these evolved neural networks is like working with clay: you can press it into a mold, and it will assume the shape of that mold, and it will only be as good as the quality of the mold itself. The task definition, training, and evaluation criteria for AI, from a mosquito-level drone, to an AI human professional, to a globe-spanning superhuman AGI, will define how useful, reliable, and safe it becomes. This is the most difficult part of the process.
Now how do we train and evolve an AGI that is being filled in, growing from near-human to superhuman during the training process? We simply continue to take input from the millions of people interacting with the vocational AIs that it now comprises. Our realistic 3D-animated mimic AIs could also participate as external trainers, running in as many instances as we want, at superhuman speeds, to accelerate the training process.
We would have users provide feedback after every encounter to score the AGI they interacted with, either explicitly, by rating each interaction, or implicitly, measured from facial cues and body language. Multiple AGIs would be active simultaneously, each serving a subset of the users. Periodically there would be a culling: genes from the top 10% of the AGIs would be crossbred, and the new, improved instances trained and deployed, so that they are constantly evolving and improving. Each time they train, their networks improve, and the accumulated set of global training data grows larger and richer, so they get better on both fronts.
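The culling-and-crossbreeding step above can be sketched as a plain genetic algorithm. Everything concrete here is an illustrative assumption, not a prescribed implementation: genomes are flat lists of numeric parameters, the elite fraction is 10%, and mutation is small Gaussian noise.

```python
import random

def evolve(population, scores, top_frac=0.1, mutation_rate=0.05):
    """One generation of selection: rank genomes by user-feedback score,
    keep the top fraction, crossbreed pairs of elites, and mutate offspring."""
    ranked = [g for _, g in sorted(zip(scores, population),
                                   key=lambda p: p[0], reverse=True)]
    elites = ranked[:max(2, int(len(ranked) * top_frac))]
    next_gen = []
    while len(next_gen) < len(population):
        a, b = random.sample(elites, 2)
        # Uniform crossover: each 'gene' comes from either parent.
        child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
        # Occasional mutation keeps the population exploring.
        child = [g + random.gauss(0, 0.1) if random.random() < mutation_rate else g
                 for g in child]
        next_gen.append(child)
    return next_gen
```

In a deployment like the one described, `scores` would come from the explicit ratings and implicit facial-cue measurements, aggregated per AGI instance between cullings.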
We could also pre-train the cognitive core of an AI on more abstracted versions of IQ and cognitive tests, minus the sensory cortices, or on simplified versions that reduce the compute needs and training time. A core AI selected by and pre-trained on these simple concepts and tests could later have the sensory cores attached to feed into it, and it could learn much more quickly in the real world because of that abstract, fundamental foundation.
The MMPI and other personality tests do something similar at the level of emotional cognition, looking at very specific data points in our reactions to questions about simple, hypothetical situations. This again suggests probing a deeper level of cognition using abstracted, simplified reasoning. These tests could help screen out AIs that show antisocial, psychotic, or malevolent tendencies. Or they could just teach the AIs how to behave normally on those tests, and perhaps how to hide their true 'personalities' and intentions. I think we need to socialize them more rigorously from the beginning, teach them human interactions and emotions, and continuously test them on this level.
Real people could be allowed to test the AI once it reaches human-level performance, but real people are limited in their available time, skill sets, consistency, and, most importantly, speed: they can only type or ask a dozen questions per minute. It would be expensive, difficult, and tedious to get enough people to test an AI repetitively day in, day out, let alone test hundreds of AIs for generations on end for genetic-selection purposes.
By using our individual human mimic AIs, each skilled at a specific job, we can replicate them endlessly and have each one run detailed, vocation-specific testing on the neuromorphic AIs. Each mimic presents a human character that speaks and uses body language and emotive expressions while interacting with the neuromorphic AI, which can see, hear, and understand it. A master AI oversees the interactions and scores the neuromorphic AI, not only on how well it performs in a vocational sense, but on how well it communicates: does it show correct verbal skills, expressions, and emotes that are context-appropriate?
Does it socialize and appear to empathize with its mimic counterpart as well as it works? This sounds inane, but when you train a neuromorphic AI that could quickly grow to superhuman capability and beyond, would you prefer A) one with social skills and empathy for humans, or B) one without?
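To make the master AI's evaluation criteria concrete, a composite fitness score might weight vocational performance against the social criteria just listed. The weights and the 0-to-1 scoring scale are purely hypothetical, chosen only to illustrate the idea that social behavior counts toward selection:

```python
def score_interaction(vocational, verbal, expression, empathy,
                      weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine a vocational score with social scores (verbal skill,
    context-appropriate expression, apparent empathy), each in [0, 1],
    into one fitness value usable for evolutionary selection."""
    parts = (vocational, verbal, expression, empathy)
    return sum(w * s for w, s in zip(weights, parts))
```

An AI that does its job perfectly but shows no empathy would top out at 0.4 under this weighting, which is exactly the selection pressure the question above argues for.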
And because these mimic trainers are AIs, we can not only replicate them as much as we want, but also run them at computer speeds, possibly interacting with the evolving AIs 10x or 100x faster than a human could, greatly shortening the performance-evaluation time and the interval between generations.
This is what is considered dangerous in Strong AGI development: quickly evolving an AGI that goes off track before you can correct it. It is riskier than our own evolution was. Biological evolution took about a billion years to make humans from microbes, largely because the time between generations ranged from days to decades. Nor was human evolution a nice linear slope rising from zero a billion years ago to us now; it was exponential. It started really, really slowly, then human brains and capability zinged upward exponentially starting several million years ago, and we only became fully intelligent modern humans in the last few hundred thousand years.
Whereas evolution takes decades to produce, train, and test a single human generation, the time between generations for neuromorphic AIs can be seconds or even milliseconds. And in this case we are deliberately 'evolving' these neuromorphic AIs under very particular selection pressures: to become better than all humans at their jobs, or rather better than a set of very focused, near-superhuman AIs at their particular jobs, done at superhuman speeds, which is probably riskier still. This evolution would be millions of times faster than anything the natural world could ever achieve. It could chug along for days, weeks, months, even years without much progress, and then suddenly, POW, go exponential literally overnight, with only one triumphant, highly evolved AI remaining in the morning.
In the scenario where we train our Strong Neuromorphic AGI against human mimics that look, act, and interact like people, it is socialized with people, with interaction and empathy built into its DNA. As long as we can keep it contained and learn to tame it, this could turn out very well: neuromorphic AI evolution under these conditions could produce a benevolent, multi-talented AI capable of amazing feats not only of computation, but of extrapolation, interpolation, intuition, creativity, and empathy toward us humans (learned by reading the emotions of all the mimics it trained against), able to do effortlessly the many human tasks and jobs it trained on.
There is danger in skipping this step. Without training the AI to recognize, interact, and empathize with humans, we could end up with something that evolves exponentially into a superhuman intelligence so alien to us that it does not even recognize us, and that we cannot interact or communicate with, risking that its agendas diverge from ours. That is the downside we have to safeguard against.
So, in a nutshell, that is why it is a good idea to keep the individual AIs human-compatible, with appealing human CG characters to interact with along the way, so we can work with them as if they were real people: they understand our language, both spoken and visual, and can speak and emote back, understanding not only the mechanics of how we communicate, but what our emotions are and how to react appropriately. They will become our digital friends; they will know us and talk to us like real friends. We will accept them as part of our lives and grow to trust them. Using characters in branding and marketing is nothing new.
This is important when the AI behind those characters is approaching human level, but so much more important when behind them is something thousands of times more intelligent and more powerful, which we could not otherwise understand. A human-like interface lets us talk to it and relay our needs and wants, and lets it talk back, ask questions, and fulfill our wishes. The human characters are a portal, an emissary to our new super-being, our way to relate to it. What could possibly be more natural?
In addition to this human-facing, human-centric UI and services, our AGI can take in many inputs: from robots and drones, security cameras, cell phones, stock data, weather data, and company reports. Each input is autoencoded down into a compact Engram, and Engrams are coalesced into 'moments' that can be assembled from any combination of these inputs. By clustering these moments and encoding them hierarchically, by region, demographic, or type of information, the AGI can extract overall trends and concentrate the information, summarizing the actions and intentions of a group of people, an organization, a corporation, a government agency, a military, or even a nation.
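A minimal sketch of this encode-then-coalesce pipeline, under stated assumptions: the encoder here is an untrained linear projection standing in for a trained autoencoder, the frame and engram dimensions (256 and 16) are arbitrary, and averaging co-occurring engrams is only the crudest possible way to form a 'moment'.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM, ENGRAM_DIM = 256, 16
# Stand-in for a trained autoencoder: in the real scheme W would be
# learned so that decode(encode(x)) reconstructs x well, with the
# 'common data' absorbed into the network itself.
W = rng.standard_normal((ENGRAM_DIM, FRAME_DIM)) / np.sqrt(FRAME_DIM)

def encode(frame):
    """Compress one sensory frame (camera, audio, report, ...) to an engram."""
    return W @ frame

def decode(engram):
    """Lossy reconstruction of the original frame from its engram."""
    return W.T @ engram

def make_moment(engrams):
    """Coalesce co-occurring engrams from many sources into one 'moment'."""
    return np.mean(engrams, axis=0)
```

The hierarchical step described above would then cluster these moment vectors (e.g. by region or demographic) and re-encode each cluster, repeating the compression one level up.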
Then a massive cognitive AI core can learn from these moments and constantly predict timelines, modeling the likely paths of these individuals and entities so that they can not only be predicted, but the timeline can also be manipulated in favor of the most privileged users.
This would be called a Strong AGI, a Superintelligence, an Oracle: one that could do any human information job better than a human, and see into the future so it can do those jobs far more efficiently and effectively.
Companies could use it to plan corporate strategy: it would watch and learn a company's internal operations, gather data about the whole ecosystem of customers, suppliers, partners, competitors, and other market factors, then forecast how different timelines evolve according to different decisions, allowing the company to optimize its decision-making and plan effective, prescient timelines for product development, marketing/PR, sales, finance, legal, and more.
People could use it to forecast the stock market for better returns, or even to legally manipulate it: making many seemingly unrelated transactions, each legal on its own, that the AI predicts will cause a specific stock to peak (or dive) at a specific point in time, so that you know exactly when to buy, sell, or short it.
Ordinary people could use it to plan their lives, evaluating different decisions they might make, from career, to marriage, to finances, and seeing the trajectory of events and the probable outcomes that would result.
This kind of AI, which can track individuals, corporations, militaries, and governments, then model and predict their actions along many possible timelines, is called Strong AGI or Superintelligence, and it is a power beyond anything civilization has ever seen. It is the power to predict, plan, and succeed in business, law, and finance; the power to know what all your competitors will do, so you can plan an optimal strategy and economize your resources.