Dinosaurs & AI

Could AI Help Make Jurassic Park Happen?

Robotic Troodon at the MIT Museum, photo by Barnas Monteith. Troodon formosus is one of the species found at the “Egg Mountain” fossil locality in Montana (photo by author)

As an elementary school student and middle schooler, my number one goal in life was to become a paleontologist. In the summer following 8th grade, a brand new program at the Boston Museum of Science was advertising the opportunity to spend a summer with world famous paleontologist Jack Horner. My mind nearly exploded when I learned about this once in a lifetime chance to be part of paleontological history. Years of going to museums with awesome paleontology exhibits around the country just didn’t compare to meeting and getting to dig for dinosaurs alongside Jack Horner.

At the time, Jack was an advisor to Michael Crichton / Steven Spielberg’s “Jurassic Park.” He was also the real-life inspiration for the lead character in the book, Dr. Alan Grant. Jack had also just come off a book tour after winning his MacArthur Genius award, for discovering North America’s first dinosaur eggs and determining that dinosaurs did in fact raise their young like most birds do today.

Jack had always been known as a technological innovator. And he still is today. He was among the first paleontologists to regularly use CAT scan technology to analyze delicate and hard-to-extract fossils. He had also worked with a team of engineers trying to figure out if they could use ground penetrating radar to detect fossils under the ground, without the need to dig them out. He was also rather unafraid of breaking the norms. When it was important to take microscopic thin sections of bones, he wasn’t afraid to take specimens right out of the museum and cut them apart with a rock saw (for histology / fossil bone cell studies). Nor was he afraid of using jackhammers to remove the “overburden matrix” on top of an otherwise delicate expedition site, to make his expeditions more expeditious, at the risk of breaking occasional valuable fossils.

Author, as a very young student, using a jackhammer at “Egg Mountain” in Montana

“Jurassic Park” was essentially required reading on our first expedition. Since you’re reading this blog, you know the plot. An island full of dinosaurs is created in the modern world, with the animals genetically re-engineered from ancient DNA pieced together from the blood of fossilized mosquitoes, trapped and perfectly sealed in amber. In theory, this sounds like a reasonable idea. Mosquitoes, which indeed were around during the time of dinosaurs, surely drank the blood of the occasional velociraptor or T-rex. And they probably were regularly trapped in amber. If you could simply extract and isolate the dinosaur DNA trapped inside the mosquitoes (even if some of it had been broken down or mutated) you could theoretically piece together all the fragments into a long strand, arrange them roughly into chromosomes, insert them into a modern bird egg, and potentially hatch a little dinosaur. Well, that’s the theory, anyway, based on science fiction conceived in the late 1980s.

When I first met Jack, it was on a dusty road near Choteau, Montana. Jack was standing over the snapping head of a dead, bleeding rattlesnake, holding a shovel. It was like a scene right out of a movie. The rest of the snake’s body was writhing in the dirt as we on the expedition team (mostly suburban civilians like me) stared in horror. Jack’s colleague later collected the rattle, as a keepsake. Apparently, for folks raised on Montana ranches, killing rattlesnakes was normal, everyday business. It may seem like a terrible thing, but this was necessary to ensure that your team wasn’t in any danger of being bitten (since hospitals were many dozens of miles away, if not more, on slow, meandering dirt roads).

Author, Barnas, with Jack Horner, in the 1990’s

After minimal training on what to look for, our job was to search for new fossil sites. We followed Jack over miles of the badlands of central Montana throughout the day and, exhausted, were assigned teepees to sleep in that night. But before going to sleep, we were all invited to have a wrap-up discussion, while sitting on logs arranged in a circle around a giant campfire. This was a common daily practice at Camp Makela. These were the rare times we would get to have quality dialogue with Jack, who rarely stopped moving throughout the daytime. I recall my first real talk with Jack, as he sat next to a cooler, drinking his favorite “Rainier Beer.” He recounted his verbal battles with Bob Bakker, a rival paleontologist who had publicly opposed Jack’s position that T-rex was actually more of a scavenger, and less of an active predator. While there’s plenty of evidence both ways, I’m not sure the question has been fully resolved even now, but it’s obvious the movie industry doesn’t care much. As the night wound down, the conversation headed to the possibilities of the future of paleontology and technology.

I recall asking Jack if he thought it possible to use genetic engineering to resurrect dinosaurs. Even though this was around 1990 (when the Human Genome project that would take a bit over twelve years to complete was just starting) he told me that as a child, he had always wanted a pet dinosaur. And with this new field of genetics research, that might one day be possible…

The Problem With Hydrogen Bonds

The big problem with trying to create a real Jurassic Park all boils down to something quite simple: hydrogen bonds.

An example of a hydrogen bond, weakly holding two pyridone molecules together. The same weak type of bond holds DNA strands.

Hydrogen bonds are among the weakest types of chemical bonds. They’re extremely convenient if you’ve got to make about 50–70 billion new cells per day (which an average human does) because you can unzip, separate, duplicate, and re-twist new DNA strands together very rapidly. But if you want a more permanent chemical bond, you need something stronger than hydrogen bonds. This is the main reason why DNA in fossils starts to break apart after a few tens of thousands of years. Proteins, on the other hand, are built with much stronger bonds in general and can stay well-bonded for millions, even hundreds of millions of years.

SEM micrograph of an eggshell containing membrane fibers composed of glycoproteins that are capable of surviving hundreds of millions of years largely intact. (Copyright 1996-, All Rights Reserved, Barnas Monteith)

In fact, this is why collagen and other fibrous glycoproteins (proteins with sugars attached to them) can be found in dinosaur fossils from the Triassic (over 200 mya), while residual DNA really can’t be found that far back in time. (This was the main topic of my science project and additional work I did at Harvard MCZ for many years on fossil eggshells, which is written up in my book, Dinosaur Eggs & Blue Ribbons.)

These two DNA strands are held together in the middle by hydrogen bonds.

Pleistocene Park

A real Jurassic Park of sorts is apparently not such a terribly far-fetched idea, as there are solid, well-funded plans to at least bring back the Woolly Mammoths. Dr. George Church, a geneticist at Harvard University, has been able to revive dozens of genes in extinct species. It has been shown that it is indeed plausible to swap out and “wake up” extinct genes and to insert them into healthy extant (living) elephant eggs, using CRISPR technology.

The plan, supported by the Russian government, is to take DNA sequences from frozen/fossilized Mammoths from ~40,000 years ago and to use them as a template to bring the Mammoths back to life. Then, to release them into a park in Siberia, which will be called “Pleistocene Park.” Is it a good idea? Probably not, but the fact that it can be done and is being done is actually quite cool. It takes us one step closer to the possibility of a real Jurassic Park. And we all know how that turned out.

How about AI? What does AI have to do with paleontology today?

Well, let’s first take a look at where AI is today, in the realm of paleontology. AI has been used indirectly in many ways to enhance the image processing and 3D tools used by paleontologists. But there have been a couple of exemplary areas where AI has been very clearly and directly beneficial to paleontology/paleobiochemistry:

Rapid bulk microfossil identification. As a very young researcher at Harvard’s MCZ, one of my first jobs, working under Farish Jenkins (the guy who found famed missing link Tiktaalik along with Neil Shubin), was to count microfossils. I can tell you right now, this sort of task almost makes you not want to be a paleontologist. It means literally going through hundreds or thousands of fish scales and other tiny bones (usually from Greenland or Arizona) to find whatever there is to find. However, it is a very important task. When trying to ascertain knowledge about a particular prehistoric ecology, it is critical to go through every microfossil in the region, especially when it represents an aquatic environment full of life. It helps to build up information about how the creatures lived, and perhaps some clues as to where more fossils might be found, through simple correlations. As of 2019, it has been demonstrated that it is indeed possible to use convolutional neural networks, with a bit of pre-processing, to rapidly take on the automated task of visually identifying and counting microfossils in core samples. This saves researchers a lot of time, but it’s a few decades too late for me…
Fossil-bearing site identification using satellite imagery. Several years ago it was found that the resolution of Google Earth and related public satellite imagery sites is high enough that satellite images of fossil-bearing localities can be used to train a CNN to identify additional likely sites. Researchers have used known fossiliferous regions to target new potential locations, thereby reducing the effort of surveying/prospecting work in large field studies. (I am now working with one of my own science fair students to enhance this concept, and I promise to put up some cool videos about this very soon.)
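For readers curious what a CNN actually does with an image, here is a minimal sketch of its basic building block: sliding a small filter (a "kernel") across a grid of pixel values. Everything in it is an assumption made for illustration. The tiny 4x4 "tile", the hand-written kernel, and the function name `convolve2d` are all hypothetical, and a real CNN learns its kernels from labeled examples using a deep learning library rather than having them typed in by hand.

```python
def convolve2d(image, kernel):
    """Slide kernel over image (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical grayscale tile: a dark layer (0) sitting on a bright layer (9).
tile = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [9, 9, 9, 9],
    [9, 9, 9, 9],
]

# Hand-picked kernel that responds where dark pixels sit directly above bright ones.
horizontal_edge = [
    [-1, -1],
    [1, 1],
]

feature_map = convolve2d(tile, horizontal_edge)
for row in feature_map:
    print(row)
```

The middle row of the output lights up exactly where the two layers meet, and stays at zero elsewhere. A CNN stacks many such filters, which is roughly how it can pick up on things like the banding of sediment layers or the outline of a fish scale.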

Sample satellite image, showing a section of ~100ft of a fossil-bearing region in Arizona; by using the patterns of the deposited layers with a CNN, it is possible to predict high likelihood areas to find additional fossils. This shortens the investigation time for paleontologists.

There is talk (and a few early papers) of the use of machine learning in phylogeny (tracing the branches of evolution in a large number of different species) and there are of course various ways that AI programs benefit the image processing software that paleontologists use, among other things, but this is pretty much where things stand today, for traditional paleontology. The future of paleo+AI, however, is I believe a far different story…

Where AI stands today

While AI has made amazing breakthroughs in recent years, and the rate of discovery and invention in AI seems to be increasing over time, much of that is in the space of “narrow AI.” That means the amazing AI systems we see all around us are only able to be excellent within a very narrow field: Alexa and Siri’s ability to recognize our voices, autopilot military drones in the R2D2/Artuµ project, cancer diagnostic CNNs that are better than doctors, AlphaGo’s ability to beat top Go players (and play PacMan and other video games better than top players), MIT’s AI that discovered a brand new super-drug that is better than the world’s top antibiotics, and more! Yet, with all these superlative examples of modern AI, a more generalized form of AI that can cross over between different AI systems has not been achieved yet.

ImageNet, which is the name of a large labeled image dataset (today with over 14 million images) and also the name of a worldwide AI benchmarking competition, is a key indicator of AI’s general progress in recent years. As of 2015, the competition demonstrated that an AI model was able to beat the ability of humans to recognize particular objects (on average). This was a major threshold, and since then, most image recognition / object detection software has been able to achieve accuracy in the high 90s (in percent), doing these tasks much faster and better than people can, thanks to several years of increased research on deep learning. Since that time, the number of “narrow” fields using deep learning has increased, and the breakthroughs have been coming faster and faster.

The latest rendition of the ImageNet competitions is now hosted on Kaggle, and according to the leaderboard, it would seem that some competitors have achieved 100%, or very near it, in terms of accuracy within the last year (although I do have some doubts about the way this competition was held, it is quite remarkable that so many contestants were able to achieve such high accuracies).

Since 2012, improvements in image classification have been due to AI/deep learning

Since Alan Turing’s concept of a Turing Test (originally called the “imitation game” like the 2014 film) in 1950, it has generally been thought that the holy grail of AI would be NLP, or Natural Language Processing. This is the idea that if you can speak to a computer and you can’t tell that it’s a computer (as opposed to a real human) it has then been said to have “passed the Turing test.” So far, nobody has done that, or even come very close. But, that day is approaching fast. And, in fact, there are some comparable AI concepts that are coming pretty close (which I discuss in another blog and in my book, Get Ready for AI). Especially in the field of biotechnology.

One such amazing innovation in AI occurred quite recently, and in fact, was just announced a few months ago: Alphafold. From the same group that developed AlphaGo (Google’s DeepMind), it is a new neural network based solution to one of the biggest problems in biology, protein folding. This is considered a 50 year old Grand Challenge in the field of biochemistry, and as of last year, Alphafold is able to achieve accuracies over 92%, which is better than or comparable to actual (slow and expensive) lab experiments involving x-ray crystallography.

Alphafold 2 Block Design (from Wikipedia / Deep Mind) showing its unique architecture capable of deciphering the 3D structure of proteins given only a set of amino acid sequences

With Alphafold, there is now a way to predict the tertiary (3D) structure of proteins based merely on a known sequence of amino acids. This will allow new medicines to be created, and, will allow for the replication of ancient proteins, based on known fossil DNA, homologous genes, and/or homologous extant proteins. That may sound a bit complicated, but it’s quite exciting. There are now all sorts of ways that you can reverse engineer old proteins (for instance, to resurrect dinosaurs) — and create new ones, perhaps to cure major medical problems. All thanks to advances in AI.

Check out my video about Alphafold here:

Quick history about the study of genomes

The Human Genome Project (which started in 1990 and ended in 2003, for a price tag of ~$2.7B) ended up determining that there are about 3 billion base pairs, with about 20,000–25,000 genes embedded within human DNA. In humans, it is thought that up to 10 proteins may be encoded by a single gene, and ~100,000 to 400,000 proteins in total can be found within the human body.

Over time, the process of sequencing genes has become faster and faster, thanks in large part to Next Generation Sequencing (NGS) in recent years. And, in fact, some NGS systems do use AI to increase the speed of mapping and matching overlapping DNA fragments together into longer sequences, to create master maps of genomes more quickly than ever before. What took the Human Genome Project 12–13 years to complete can now be done in the basement of the Harvard-MIT Broad Institute in a matter of minutes.

Sample sequence showing how a sequence assembler would take fragments and match them by overlaps. The image also shows the potential problem of repeats in the sequence. (Source: Luongdl, 6 May 2014, Wikimedia Commons, Creative Commons License)
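To make the idea of matching fragments by their overlaps a little more concrete, here is a minimal sketch of a greedy overlap assembler. It is a toy under stated assumptions: the three short reads are made up, the function names `overlap` and `greedy_assemble` are hypothetical, and real assemblers use far more sophisticated methods to cope with sequencing errors and with the repeat problem mentioned in the caption above.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the pieces unmerged
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical reads, each sharing four letters with the next.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads, the assembler stitches everything into a single contig. If the same short stretch appeared in several places in the genome, a greedy approach like this could easily join the wrong pieces, which is why repeats are such a headache.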

About the time that the Human Genome Project finished up in 2003, a big project began to sequence the chicken genome (for the much cheaper price tag of about $13m). It was found that the chicken genome comprises about 20,000–23,000 genes, represented within ~1 billion base pairs.

That makes humans and chickens somewhat similar as far as the number of genes, but the total number of base pairs indicates that there is a lot more complexity to humans. Therefore, dinosaurs, the ancestors of birds, must be closer to reptiles, and thus, have a far simpler genome, right? Well, according to recent research, a reptile known as the tuatara actually has a genome that is far bigger than that of humans (with the tuatara’s close to 5Gbp). But how can that be? A lizard with a bigger genome than a human?

Well, it turns out that not all DNA is actually useful. Your DNA is made up of exons (parts that code for proteins), introns (DNA in between exons that doesn’t seem to do anything) and other DNA that is in essence “junk” or “dark matter,” a broken, unknown “fossil” of sorts, left over from your ancestors, that you no longer use.

It turns out that tuataras have 64% repetitive DNA. So while much of it is in fact used, it’s sometimes copied multiple times within the genome. It’s not that the tuatara is necessarily complex; it’s just got some redundancy built into its genes.

In humans, for instance, only 1.1% of the genome is exons (which make useful proteins), 24% introns and 75% intergenic DNA. So, in reality, quite a lot of your DNA represents inclusions of DNA that have been building up throughout the evolution of life on Earth. That means that a large amount of your DNA might contain clues as to your evolutionary past.
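To put those percentages into actual base pairs, here is a small back-of-the-envelope sketch. It only uses the rounded figures quoted in this post (about 3 billion base pairs for humans, about 5 billion for the tuatara, and the percentages above), so treat the results as rough orders of magnitude. The variable names are my own, and percentages are stored as parts per thousand to keep the arithmetic in whole numbers.

```python
HUMAN_BP = 3_000_000_000    # ~3 billion base pairs, as quoted above
TUATARA_BP = 5_000_000_000  # ~5 Gbp, as quoted above

# Shares of the human genome in parts per thousand (1.1%, 24%, 75%).
# The rounded percentages add up to 100.1%, so the totals are approximate.
human_permille = {"exons": 11, "introns": 240, "intergenic": 750}

human_bp_by_region = {
    region: HUMAN_BP * share // 1000 for region, share in human_permille.items()
}

# 64% of the tuatara genome is repetitive.
tuatara_repetitive_bp = TUATARA_BP * 64 // 100
tuatara_nonrepetitive_bp = TUATARA_BP - tuatara_repetitive_bp

for region, bp in human_bp_by_region.items():
    print(f"human {region}: {bp:,} bp")
print(f"tuatara repetitive: {tuatara_repetitive_bp:,} bp")
print(f"tuatara non-repetitive: {tuatara_nonrepetitive_bp:,} bp")
```

On these numbers, only about 33 million of a human's 3 billion base pairs are exons, and the non-repetitive part of the tuatara genome comes to roughly 1.8 billion base pairs.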

Chickenosaurus: One future potential for paleontology + AI

If you do a search for Jack Horner’s current work and publications, you’ll find that he was indeed very serious about wanting a pet dinosaur, and truly believed in the plausibility of Jurassic Park. That’s because much of what he talks about these days in TED talks, books, etc. is the “chickenosaurus.” It’s the idea that dinosaurs can truly be resurrected by simply turning certain genes on and off in the chicken genome (and maybe splicing in a few new ones).

The idea is based on the theory that chickens, which are the only other amniote aside from humans whose genome has been thoroughly studied, are now quite thoroughly accepted as the relatively direct descendants of certain types of dinosaurs. So, in theory, their DNA should contain the remnants/signatures of dinosaur DNA.

A video I made for a middle school workshop about how dinosaurs are related to birds.

With enough evidence of the functional expression of certain genes in chickens and reptiles (a task that is well suited to AI models that are tuned to sequential data, like recurrent neural networks), it is possible to determine analogous genes between birds and reptiles. This information can then be used to find the ideal points to insert key traits like tails, teeth and scales, and to turn off genes that control the development of wings and beaks.
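As a rough illustration of what "finding analogous genes" means at the level of raw sequence, here is a minimal sketch that scores how well one sequence lines up against a few candidates. To be clear about the assumptions: this is not a recurrent neural network. It swaps in a much simpler classical technique, Needleman-Wunsch global alignment scoring, and the sequences, the candidate names and the scoring values (match, mismatch, gap) are all made up for the example.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score between two sequences.

    Only the score is computed (one row of the table at a time),
    not the alignment itself.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences; real genes are thousands of bases long.
chicken_gene = "ATGGCCTTAGGA"
reptile_candidates = {
    "candidate_A": "ATGGCGTTAGGA",  # differs from the chicken gene by one letter
    "candidate_B": "TTTACGCGCATC",  # mostly unrelated
}

scores = {
    name: alignment_score(chicken_gene, seq)
    for name, seq in reptile_candidates.items()
}
best_match = max(scores, key=scores.get)
print(scores, best_match)
```

The candidate that differs by a single letter scores far higher than the unrelated one, so it would be flagged as the likely counterpart. A learned model would ideally go further and take into account when and where a gene is expressed, not just how it is spelled.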

In the past this task was complicated by the idea that genes themselves are not the only things that control phenotype (what you see as a result of gene expression) in animals. There are loads of helper proteins that play a role in signaling the expression of other genes, during development at key stages in an animal’s life. However, what once seemed like a mathematically impossible task is becoming more realistic to solve as time goes on. It is possible that the use of large datasets of the genes of dinosaur’s modern day relatives can provide clues as to the likely makeup of the genes in the past, using the powerful pattern recognition capabilities of AI. And, now it is not only possible to reverse engineer proteins based on fragments that are found within some exceptionally well-preserved fossils, but it is also plausible in theory to reverse engineer the sequences of complex protein structures based solely on their shapes (or how they “fit” into other proteins). A key tool that was previously not available to us — thanks entirely to AI.

Some progress has already been made with the Chickenosaurus, without significant use of AI technology. For instance, in 2006, a researcher in Wisconsin was able to make a chicken embryo with alligator-like teeth. And, in 2011, a scientist was able to create a snout in a chicken, with the use of protein beads inserted into the embryo. It is already known that chicken embryos grow tails, and that the gene responsible turns off at some point in the development process. But, in theory, it can be turned back on as well. It won’t be long now…

Image of chick embryo with a tail. From: Rashid, Dana & Chapman, Susan & Larsson, Hans & Organ, Chris & Bebin, Annegaelle & Merzdorf, Christa & Bradley, Roger & Horner, John. (2014). From dinosaurs to birds: A tail of evolution. EvoDevo. 5. 1–20. 10.1186/2041–9139–5–25. Creative Commons 4.0 License.

While I’m not sure that the best use of Alphafold and other forms of biotech AI in the world is to make a Chickenosaurus, or to use these tools as a roundabout way of better understanding paleontology, it certainly is fun to imagine that in the near future, Jurassic Park (as well as cures to some of the world’s worst medical problems) may just be possible after all!

A slightly scary rendition of the Chickenosaurus, by the author.

For those of you interested in science fair project ideas, AI+paleontology is a great way to go! As a young student, I won 13 1st Place awards, including multiple International first places, for a highly cross-disciplinary project relating to dinosaur eggs, biochemistry and data science. My work lately involves reaching out to encourage more students to engage in STEM and join science fairs. Feel free to reach me at www.tumblehomebooks.org. Read my other Medium blogs about science fairs and AI:
