Over the weeks, I gradually assembled some crude computational machinery to sift through the giant DNA database. I pulled together published metagenomes (random mass-produced sequences from the environment) from seawater, sediment and the few subseafloor crust samples that were available, and focused on the misfit sequences that hadn’t been matched to anything known. I searched for sequences that appeared multiple times in different abyssal zone samples and eliminated the ones that were also found in a control database from terrestrial environments. These searches produced long lists of sequences that were worth a second look. Unfortunately, this literally meant me looking at sequences and trying to manually figure out what was going on. I could eliminate some of these sequences in more focused searches by matching them to known species that had already been identified in that sample. If this were a typical metagenomic project, I would have been very happy about finding these connections, as they chip away at the metagenomic dark matter. But this was just sorting the unknown into safe, comforting known packages – it was not uncovering anything truly new. I also used the SETI approach to search for messages in DNA that stood out from normal genetic code based on their information/entropy content. I ended up with a bunch of candidate sequences, though I had no guess as to their function or meaning. So far, I had not discovered the digits of pi or the prime numbers encoded in any obvious way.
At the same time, I also tried looking for signs of suspicious DNA splicing, based on the hypothesis that this creature was an expert in biotechnology. This turned out to be difficult because hopping between genomes is the most natural thing in the world for DNA. I found all sorts of “genomic islands” of mobile DNA, but most of this was pretty much the usual transposons and proviruses you find all over the genetic landscape. However, one of my searches pulled up a chunk of DNA that looked like a combination of wildly different species: a protein from an animal flanked by clear bacterial sequences. I started to get excited – even if this had nothing to do with what I was really looking for, gene transfers between animals and bacteria are rare and interesting. I ran a test to see if this was a so-called chimera that had arisen from a sequencing error, where two separate pieces of DNA mistakenly get copied into the same sequencing reaction. Crap. The search found part of this mixed sequence as an independent, normal-looking sequence in the same metagenome. This meant the odds were strongly in favor of the mixed sequence being a chimera. Oh well, a routine disappointment. I continued to toil away at this analysis, tuning the parameters a little to reduce these kinds of false positives. Not long after, the search flagged another sequence like this. I glanced at it and again, my heart sank a little: it was the same one I had found before. But wait, this was in a completely different sample. In fact, this one hadn’t even been collected by the same research team. The odds of generating the same chimera in two independent samples are pretty low. I started looking at this in more depth. It was impossible to interpret this sequence as DNA, but when I translated it to amino acids an amazing pattern emerged.
Part of the amino acid sequence had a good match to a proteolipid protein (PLP) from a shrimp species, and this region was flanked by sequences that looked like bacterial lipoproteins. I followed some links to find out more about PLP. Also called myelin PLP, it is the dominant protein in the sheath that insulates axons in the central nervous system, greatly speeding up transmission of electrical signals among neurons [12]. A brain protein in a bacterial genome?
I finally had something really good to tell my backers. I posted an update on Kickstarter:
“Still early, but potentially exciting news! Possible eukaryotic brain protein found spliced among bacterial sequences. Interestingly, GC content and codon usage are consistent over this stretch, indicating these genes were combined long ago (when new genes are acquired they gradually take on the characteristics of the new genome). The brain protein is most closely related to myelin PLP in shrimp. Incidentally, the current record holder for fastest neuronal signal is the Kuruma shrimp, clocking in above 200 m/s (about twice as fast as ours).”
I still hadn’t received a response from G. Poisson. There’s something fishy about that guy, I thought, but as long as his money’s green. However, several of the minor backers posted comments:
“Blessings upon you and your work! Soon we will know the Mind of our All-Mother, and we will bathe in the cleansing Waters of her Wisdom.”
“Dr. Lipschitz, you have shed light upon the darkest abyss. But beware, ‘for in these rays we are able to be seen as well as to see.’”
Great, my project was being followed by tripped-out Gaia and/or Cthulhu devotees. I had to snicker at the last one. It was a quote from one of my favorite Lovecraft stories, From Beyond. I started mentally riffing on these themes, “That Lipschitz should ever have studied science and philosophy was a mistake. These things should be left to the frigid and impersonal investigator, for they offer two equally tragic alternatives to the man of feeling and action; despair if he fail in his quest, and terrors unutterable and unimaginable if he succeed.” I was feeling giddy. “Watery Wisdom” is good, though personally I might have gone with “Briny Brain.” We will Cavort in the Crustacean Crenulations of Her Briny Brain. Ah, Godbless’em. Good people. They gave me money. They believe in my work. It’s wrong for me to ridicule them. Hee hee hee.