Searching for the key away from the lamppost: Exploring genomic dark matter in the darkest part of the world.
We live on a placid island of ignorance in the midst of black seas of infinity – H.P. Lovecraft
Intelligence is a highly adaptive trait. In the case of humans, it has granted otherwise mediocre creatures dominion over the Earth’s surface. Given 3.5 billion years of life on this planet, or even just the 500 million years or so since the Cambrian explosion, advanced intelligence is likely to have evolved independently more than once. Evolution has produced ever-increasing complexity, punctuated by extinctions. A complex organism that has survived multiple extinction cycles could be orders of magnitude more complex than the most complex of recently evolved animals. Advanced complexity allows for advanced intelligence. For example, a slime mold, which resembles a mindless blob of protoplasm lacking specialized structures or organ systems, can solve complex mathematical and topological puzzles purely through the emergent properties of its network (for example, growing a model of the Tokyo rail network that performed as well as, or better than, the real one). Human intelligence is a product of the trillions of synaptic connections among the brain’s neurons. There are an estimated 10^30 microbial cells on the planet [1]. If even a small fraction of these were properly networked, their collective intelligence would exceed ours to such an extent that it would seem godlike from our perspective.
Advanced intelligence allows skillful perception and manipulation of the world. Therefore, the hypothesized super-intelligent non-human lifeform would be highly successful and so should be widespread. Because humans have not yet discovered this widespread, highly intelligent lifeform, it must be well hidden. It is most likely to live in the least explored part of the planet, the abyssal or hadal zones of the deep ocean, and to have biological signatures that are not readily recognizable. The technology of this species must be similarly hard to recognize or detect. Therefore, its artifacts are subtle and almost indistinguishable from natural features. Environmental conditions in the abyssal zone are constant, and so an organism adapted to this environment would require no artificial shelters. Abundant energy is available from organic matter snowing down from the upper layers, from geothermal sources in certain areas and from the oxidation of Fe-rich basalts in the oceanic crust. Purely biological solutions exist to all aspects of life in this zone. However, to survive the cataclysms that occur at epochal time scales, an ancient, advanced intelligence would require an ability to sense and respond to its environment in innovative ways. A completely genetic technology would be highly effective and difficult to detect unless specifically searched for. Hence, a key piece of evidence for the existence of this hypothesized lifeform, and the central clue to the nature of its biology, is that it has not been discovered.
Dear reader, does this sound preposterous? Consider the seemingly preposterous aspects of the world that you now take as fact. Our minds struggle with vast scopes of space and time, but look at the wonders that evolution has wrought through incremental changes over billions of years. This is the reason the uneducated often cannot accept evolution: the complexity that arises from simple, local processes seems fantastical to those who fail to consider the combinatorial power of countless molecules operating over deep time. To accept the wonder of our existence, the incomparable beauty and complexity of nature, and then to scoff at the idea that we are not the sole advanced intelligence in this world, is an act of profound contradiction.
In fact, a being dwelling in a constant environment, with abundant metabolic energy available through direct absorption into its tissues, could dispense with the excessive corporeal baggage to which we primates have become accustomed. Without the need to support itself on land, move about and hunt, it would have no need for our complex anatomy. Such a being could be almost entirely brain. A human brain is composed of about 100 billion cells, each with about 1000 interconnections on average. This capacity could be equaled by a network of bacteria-like cells in a small blob of marine sediment. Abyssal plains cover more than half of the Earth’s surface area. The computational power of a vast, highly connected Abyssal Ganglia could dwarf that of the human mind and our computers.
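The arithmetic behind this comparison can be checked in a few lines. The sketch below uses the round figures from the text (about 10^11 cells with about 1000 interconnections each) plus one value that is an assumption rather than a figure from this proposal: a sediment cell density of 10^9 cells per cm³, used only as an order-of-magnitude placeholder. It also assumes that each sediment cell could sustain a comparable number of links, which is itself part of the hypothesis.

```python
# Back-of-envelope check of the brain vs. sediment comparison.
# HUMAN_BRAIN_CELLS and CONNECTIONS_PER_CELL are the round figures from the text;
# ASSUMED_CELLS_PER_CM3 is an illustrative assumption, not a measured value.
HUMAN_BRAIN_CELLS = 1e11        # ~100 billion cells
CONNECTIONS_PER_CELL = 1e3      # ~1000 interconnections per cell
ASSUMED_CELLS_PER_CM3 = 1e9     # assumed order-of-magnitude sediment cell density


def total_connections(cells: float, per_cell: float) -> float:
    """Total connection count: number of cells times connections per cell."""
    return cells * per_cell


def sediment_volume_cm3(cells_needed: float, cells_per_cm3: float) -> float:
    """Volume of sediment that would hold `cells_needed` cells at the given density."""
    return cells_needed / cells_per_cm3


if __name__ == "__main__":
    links = total_connections(HUMAN_BRAIN_CELLS, CONNECTIONS_PER_CELL)
    volume = sediment_volume_cm3(HUMAN_BRAIN_CELLS, ASSUMED_CELLS_PER_CM3)
    print(f"brain-scale connections: {links:.0e}")
    print(f"sediment needed at assumed density: {volume:.0f} cm^3")
```

Under these assumptions a human brain's worth of cells fits in roughly 100 cm³ of sediment; a different density simply rescales that volume.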
Such a being dwelling in the abyssal zone could easily be missed by the patchy surveys of this vast area, especially if it harbors the innocuous form of a vast but dispersed colonial microorganism. Its genetic signatures could be similarly cryptic, given the difficulty of classifying highly novel sequences against reference databases. If, as I hypothesize, this organism has evolved tools for genetic engineering and has developed a purely biology-based technology, its molecular signal would be far too complex and variable to recognize with current bioinformatics approaches. However, the paradoxical hyperdiversity associated with the deep sea [2] may provide indirect evidence for such a technology. Could this diversity be an indicator of willful genetic experimentation? The Candidate Phyla Radiation (CPR) is a diverse cluster of marine bacterial phyla presumed to be symbiotically associated with unknown marine hosts [3]. Many of these phyla could represent the carefully curated components of the Abyssal Ganglia’s machinery for capturing energy from the environment. Furthermore, the novel “fourth domain” gene families found in marine DNA could represent this organism, or the impacts of its technology [4].
Of course, diversity is the very hallmark of biology, and the above patterns, while intriguing, can be explained without invoking intervention by an advanced intelligence. Furthermore, these observations were based on extensions of mundane biology, such as surveys of ribosomal RNA genes or analyses of conserved gene families that reveal novel variations on familiar themes. What might we find in the abyss if we searched for an entirely outré form of biology: a lifeform completely alien to us, though in some ways more fundamental to the Earth than we are, which spanned the planet when our ancestral cells had scarcely become multicellular?
Modern DNA sequencing platforms are churning out petabases (10^15 nucleotides) of new sequence per year [5]. Though much of this is devoted to the vain and redundant task of glorifying the human genome and our excreta (also known as the Human Microbiome Project), massive amounts of environmental metagenomic data are also generated from soils, sea water, ocean sediments, animal feces, and the slimy, microbe-coated surfaces of myriad creatures. These samples show us that the microbial diversity of the planet is, for all practical purposes, inexhaustible. Easily half of this DNA is unrecognizable and is relegated to the category of “genomic dark matter,” an allusion to the invisible, uncharacterized matter that makes up most of the mass of the universe and whose gravity is largely responsible for holding spinning galaxies together against their centrifugal force. Routinely, researchers dwell solely in the well-lit, annotated fraction of these data, where homologies to known sequences allow the discovery of marginally novel species and the tracking of known species across space and time. This is the proverbial “looking for the key beneath the lamppost.” But the secrets I seek do not lie here. Evidence for my hypothesized life form, the Abyssal Ganglia, will only be found in the genomic dark matter of the darkest, least explored regions of the planet.
Large international efforts, such as the Census of Diversity of Abyssal Marine Life (CeDAMar), the International Ocean Discovery Program (IODP) and the World Register of Marine Species (WoRMS) [6], have begun to collect water, sediment and subseafloor crust samples from the deep sea. Metagenomic sequences for some of these samples already exist, and many more samples have been archived and may be requested by serious researchers. Therefore, my initial exploration of this eldritch abyss will be virtual. In this proposal, I enumerate hypothesized properties of the Abyssal Ganglia, the genetic signatures yielded by these properties and bioinformatic strategies for detecting these signatures.
Phase 1 of the research will focus on preexisting abyssal zone metagenomes to optimize the search parameters for Abyssal Ganglia signatures. Phase 2 will expand the search for these signatures and create a spatially-explicit model for the distribution of this organism. Phase 3 will involve hijacking a deep-sea submersible and confronting the central hub of this Leviathan in the murky depths of the abyss.
I hypothesize that the Abyssal Ganglia is ancient, highly intelligent and exceedingly well-adapted to its environment, and will therefore be widespread in the abyssal environment. Because it has thus far escaped detection, its genetic signature should lie within the realm of unidentified sequences. Therefore, I will search for recurring patterns that are common in the abyssal zone, are not found elsewhere and are unrelated to any known sequence. These patterns may be identified in DNA sequences, or in higher order geometric or logical patterns that arise from non-conventional analyses of DNA sequences.
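One minimal way to operationalize these three criteria (common in the abyss, absent elsewhere, unrelated to known sequences) is with set operations on k-mers, substrings of length k. The sketch below is hypothetical throughout: the function names, the prevalence threshold `min_fraction`, and the use of exact k-mer membership as a stand-in for "unrelated to any known sequence" are illustrative assumptions. A real pipeline would use homology search rather than exact matching and would also collapse reverse complements, which this sketch does not.

```python
from collections import Counter
from typing import Iterable, Sequence


def kmers(seq: str, k: int) -> set[str]:
    """Distinct overlapping k-mers of a DNA string."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> set[str]:
    """Union of the k-mers found in all reads of one sample."""
    found: set[str] = set()
    for read in reads:
        found |= kmers(read.upper(), k)
    return found


def abyssal_candidates(
    abyssal_samples: Sequence[Iterable[str]],
    other_samples: Sequence[Iterable[str]],
    known_kmers: set[str],
    k: int,
    min_fraction: float = 0.5,
) -> set[str]:
    """k-mers that are (1) present in at least `min_fraction` of abyssal samples,
    (2) absent from every non-abyssal sample, and (3) absent from a reference set
    of k-mers drawn from known sequences (a crude stand-in for homology search)."""
    prevalence: Counter = Counter()
    for reads in abyssal_samples:
        prevalence.update(sample_kmers(reads, k))  # counts samples, not reads

    elsewhere: set[str] = set()
    for reads in other_samples:
        elsewhere |= sample_kmers(reads, k)

    needed = min_fraction * len(abyssal_samples)
    return {
        km for km, n in prevalence.items()
        if n >= needed and km not in elsewhere and km not in known_kmers
    }
```

Counting prevalence per sample rather than per read keeps a single deeply sequenced sample from dominating the result, which matters when the criterion is that a pattern recurs across the abyssal zone.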
Because the Abyssal Ganglia is hypothesized to be highly intelligent, yet no artifacts of its technology have been discovered, it presumably relies on a purely biological technology, which produces no conspicuous artificial structures but leaves its signature in DNA sequences. One consequence of this technology would be an abundance of chimeric genes or pathways that show signs of splicing from distantly related organisms. These will be detected using modified versions of existing software for detecting chimeric sequences that arise as artifacts of PCR amplification and sequencing, except that the hypothesized chimeras will recur identically in independent samples. Gene splicing will also be detected as discontinuities in nucleotide and dinucleotide signatures [7], reflecting regions from divergent source genomes. These approaches require long assemblies built from sequence reads that lack a known genomic scaffold. This challenge will be overcome using k-mer frequency binning methods, which group non-overlapping contigs into putative genomes without relying on homology to sequenced genomes [8].
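The signature-discontinuity idea can be sketched with one widely used genomic signature, the dinucleotide relative abundance ρ_XY = f_XY / (f_X f_Y), compared between adjacent windows by the mean absolute difference over the 16 dinucleotides (in the style of Karlin's δ* distance). This is an illustrative choice: the signature and distance used in [7] may differ, and the window size and threshold below are arbitrary assumptions. Breakpoints are reported only at window resolution.

```python
from itertools import product

BASES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def rho_profile(window: str) -> dict[str, float]:
    """Dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y).
    Counts are pooled over the window and its reverse complement (counted
    separately, so no spurious pair is created at a junction)."""
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (window, reverse_complement(window)):
        for base in strand:
            if base in mono:          # ignore N and other ambiguity codes
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono = sum(mono.values()) or 1
    n_di = sum(di.values()) or 1
    profile = {}
    for xy in DINUCLEOTIDES:
        expected = (mono[xy[0]] / n_mono) * (mono[xy[1]] / n_mono)
        profile[xy] = (di[xy] / n_di) / expected if expected > 0 else 0.0
    return profile


def delta_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Mean absolute difference between two rho profiles over the 16 dinucleotides."""
    return sum(abs(p[xy] - q[xy]) for xy in DINUCLEOTIDES) / len(DINUCLEOTIDES)


def signature_breaks(seq: str, window: int = 200, threshold: float = 0.5) -> list[int]:
    """Start coordinates of windows whose signature differs sharply from the
    preceding window (candidate splice junctions, at window resolution)."""
    seq = seq.upper()
    profiles = [rho_profile(seq[i:i + window])
                for i in range(0, len(seq) - window + 1, window)]
    return [j * window for j in range(1, len(profiles))
            if delta_distance(profiles[j - 1], profiles[j]) > threshold]
```

Real splice junctions will rarely be this clean; in practice one would likely scan with overlapping windows and calibrate the threshold against the distribution of δ values within contigs believed to be unspliced.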
A second consequence of a highly intelligent organism with advanced DNA editing technology could be non-conventional signals encoded in DNA sequences. This approach borrows from the Search for Extraterrestrial Intelligence (SETI), which has developed algorithms for detecting “beacons” (intentionally transmitted information) by analyzing the information content of astronomical data [9]. For example, the Abyssal Ganglia could back up its network structure as a DNA sequence. It is hypothesized that this organism stores information in the combinatorial connections among its cells, analogously to the human brain. It would be highly advantageous to store a coded image of the network structure in case of catastrophic loss. Such a mind could also have other uses for expressing numbers and mathematical expressions, which could also be stored or transmitted in DNA form. Sequences derived from an organized array of non-random numbers would have lower entropy values than typical sequences that code for ribosomal RNA or proteins. Based on my models of numerical information storage in DNA, highly regular sequences could even be distinguished from known repetitive sequences found in genomes (Figure 1).




Once candidate sequences with numerical (or alphabetical) content have been identified, specific decryption algorithms can be deployed. The identification of such codes by their relatively low entropy implies that they are not securely encrypted; otherwise such signals would be indistinguishable from random noise. Consequently, only a limited number of logical numerical systems based on the genetic code need be considered (e.g., base 4, using A, C, G and T to represent 0, 1, 2 and 3). It would be a straightforward task to search for known numerical sequences (e.g., the integers, pi, the Fibonacci series, the primes) by trying permutations of base codes (mappings between {A,C,G,T} and {0,1,2,3}) and word lengths (the number of bases used to represent each number). A simple exercise with this approach found no exact matches longer than 24 bases to these numerical sequences in GenBank (Table 1). This is not surprising, as this database contains only the genomes of known organisms and environmental surveys of known genes. A long numerical sequence found in the metagenomic dark matter of the abyss would clearly indicate an encoded signal with an astronomically low chance of arising randomly.
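The permutation search described above can be sketched as follows. Several details are assumptions chosen for illustration, not specifications from this proposal: digits are written most-significant first, every number uses the same fixed word length, there are no delimiters between numbers, and the chance-match estimate assumes independent, uniformly distributed bases (real sequences are not uniform, so it is only a rough guide). With four nucleotides there are 4! = 24 possible digit assignments.

```python
from itertools import permutations


def to_base4(n: int, width: int) -> list[int]:
    """Base-4 digits of n, most significant first, zero-padded to `width`."""
    if not 0 <= n < 4 ** width:
        raise ValueError(f"{n} does not fit in {width} base-4 digits")
    digits = []
    for _ in range(width):
        n, d = divmod(n, 4)
        digits.append(d)
    return digits[::-1]


def encode(numbers: list[int], code: str, width: int) -> str:
    """Write `numbers` as DNA; code[d] is the nucleotide standing for digit d,
    so code="ACGT" means A=0, C=1, G=2, T=3."""
    return "".join(code[d] for n in numbers for d in to_base4(n, width))


def search_all_codes(seq: str, numbers: list[int], width: int) -> list[tuple[str, int]]:
    """Try all 24 digit-to-nucleotide codes; return every (code, offset) match."""
    hits = []
    for perm in permutations("ACGT"):
        code = "".join(perm)
        probe = encode(numbers, code, width)
        pos = seq.find(probe)
        while pos != -1:
            hits.append((code, pos))
            pos = seq.find(probe, pos + 1)
    return hits


def expected_chance_hits(probe_len: int, db_bases: float, n_codes: int = 24) -> float:
    """Expected number of chance matches of a probe_len-base probe, summed over
    n_codes codes, in a database of db_bases positions of i.i.d. uniform bases."""
    return db_bases * n_codes * 4.0 ** -probe_len


if __name__ == "__main__":
    fib = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]           # fits in 3 base-4 digits
    planted = "A" * 500 + encode(fib, "GTAC", 3) + "T" * 500
    print(search_all_codes(planted, fib, 3))
```

Under the uniform background model a single 24-base probe matches at any given position with probability 4^-24 (about 3.6 × 10^-15); `expected_chance_hits` multiplies that by the number of positions searched and the 24 codes tried, and each additional base divides the expectation by four.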

Calculating the information content of DNA sequences could indicate the presence of intentional signals. However, a deeper metric of intelligent life would be the computational complexity of this system. Just as information content is related to the number of possible discrete states that can be recognized in a message, computational complexity arises from the number of distinct internal states of the computational system. Life, and in fact all of nature, can be thought of as a Turing Machine that reads data from its current state, performs operations on these data and outputs a new state. In other words, the life machine senses conditions and responds to them, causing the system to evolve, or, as eloquently put by Crutchfield (2012), “We posit that any system is a channel that communicates its past to its future through its present.” The number of possible ways a system could respond to a set of inputs gives rise to computational complexity. Just as most DNA sequences tend to have intermediate information content, balancing between randomness and predictability, the computational complexity of biological Turing Machines appears to be balanced between chaos and order [10]. A complete description of the computational complexity of the Abyssal Ganglia would require a full understanding of its natural history and biology, encompassing not just DNA sequences but how the entire system changes over time (including, but not limited to, gene expression). However, a more limited and feasible approach based on available data is to analyze DNA sequences as complex dynamic systems, inferring the new states that can arise from previous ones, either linearly, as a sequence is read from its 5’ to 3’ end, or non-linearly, among clusters of related sequences found in the same sample. This approach could reveal higher order patterns not detected by the other information theory approaches described above.
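A simple way to treat a sequence read 5’ to 3’ as a dynamic system is to ask how predictable the next base is given the preceding k bases, the conditional entropy h(k). How quickly h(k) falls as k grows indicates how much of its past the system uses, and summing those drops gives a crude estimate of the excess entropy discussed in Crutchfield's framework. The sketch below is an illustrative finite-sample estimator of our own choosing: it ignores the sampling bias that becomes serious for short sequences and large k, and it covers only the linear, single-sequence case, not the non-linear analysis across related sequences.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(seq: str, k: int) -> float:
    """h(k) = H(next base | preceding k bases), in bits, from transition counts
    along the sequence read 5'->3'. 0 means the past k bases fully determine
    the next one; 2 means they carry no information about it."""
    transitions: dict = defaultdict(Counter)
    for i in range(len(seq) - k):
        transitions[seq[i:i + k]][seq[i + k]] += 1
    total = sum(sum(nxt.values()) for nxt in transitions.values())
    h = 0.0
    for nxt in transitions.values():
        n_ctx = sum(nxt.values())
        for c in nxt.values():
            # weight of this context times -p*log2(p) for this next base
            h += (n_ctx / total) * (c / n_ctx) * math.log2(n_ctx / c)
    return h


def excess_entropy_estimate(seq: str, max_k: int = 6) -> float:
    """Crude finite-k estimate: sum of h(k) - h(max_k) for k = 0..max_k,
    i.e. how much predictability is gained by looking further into the past."""
    hs = [conditional_entropy(seq, k) for k in range(max_k + 1)]
    return sum(h - hs[-1] for h in hs)
```

For a period-3 sequence such as "AAC" repeated, h(k) reaches 0 at k = 2 and the estimate comes out close to log2 3 ≈ 1.585 bits, the excess entropy of a period-3 process; a random sequence keeps h(k) near 2 bits and its estimate stays near 0.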
These preliminary analyses will allow the optimization of search methods and identify potential signals of intelligent life in the abyssal zone. The search will then be expanded to include samples that have been collected and archived but lack adequate DNA sequence coverage; these samples will be requested and sequenced deeply. However, only a tiny fraction of this environment has been explored, and particularly large gaps exist for the central Pacific and the Southern Ocean. To quote Stuart et al. (2008), “Attempting to understand abyssal biodiversity from existing data would be tantamount to characterizing terrestrial plant diversity based on 1600 random snapshots of plant life in, for example, only continents of the eastern hemisphere.” Therefore, the second phase of research will be to collect new samples based on spatial patterns descried from existing data. In the absence of a clear clustering of positive signals in the initial data set, new samples will be collected along a golden spiral search pattern centered on the locations with the clearest positive signals from phase 1 (Figure 2). Data from this mission will be used iteratively to design new sampling transects, to identify the epicenter of these signals and, ultimately, to locate a central hub of the Abyssal Ganglia. The experimental details of phase 3, in which a deep-sea submersible is commandeered by a small ragtag band of free-thinking scientists against all odds, are beyond the scope of this proposal.
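A golden spiral search pattern can be generated as waypoints on a logarithmic spiral whose radius grows by the golden ratio φ every quarter turn, r = a·φ^(2θ/π). The starting radius `a_km`, the angular step between stations, the helper names and the flat-Earth conversion from kilometres to latitude and longitude are all illustrative assumptions; the pattern actually shown in Figure 2 may differ (a golden-angle "sunflower" layout is a common alternative for area coverage).

```python
import math

PHI = (1 + math.sqrt(5)) / 2   # golden ratio
KM_PER_DEG_LAT = 111.32        # approximate km per degree of latitude (spherical Earth)


def golden_spiral_offsets(a_km: float, n_points: int,
                          step_rad: float = math.pi / 8) -> list[tuple[float, float]]:
    """(east_km, north_km) waypoints on r = a * PHI ** (2 * theta / pi),
    a logarithmic spiral whose radius grows by PHI every quarter turn."""
    points = []
    for i in range(n_points):
        theta = i * step_rad
        r = a_km * PHI ** (2 * theta / math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def to_lat_lon(center_lat: float, center_lon: float,
               offsets_km: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Local flat-Earth (equirectangular) conversion; a rough approximation
    that degrades with distance from the centre and at high latitudes."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(center_lat))
    stations = []
    for east, north in offsets_km:
        lat = center_lat + north / KM_PER_DEG_LAT
        lon = (center_lon + east / km_per_deg_lon + 180) % 360 - 180  # wrap to [-180, 180)
        stations.append((lat, lon))
    return stations


if __name__ == "__main__":
    # Hypothetical centre in the central Pacific and a hypothetical 25 km starting radius.
    for lat, lon in to_lat_lon(0.0, -160.0, golden_spiral_offsets(25.0, 17)):
        print(f"{lat:8.3f} {lon:9.3f}")
```

With the default step of π/8, 17 stations trace one full turn; spiralling outward from the strongest signal concentrates early samples near it while steadily widening coverage. For the Southern Ocean or for spirals spanning hundreds of kilometres, a proper geodesic calculation would be preferable to the flat-Earth shortcut.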
