Publications
2024
Evolutionary Algorithms Simulating Molecular Evolution: A New Field Proposal
James S. L. Browning, Daniel Tauritz, John Beckmann
The genetic blueprint for the essential functions of life is encoded in DNA, which is translated into proteins -- the engines driving most of our metabolic processes. Recent advancements in genome sequencing have unveiled a vast diversity of protein families, but compared to the massive search space of all possible amino acid sequences, the set of known functional families is minimal. One could say nature has a limited protein "vocabulary." The major question for computational biologists, therefore, is whether this vocabulary can be expanded to include useful proteins that went extinct long ago, or maybe never evolved in the first place. We outline a computational approach to solving this problem. By merging evolutionary algorithms, machine learning (ML), and bioinformatics, we can facilitate the development of completely novel proteins which have never existed before. We envision this work forming a new sub-field of computational evolution we dub evolutionary algorithms simulating molecular evolution (EASME).
2023
Modeling emergence of Wolbachia toxin-antidote protein functions with an evolutionary algorithm
John Beckmann, Joe Gillespie, Daniel Tauritz
Evolutionary algorithms (EAs) simulate Darwinian evolution and adeptly mimic natural evolution. Most EA applications in biology encode high levels of abstraction in top-down population ecology models. In contrast, our research merges protein alignment algorithms from bioinformatics into codon-based EAs that simulate molecular protein string evolution from the bottom up. We apply our EA to reconcile a problem in the field of Wolbachia-induced cytoplasmic incompatibility (CI). Wolbachia is a microbial endosymbiont that lives inside insect cells. CI is conditional insect sterility that operates as a toxin-antidote (TA) system. However, CI exhibits complex phenotypes that are not fully explained by a single discrete model. We instantiate in silico genes that control CI, CI factors (cifs), as strings within the EA chromosome. We monitor the evolution of their enzymatic activity, binding, and cellular localization by applying selective pressure on their primary amino acid strings. Our model helps rationalize why two distinct mechanisms of CI induction might coexist in nature. We find that nuclear localization signals (NLS) and Type IV secretion system signals (T4SS) are of low complexity and evolve fast, whereas binding interactions have intermediate complexity, and enzymatic activity is the most complex. Our model predicts that as ancestral TA systems evolve into eukaryotic CI systems, the placement of NLS or T4SS signals can stochastically vary, imparting effects that might impact CI induction mechanics. Our model highlights how preconditions and sequence length can bias evolution of cifs toward one mechanism or another.
Frontiers in Microbiology, Volume 14 (2023)
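The codon-based, bottom-up idea described in the abstract above can be illustrated with a toy sketch. This is not the authors' model. It assumes a chromosome that is a plain DNA string, translation with the standard genetic code, per-base substitution as the only mutation, and truncation selection with elitism. In place of the protein alignment algorithms the abstract mentions, it scores fitness by ungapped window identity against a short target motif. The function names, parameter values, and the motif are all hypothetical and chosen only for illustration.

```python
import random

# Standard genetic code, laid out in the conventional TCAG base ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def motif_score(protein, motif):
    """Best ungapped identity count of the motif at any offset (assumed fitness)."""
    windows = max(1, len(protein) - len(motif) + 1)
    return max(sum(a == b for a, b in zip(protein[s:], motif))
               for s in range(windows))


def mutate(dna, rate, rng):
    """Redraw each base at random with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in dna)


def evolve(motif, pop_size=40, n_codons=20, generations=60, rate=0.02, seed=0):
    """Truncation selection: keep the top half unchanged, refill with mutants."""
    rng = random.Random(seed)

    def fitness(dna):
        return motif_score(translate(dna), motif)

    pop = ["".join(rng.choice(BASES) for _ in range(3 * n_codons))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))      # best score this generation
        parents = pop[:pop_size // 2]        # survivors are carried over as-is
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return pop[0], history


if __name__ == "__main__":
    best, history = evolve("PKKKRKV")        # illustrative target motif only
    print(translate(best), history[0], history[-1])
```

Because selection acts on the translated amino acid string while mutation acts on the DNA, synonymous substitutions are neutral in this sketch and premature stop codons shorten the protein, which is the kind of bottom-up effect a codon-level representation exposes. A short motif is typically matched within few generations here, while a longer or more constrained target would take longer, which loosely mirrors the abstract's ordering of signals, binding, and enzymatic activity by complexity.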