Evolutionary Models of the Origin of Protein-Based Life

Michael A. Barber
September 2025

Abstract

 

The origin of protein-based life through undirected orthodox evolutionary processes has long been posited as a product of stochastic events over geological timescales. This paper challenges this paradigm by examining the constraints imposed by (1) amino acid sequencing, (2) three-dimensional protein folding, (3) functional component interdependence, (4) cellular recognition of the success of the completed protein design, and (5) the conversion of the successful build mechanism into the requisite instructional DNA code. 

 

Drawing on empirical data from biochemistry and computational microbiology, we quantify the improbability of random component sequencing, highlighting odds that far exceed the estimated number of physical events that could have occurred since the Big Bang. We further explore the challenges of protein folding, as exemplified by the AlphaFold project, as well as the necessity for simultaneous functionality among interdependent proteins and other systems. These factors collectively imply that undirected mechanisms alone are insufficient to account for the emergence of functioning proteomes, prompting re-evaluation of abiogenesis models. While acknowledging counter-arguments from orthodox evolutionary biology, we argue that these improbabilities and the associated arguments in this document warrant consideration of alternative explanatory frameworks.

 

 

Introduction

 

The emergence of life on Earth remains one of the most profound questions in science. Conventional evolutionary theory posits that life arose through abiogenesis, i.e. undirected chemical evolution and natural selection leading to self-replicating and self-sustaining systems. This view has been critiqued partly on probabilistic grounds, with arguments suggesting that the random formation of complex biomolecules, such as the genetic coding for proteins, is statistically implausible by a wide margin for each of the many protein types. This paper interrogates the orthodox position by integrating data from microbiology, biochemistry, physics, and computational modelling, together with related considerations, to assess the likelihood of protein-based life arising by undirected means.

 

Proteins, the workhorses of cellular function, are polymers of amino acids whose sequences and structures underpin biological processes. Estimates of the size of the human proteome range from 20,000 to 100,000 protein types, with some biologists suggesting the count could reach into the millions. Each type contributes to the body’s efficiency and subsistence, and many are interdependent. We begin by quantifying the combinatorial space for amino acid sequences and comparing it with the number of physical events available, then extend the analysis to protein folding, cellular orchestration, and the inter-functionality of completed proteins.

 

While evolutionary models invoke natural selection and vast timescales to mitigate these improbabilities, we contend that such mechanisms fail to adequately address the concurrent requirements for comprehensive integration.

 

Probabilistic Analysis of Protein Sequence Assembly

 

The universe's age is estimated at 13.8 billion years, during which the total number of possible physical events — accounting for subatomic interactions across cosmic scales — has been approximated as 10^150. This figure, while immense, pales in comparison to the combinatorial possibilities for even modest protein sequences. Some of the largest proteins are outlined in the table below.

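| Protein | Approx. # Amino Acids | Odds of Arrival by Chance* |
|---------|-----------------------|----------------------------|
| Titin | 34,000 | 1 in 10^44,000 |
| Mucin-16 (CA125) | 22,000 | 1 in 10^28,000 |
| Obscurin | 7,900 | 1 in 10^10,300 |
| Nebulin | 6,600 | 1 in 10^8,600 |
| Dystrophin | 3,600 | 1 in 10^4,700 |
| Filamin A | 2,600 | 1 in 10^3,400 |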

 

The Complexity of Protein Folding

 

Protein building extends beyond amino acid sequencing to three-dimensional structure, achieved through folding, a process of immense computational complexity. Linear polypeptides must adopt precise conformations, a challenge dubbed the “protein folding problem.”

 

The AlphaFold initiative, which combines advanced AI systems with the work of computer programmers and whose predictions have been used by over two million researchers, has made strides in predicting folds; however, this enterprise underscores the inherent difficulty: nature accomplishes protein folding rapidly, quintillions of times per second in our cells. Evolutionary models propose gradual modifications, yet pre-folded sequences are inert, raising unanswered questions: Why select non-functional precursors? How could random processes “anticipate” functional outcomes and cease re-modifications upon success?

 

Folding amplifies assembly improbability, as only a minuscule fraction of attempted sequences yields stable, functional structures. Encoding these mechanisms into DNA adds further layers requiring explanation: transient processes must be converted into heritable code by means that remain unexplained.

 

 

Cellular Interdependence and Orchestration

 

This section explores the landscape of functional human proteins and their interdependencies, with a particular focus on complexes where the omission of one subunit without its partners would severely compromise cellular operational capability. Drawing from established biological principles, databases, and research, we delve into definitions, mechanisms, examples, and implications.

 

Cells are not mere aggregates, but coherent systems where proteins interact with enzymes, histones, ribosomes, endosomes, melanosomes, chromatin, lysosomes, and other components. This “molecular metropolis” requires extraordinarily precise orchestration, defying reductionist explanations. The proteome's scale (potentially millions of variants), coupled with accurate amino acid sequence selection, folding complexity, and interdependence requirements, seriously challenges conventional explanations.

 

Laboratory replication of such systems remains elusive, supporting arguments that abiogenesis probabilities are prohibitively low. Rebuttals emphasise chemical affinities and prebiotic environments, but empirical evidence for self-organising mechanisms at these stages is not forthcoming.

 

  • Research indicates that in human cells, many proteins form obligate complexes where subunits rely on each other for stability and function, meaning the removal of one subunit can destabilise the entire assembly and impair critical cellular processes such as transcription, translation, or degradation.
  • Common examples include the ribosome (essential for protein synthesis), proteasome (for protein degradation), and RNA polymerase II (for mRNA transcription), where subunit interdependencies arise from shared interfaces and complex unities.
  • Evidence leans toward these interdependencies being vital for cell viability, as mismatches or omissions often lead to diseases like anaemias or cancers, though the exact impact can vary by complex and context.

 

Understanding Protein Complexes

 

Protein complexes are stable assemblies of two or more proteins that interact to perform specific functions in human cells. These can be obligate (subunits cannot function independently) or non-obligate (subunits can exist alone but interact transiently). In obligate cases, subunits often co-fold or stabilise each other through large hydrophobic interfaces, making them interdependent.

 

 

Why Interdependence Matters

 

Omission of one subunit without the others can be detrimental because it disrupts assembly, leading to loss of function, protein mis-folding, or cellular stress. For instance, in essential complexes, this can halt key pathways, causing lethality or disease. Studies show that subunits in such complexes are required to co-exist, so removing one breaks compensatory interactions built over time.

 

Evolutionary biologists point to the machinery that provides subunit support within cells as if it weakened the odds-based calculations in the aforementioned table. On the contrary, this behaviour supports the view that the simultaneous emergence of multiple proteins and subunits is essential to the overall functioning of the cell, owing to the need for interoperability, and thus reinforces the validity of these odds.

 

 

Notable Examples

 

Ribosome: Composed of ribosomal proteins and rRNAs; removal of a key protein like RPL4 prevents translation, as subunits must assemble precisely for peptide bond formation.

Proteasome: Involves core and regulatory subunits; loss of a core subunit blocks assembly, impairing protein degradation and leading to toxic buildup.

RNA Polymerase II: 12 subunits (e.g., RPB1-RPB12); omitting one disrupts transcription initiation, as they cooperate for DNA binding and elongation.
 

In human cells, proteins rarely operate in isolation; instead, they form intricate networks of interactions that underpin virtually all cellular processes, from metabolism and signalling to structural maintenance and response to environmental cues. Among these, protein-protein interactions (PPIs) are fundamental, often manifesting as multi-subunit protein complexes where individual proteins, or subunits, collaborate to achieve functions that none could perform alone. 

 

 

Protein Complexes and Interdependencies

 

A protein complex is defined as a stable assembly of two or more polypeptide chains bound by non-covalent interactions (the weak interactions within proteins that do not share electrons), forming a quaternary structure distinct from single multi-domain proteins. In humans, these complexes range from simple dimers to massive assemblies like the ribosome, which includes over 80 proteins.

 

Obligate interdependencies require all subunits for stability and function; individual subunits often fail to fold properly or to remain stable in isolation, relying on mutual stabilisation through extensive interfaces (typically >2500 Ų of buried surface area). Non-obligate interactions, while important, allow subunits to exist independently but may still lead to functional deficits upon separation.
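The distinction between obligate and non-obligate assemblies can be illustrated with a toy model. The sketch below is only an illustration: the `ProteinComplex` class, the `assembles` function, and the rule that a non-obligate complex "works" if any subunit is present are hypothetical simplifications introduced here, not a biological model. The hemoglobin subunit names are taken from the examples table in this article and reduced to one alpha and one beta gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinComplex:
    """Hypothetical record for a complex and its subunit requirements."""
    name: str
    subunits: frozenset[str]
    obligate: bool  # True: every subunit is needed for a stable assembly


def assembles(cx: ProteinComplex, present: set[str]) -> bool:
    """Toy rule (an assumption of this sketch):
    obligate complexes need all subunits; non-obligate ones need at least one."""
    if cx.obligate:
        return cx.subunits <= present  # subset test: nothing may be missing
    return bool(cx.subunits & present)


# Simplified hemoglobin entry: one alpha gene and one beta gene
hemoglobin = ProteinComplex("Hemoglobin", frozenset({"HBA1", "HBB"}), obligate=True)

print(assembles(hemoglobin, {"HBA1", "HBB"}))  # all subunits present
print(assembles(hemoglobin, {"HBA1"}))         # beta chain omitted
```

Under this rule, omitting any single subunit of an obligate complex is enough to make `assembles` return `False`, which is the behaviour the text attributes to obligate assemblies.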

 

The Omission of One Subunit

 

The detriment from omitting one subunit stems from several factors:

 

Structural Disruption: Subunits often share binding interfaces where amino acid residues are required to co-exist. A mutation or removal in one can destabilise the entire complex.

Functional Loss: Complexes perform modular tasks; e.g., in enzymatic machines, one subunit might bind substrate while another catalyses reaction — removing one halts the process.

Cellular Consequences: In essential complexes, this can trigger proteotoxic stress, apoptosis, or disease. For instance, hybrid incompatibilities in model organisms show that mismatched subunits from divergent lineages fail to assemble, leading to viability issues.
 

Databases such as CORUM (Comprehensive Resource of Mammalian Protein Complexes) provide manually curated data on 7,193 human complexes involving 5,299 unique genes, highlighting their roles in core processes. Other resources, such as STRING for PPIs or BioGRID for interactions, complement this by mapping networks where interdependencies are inferred from genetic and physical data.

 

 

Key Examples of Interdependent Protein Complexes

 

Below we highlight examples of human protein complexes where subunits exhibit strong interdependencies. These are drawn from well-studied cases in the literature and databases, illustrating how omission of a single subunit can cascade into cellular dysfunction. Note that use of this chart must always be accompanied by the copyright notice in the last row.

 

| Complex Name | Key Subunits (Examples) | Function | Why Omission of One Subunit is Detrimental |
|--------------|--------------------------|----------|--------------------------------------------|
| Ribosome (80S) | RPL4, RPS6, RPL11 (large subunit); RPS18, RPS24 (small subunit); plus rRNAs | Protein synthesis via translation of mRNA | Subunits are obligate; removal (e.g., RPL4) prevents assembly of the peptidyl transferase center, halting peptide bond formation and causing translational arrest, often lethal in cells. Mutations link to diseases like Diamond-Blackfan anemia. |
| Proteasome (26S) | PSMA1-PSMA7 (alpha ring); PSMB1-PSMB7 (beta ring); regulatory subunits like PSMD1 | Ubiquitin-mediated protein degradation | Core subunits form a barrel structure; omitting one (e.g., PSMB5) blocks chamber formation, preventing proteolysis and leading to toxic protein accumulation, associated with neurodegeneration. |
| RNA Polymerase II | RPB1 (largest subunit), RPB2, RPB3-RPB12 | Transcription of mRNA and non-coding RNAs | Subunits cooperate for promoter recognition and elongation; removal of RPB1 disrupts the active site, stopping transcription and gene expression, linked to developmental disorders. |
| Hemoglobin | HBA1/HBA2 (alpha chains), HBB (beta chains) | Oxygen transport in blood | Obligate tetramer (α2β2); omitting beta chains causes alpha chain precipitation, leading to thalassemia and anemia due to impaired oxygen binding. |
| Tubulin Heterodimer (Microtubules) | TUBA1A (alpha), TUBB2A (beta) | Cytoskeletal structure, intracellular transport | Obligate dimer polymerises into microtubules; removal of beta-tubulin prevents polymerisation, disrupting mitosis and neuronal migration, causing brain malformations like lissencephaly. |
| ATP Synthase (F1-F0) | ATP5A1-ATP5E (F1 catalytic core), MT-ATP6/MT-ATP8 (F0 proton channel) | ATP production in mitochondria | Subunits form rotary motor; omitting MT-ATP6 impairs proton flow, reducing ATP synthesis and causing mitochondrial diseases like NARP syndrome. |
| Cytochrome c Oxidase (Complex IV) | COX1-COX3 (mitochondrial), COX4-COX8 (nuclear) | Electron transport chain, respiration | Obligate assembly; loss of COX1 destabilises the complex, blocking electron transfer and oxygen reduction, leading to energy deficits and diseases like Leigh syndrome. |
| Nuclear Pore Complex | NUPs (e.g., NUP107, NUP153, NUP205) | Nuclear-cytoplasmic transport | Large assembly with co-evolved subunits; omitting NUP107 prevents pore formation, disrupting RNA export and protein import, potentially lethal or causing sterility in hybrids. |
| Voltage-Gated Potassium Channel | KCNA1-KCNA4 (alpha subunits), auxiliary betas | Membrane potential regulation, neuronal signaling | Heterotetramer; removal of one alpha subunit impairs pore formation and ion selectivity, leading to epilepsy or ataxia. |
| Connexon (Gap Junction) | GJA1 (connexin 43) hexamers | Intercellular communication via channels | Homomultimeric; omitting subunits disrupts hexamer assembly, blocking ion/metabolite passage and causing arrhythmias or deafness. |
| RARA-RXRA Complex | RARA (retinoic acid receptor alpha), RXRA (retinoid X receptor alpha) | Gene regulation via retinoic acid signalling | Heterodimer binds DNA; omission of RXRA prevents dimerisation and transcription activation, linked to developmental defects and cancers. |
| AR-ESR1 Complex | AR (androgen receptor), ESR1 (oestrogen receptor alpha) | Hormone signalling in cells | Heterodimer modulates gene expression; removing ESR1 reduces AR activity, affecting processes like cell proliferation, with implications in hormone-related cancers. |
| Rad51-Rad55-Rad57 Complex | RAD51, RAD55, RAD57 | DNA repair via homologous recombination | Rad55-Rad57 stabilises Rad51 filaments; omitting Rad55/Rad57 weakens repair, increasing mutation rates and cancer risk (e.g., Fanconi anaemia). |
| Cathepsin D | CTSD light and heavy chains | Lysosomal proteolysis | Obligate heterodimer; separation prevents activation, impairing protein breakdown and causing neuronal ceroid lipofuscinosis. |
| | | | (c) Michael Barber, September 2025, www.designomics.co.uk |

 

 

Broader Implications and Research Insights

 

Interdependencies extend beyond structure to dynamic regulation. For example, in mitochondrial respiratory chains, subunits like those in complex IV (cytochrome c oxidase) are encoded by both nuclear and mitochondrial genomes, requiring precise coordination; mismatches lead to oxidative stress and diseases. Similarly, in signalling, nuclear receptors like RARA-RXRA form heterodimers modulated by ligands, where drug interventions (e.g., retinoic acid agonists) can stabilise or disrupt assembly, offering therapeutic avenues but also risks of side effects.

 

Studies on hybrid incompatibilities reveal that complexes like the nuclear pore or RNA polymerase II, despite conservation, harbour subunits that cause assembly failures in inter-species crosses. Large-scale proteomic efforts, such as those identifying 622 soluble complexes with 3,006 proteins, show that smaller complexes (<5 subunits) are often vertebrate-specific and un-annotated, with high interdependence vulnerability.

 

In disease contexts, mutations at interfaces (e.g. in tubulin) enrich for pathologies, as seen in brain malformations. Databases like CORUM integrate drug targets, revealing 1,975 instances where pharmaceuticals affect complex formation — e.g., synergistic effects of RXR and PPARG agonists on gene expression via complexes.

 

Overall, these interdependencies ensure efficient cellular operation but pose risks; future research may leverage AI-driven structure prediction (e.g. AlphaFold) to map more interfaces and predict detrimental omissions.

 

 

The Challenge of Functional Recognition in Undirected Processes

 

A central conundrum in evolutionary models is the inferred requirement for processes to recognise when a protein originally achieved successful functionality. In human physiology, protein synthesis does not involve iterative reconfigurations post-formation; the cell, using DNA coding, effortlessly produces each protein as a “finished product” with high fidelity. This raises the question: How does an undirected construction process determine that a specific amino acid sequence, paired with its folding mechanism, constitutes a successful and viable functional entity? Where does the required functionality of the protein enter the evaluation process?

 

During gradual evolution, numerous unsuccessful attempts at protein assembly would occur. Yet, ascertaining success — defined by the protein's ability to fulfil a biological role — implies a feedback mechanism for evaluation that is absent in purely stochastic systems. By what criteria could such a process verify that the selected sequence and subsequent fold adequately perform the target function? Furthermore, how could it consistently provide for the function itself, ensuring alignment with cellular needs, without genetic coding of the successful functionality?

 

A common counter‑argument is that the emergence of functional proteins did not rely on undirected events, but unfolded through “natural,” “holistic” chemical pathways that allegedly could not have produced any arrangement other than the one we observe. However, this defence presupposes the very mechanisms under debate. Explanations of protein formation typically appeal to the cellular machinery that interprets, assembles, and regulates amino‑acid sequences — machinery that is itself encoded in DNA.

 

This is the crucial point: the improbabilities represented in the table shown at the beginning of this article do not merely concern the formation of single proteins. They extend to the origin of the specialised coding system, translation apparatus, and regulatory architecture required to build those proteins in the first place. Any argument that invokes existing biological machinery to explain the emergence of that machinery risks circularity. The question therefore requires careful study: How did the system capable of producing such proteins arise before the system itself existed?

 

 

Mechanisms for Stabilising Protein Designs

 

Once functionality is achieved, evolutionary processes must “seal” the design, preventing further modifications that could disrupt efficacy. Genetic fidelity mechanisms — such as proofreading during DNA replication — ensure stability for the active selection process. But how did these stabilising mechanisms work for the original construction of the genetic coding of amino acid selection? As evolution postulates ongoing modifications, what is the procedure (whether empirical or theoretical) that halts this process once success is achieved?

 

We posit that without a teleonomic signal, or a comprehensive multi-system feedback loop — indicating that the configuration meets the required criteria — continued alterations would persist, severely eroding functionality for each component. Empirical studies on protein stability reveal that even minor mutations can abolish function (see table below), underscoring the substantial amount of precision required, especially in consideration of large proteins. Thus, the cessation of modifications implies an implicit recognition of optimality, challenging purely stochastic explanations.

 

 

Discussion

 

The improbabilities outlined (amino acid sequence development, selection, formation and error-checking mechanisms, DNA coding, success recognition, and functional integration with other components) collectively undermine the sufficiency of undirected solutions for the arrival of protein-based life. While selection-monitoring systems mitigate randomness in established systems, coding presupposes functional precursors, creating a bootstrapping, or Catch-22, dilemma. Alternative models, such as directed processes, merit exploration where empirical gaps persist.

 

Limitations include simplified probability models that ignore redundancy and conjectured evolutionary pathways. These limitations are offset, however, by the requirements for coordinated functional systems, the lack of empirical evidence for viable alternatives, and the absence of mechanisms that could replace the stochastic terms. Future research should integrate quantum effects and prebiotic simulations to refine these estimates.

 

 

Conclusion

 

The mathematical absurdities of random amino acid sequence development, compounded by the complexity of the folding mechanism, functional interdependence requirements, holistic recognition of successful protein construction, and the need for successful genetic mechanism encoding, seriously challenge the paradigm of chance-driven abiogenesis. These findings advocate for interdisciplinary re-evaluation, potentially bridging biology with other fundamental scientific disciplines and information theory.

 

 

References

 

- [Identifying direct contacts between protein complex subunits from ...](https://pmc.ncbi.nlm.nih.gov/articles/PMC5638211/)

- [Protein complex - Wikipedia](https://en.wikipedia.org/wiki/Protein_complex)

- [Protein Complexes Form a Basis for Complex Hybrid Incompatibility](https://pmc.ncbi.nlm.nih.gov/articles/PMC7900514/)

- [Human mitochondrial protein complexes revealed by large-scale ...](https://academic.oup.com/bioinformatics/article/38/18/4301/6650275)

- [Large protein complex interfaces have evolved to promote ...](https://elifesciences.org/articles/79602)

- [CORUM in 2024: protein complexes as drug targets](https://academic.oup.com/nar/article/53/D1/D651/7889246)

- [The role of protein complexes in human genetic disease](https://pmc.ncbi.nlm.nih.gov/articles/PMC6635777/)

- [Understanding protein-protein interactions](https://www.abcam.com/en-us/knowledge-center/cell-biology/protein-protein-interactions)

- [Computed structures of core eukaryotic protein complexes](https://www.science.org/doi/10.1126/science.abm4805)

 

 

Note: Additional references are embedded via citations and quotations derived from sources specified on the website www.designomics.co.uk.

 

Michael A. Barber
Liverpool, UK

 

Proteins are constructed from 20 standard amino acids, rendering the probability of a specific sequence of length n as 1/20^n (assuming uniform random assembly, a simplifying assumption for baseline improbability). The table presents selected human proteins, their approximate amino acid lengths (based on canonical isoforms), and the corresponding odds expressed in exponential form (1 in 10^k, where k = n × log_10(20) ≈ n × 1.3010).
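The conversion from sequence length to a power of ten can be sketched in a few lines of Python. This is a minimal illustration of the uniform random model described above, not a statement about how sequences arise in practice. The function name `odds_exponent` and the dictionary `example_lengths` are hypothetical names introduced here; the lengths are four of the approximate values quoted in the table.

```python
import math

# log10(20): decimal orders of magnitude contributed by each residue
LOG10_20 = math.log10(20)


def odds_exponent(n_residues: int) -> float:
    """Return k such that 20**n_residues == 10**k (uniform random model)."""
    return n_residues * LOG10_20


# Approximate residue counts quoted in the table (a subset, for illustration)
example_lengths = {
    "Filamin A": 2_600,
    "Dystrophin": 3_600,
    "Nebulin": 6_600,
    "Obscurin": 7_900,
}

for name, n in example_lengths.items():
    print(f"{name}: n = {n:,}, odds about 1 in 10^{odds_exponent(n):,.0f}")
```

The printed exponents are unrounded counterparts of the table's figures, which are quoted to the nearest hundred or so.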

 

 

Selected Proteins and their Sequence Improbabilities

 

The lengths included in the above table are derived from UniProt and literature sources. Odds are logarithmic approximations and do not account for functional redundancy or conjectured pathways, which critics argue could somewhat reduce effective improbability.

 

These values exceed 10^150 by many orders of magnitude, rendering the notion of random sequence development implausible, especially within cosmic timescales. For instance, arriving at any single molecule from the above list by undirected mechanisms would require computational resources beyond current capabilities, with a processing time far surpassing the age of the universe. Moreover, proteins do not function in isolation; human cells demand the continuous assembly of trillions of such inter-functioning molecules, which compounds the improbability of the original amino acid sequences arising by many further orders of magnitude.
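To see where the comparison with 10^150 begins to bite, one can ask how short a sequence may be before 20^n already exceeds that figure. The sketch below answers this under the same uniform random model only; it says nothing about redundancy or selection. The function name `min_length_exceeding` is a hypothetical name introduced for this illustration.

```python
import math

EVENT_BOUND_EXPONENT = 150  # the 10^150 figure quoted in the text


def min_length_exceeding(bound_exponent: int, alphabet_size: int = 20) -> int:
    """Smallest n with alphabet_size**n > 10**bound_exponent."""
    # Floating-point estimate first
    n = math.ceil(bound_exponent / math.log10(alphabet_size))
    # Then correct any rounding error with exact integer arithmetic
    while alphabet_size ** n <= 10 ** bound_exponent:
        n += 1
    while n > 0 and alphabet_size ** (n - 1) > 10 ** bound_exponent:
        n -= 1
    return n


print(min_length_exceeding(EVENT_BOUND_EXPONENT))
```

Because Python integers have arbitrary precision, the final check compares 20^n and 10^150 exactly rather than relying on the logarithm alone.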

 

Critics of probability-based arguments counter that evolution operates via incremental selection, not pure randomness, and that functional proteins can emerge from smaller precursors. However, we are not discussing routine protein assembly, but sequence originality. The orthodox view overlooks the interdependence of many proteins: their mutual reliance, necessitating simultaneity for cellular viability and successful organism functionality. Geological timescales provide no resolution, as partial assemblies offer no selective advantage.

*Note: The probabilities discussed here concern the origin of amino acid sequences as encoded in DNA. Inside a living cell, the machinery that selects and links amino acids into a growing polypeptide chain does not operate by random trial for each protein; it follows instructions already embedded in the genome: the codon sequence in DNA specifies exactly which amino acid is added at each step, and this is performed with the assistance of cellular machinery (e.g. Aminoacyl‑tRNA Synthetases, ribosomes). This is not what the odds in the chart are based on. The cell’s translation system relies on information that is already present. The improbabilities in the above table arise, not from the routine, device-assisted action of building individual proteins, but from the original amino acid sequences and machinery coding that were written into DNA in the first place.

Note that use of this chart in any form of publication must be accompanied by the copyright notice in the last row.
