In 1959, the American biochemist Walter Kauzmann proposed a radical solution to the problem of protein structure. At the time, it was unclear how proteins, the workhorses of the cell, fold into their unique three-dimensional forms.
Every protein is a chain of amino acids, drawn from a set of 20 and strung together rather like beads on a string. The length and order of these amino acid beads dictate how the protein folds into its unique shape. This matters because a protein's shape is vital to its function: disrupting that structure can destroy the protein's ability to do its job. How nature ensures that proteins fold correctly every time remains one of the biggest mysteries in science.
At the heart of the problem is the knowledge that amino acids interact with water in two distinct ways. Some of them, like lysine, love water. These hydrophilic amino acids easily dissolve and mix well with water. And then there are those like tryptophan that don’t like water. These hydrophobic amino acids don’t mix with water and tend to avoid it as much as possible, to the extent that they often clump together to minimise water exposure.
Since water makes up about 70% of a cell's mass, the way a protein's amino acids are arranged, and how that arrangement interacts with water molecules, is pivotal to how the protein folds. If a protein contains a stretch of hydrophobic amino acids, those residues will naturally tend to cluster together, compacting the entire protein in the process.
Sensitive to change
Kauzmann built on this idea and proposed that proteins have a core largely made up of hydrophobic amino acids and a surface made primarily of hydrophilic amino acids.
The idea was borne out in the following decade, when scientists began to map protein structures accurately by X-ray crystallography: as he had predicted, the hydrophobic amino acids were often buried in the core, while the hydrophilic ones tended to sit on the surface.
Further research showed that, unlike the amino acids on the surface, those in the core were very sensitive to change: even minor modifications there could disrupt the protein's shape and, consequently, its function.
Another piece of evidence supporting this view was that the amino acid sequences of the cores of proteins shared across different forms of life were remarkably similar. The reasoning was that nature could not afford to change these sequences without lethal consequences.
But this raised another question. If the effects of a wrong amino acid combination are so drastic, how did nature, while relying on slow, incremental trial and error, manage to find functional protein structures at all?
Even for a modest 60-amino-acid protein core, the number of possible combinations is around 10⁷⁸, a number comparable to the estimated number of atoms in the known universe. It's astonishing that evolution was able to navigate such a vast space of possibilities to find stable, functional sequences not once, but again and again, across the millions of proteins found in life today.
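The figure of about 10⁷⁸ follows from simple counting: if each of 60 positions can independently hold any of the 20 amino acids, there are 20⁶⁰ possible sequences. A minimal check in Python, assuming exactly that simplified model (no constraints on which residues may appear where):

```python
import math

# Simplified model (an assumption): each of 60 positions can hold any of
# the 20 standard amino acids, chosen independently of the others.
AMINO_ACIDS = 20
POSITIONS = 60

combinations = AMINO_ACIDS ** POSITIONS   # exact integer (Python ints are unbounded)
exponent = math.log10(combinations)       # order of magnitude of the count

print(f"20^60 = {combinations:.3e}")         # 20^60 = 1.153e+78
print(f"log10(20^60) = {exponent:.2f}")     # log10(20^60) = 78.06
```

Because 20⁶⁰ ≈ 1.15 × 10⁷⁸, the space is far too large for any random, sequence-by-sequence search to cover.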
This mystery has finally been put to rest by a team from the Centre for Genomic Regulation in Spain and the Wellcome Sanger Institute in the U.K.
Implications for therapeutic proteins
In a new paper in Science, the team challenged the assumption that protein cores are highly sensitive to change. They argued that earlier studies had tested only a tiny fraction of the astronomically large number of possible core combinations, and that the changes made in those studies were confined to small regions, leaving no room for compensating adjustments elsewhere in the protein.
The team tested this by first generating a library of 78,125 different amino acid combinations at seven locations in the cores of three proteins: the SH3 domain of the human tyrosine-protein kinase FYN, the CI-2A protein from barley, and the CspA protein from the bacterium Escherichia coli. They then measured the stability of some of these combinations to assess the impact of the changes they had introduced.
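The library size is consistent with a simple combinatorial design: 78,125 is exactly 5⁷, the number of combinations obtained when each of seven positions can take one of five amino acids. The article does not say which five residues were used, so the set below (`ALLOWED_RESIDUES`) is a hypothetical placeholder; the sketch only enumerates every combination with `itertools.product` and confirms the count.

```python
from itertools import product

# Hypothetical set of five residues (one-letter codes). The article does
# not specify which amino acids were allowed at each core position.
ALLOWED_RESIDUES = "FILMV"
CORE_POSITIONS = 7  # seven core locations, as described in the article

# One string per combination: one allowed residue at each core position.
library = ["".join(combo) for combo in product(ALLOWED_RESIDUES, repeat=CORE_POSITIONS)]

print(len(library))              # 78125
print(library[0], library[-1])   # FFFFFFF VVVVVVV
```

Swapping in a different five-letter set changes the sequences but not the count, which depends only on five choices raised to the power of seven positions.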
Remarkably, the authors found that while most combinations were indeed detrimental, a substantial number remained stable, showing that protein cores are more resilient to change than previously believed. The number of stable combinations varied from protein to protein; the human FYN SH3 domain had the most, with more than 12,000 different stable core combinations.
The team then used these data to train a machine-learning model, to see whether protein core stability could be predicted from the amino acid sequence alone. They tested the model on 51,159 natural SH3 sequences from across all domains of life, drawn from public databases, and found that it could accurately predict stability even for sequences less than 25% similar to the human SH3 domain.
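Sequence similarity between related proteins is often expressed as percent identity: the share of aligned positions at which two sequences carry the same amino acid. The article does not say exactly how similarity was measured, so this is only a minimal sketch of one common convention, assuming the sequences have already been aligned (gaps written as `-`). The function name `percent_identity` and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Convention (one of several in use): count only columns where neither
    sequence has a gap ('-'), and report the share of those columns at
    which both sequences carry the same amino acid.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


# Made-up toy sequences, not real SH3 domains.
print(f"{percent_identity('MKVL-AWGE', 'MKILSAWGD'):.1f}%")  # 75.0%
```

By this measure, two sequences below 25% identity differ at more than three out of every four compared positions.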
The study's results have several important implications for therapeutic protein engineering. Because of their amino acid sequences, many proteins trigger an unwanted immune reaction when administered. Changing those sequences has been a slow and painstaking process, since it was believed that too many changes, especially in the core, would disrupt the protein's structure. With the new insights, it may be possible to speed up the process by screening larger libraries of variants carrying many more changes than were previously attempted.
While the study holds clear promise for therapeutic applications, its deeper significance lies in what it means for fundamental biology. The finding that protein cores tolerate far more change than was assumed resonates beyond medicine, reaching into the very nature of evolution itself. It is a reminder that life, at its deepest level, is far more adaptable than we imagined.
Arun Panchapakesan is an assistant professor at the Y.R. Gaitonde Centre for AIDS Research and Education, Chennai.