Researchers try to cut the genetic code from 20 to 19 amino acids

Using AI tools, the team reworked part of the ribosome to need one less amino acid.

The genetic code is central to life. With minor variations, everything uses the same sets of three DNA bases to encode the same 20 amino acids. We have discovered no major exceptions to this, leading researchers to conclude that this code probably dates back to the last common ancestor of all life on Earth. But there has been a lot of informed speculation about how that genetic code initially evolved.

Most hypotheses suggest that earlier forms of life had partial genetic codes and used fewer than 20 amino acids. To test these hypotheses, a team from Columbia and Harvard decided to see if they could get rid of one of the 20 currently in use. And, as a first attempt, they engineered a portion of the ribosome that worked without using an otherwise essential amino acid: isoleucine.

Changing the code

First off, why would you do this? Most work in the field has focused on altering the genetic code in ways that are useful, such as using more than 20 amino acids to enable interesting chemistry.

The reasoning here seems to be that, prior to the last common ancestor of life on Earth, organisms experimented with various genetic codes and probably used a mix of proteins and catalytic RNAs to run their metabolisms. While we’ve done a lot of studies on catalytic RNAs, we have far less of an idea of what sort of chemistry is possible with a reduced genetic code. And the researchers suggest that AI-based tools have matured enough that redesigning proteins to use fewer amino acids is far more realistic than it was just a few years ago.

Isoleucine is one of three highly similar amino acids, along with leucine and valine. In the portion of the structure that’s distinct from other amino acids, all three have a branched structure that’s composed entirely of carbon and hydrogen. That makes them all hydrophobic, and they often are located in the interior of proteins, which keeps them away from the watery environment of the cell. So, purely by reasoning it out, one of those three would seem to be a good candidate to get rid of.

The researchers involved backed that reasoning up with evidence. They ran an analysis of the E. coli genome, checking which amino acids were substituted by other ones in related proteins from other species. Isoleucine was the amino acid that was most frequently swapped out for a different one. So, the researchers decided to start answering the question of whether we really need it at all.
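The article doesn’t describe how the substitution analysis was actually run, but the basic idea can be sketched. The sketch below assumes you already have a reference protein and homologs from other species aligned to the same length. The toy sequences and the `substitution_rates` helper are hypothetical, made up purely for illustration. For each amino acid in the reference, it counts how often a different residue shows up at the matching position in the homologs.

```python
from collections import Counter


def substitution_rates(reference: str, homologs: list[str]) -> dict[str, float]:
    """Fraction of aligned homolog positions where each reference amino acid
    is replaced by a different one. Alignment gaps ("-") are skipped."""
    seen: Counter = Counter()     # times each reference residue was compared
    swapped: Counter = Counter()  # times it differed in the homolog
    for homolog in homologs:
        if len(homolog) != len(reference):
            raise ValueError("sequences must already be aligned to equal length")
        for ref_aa, other_aa in zip(reference, homolog):
            if ref_aa == "-" or other_aa == "-":
                continue
            seen[ref_aa] += 1
            if other_aa != ref_aa:
                swapped[ref_aa] += 1
    return {aa: swapped[aa] / seen[aa] for aa in seen}


# Hypothetical toy alignment: a short reference fragment and two homologs.
reference = "MIKLIVG"
homologs = ["MVKLLVG", "MVKLIVA"]

rates = substitution_rates(reference, homologs)
print(max(rates, key=rates.get))  # I
print(rates["I"])                 # 0.75
```

A real analysis would run over thousands of genes and would need to account for how closely related the species are; this only shows the counting step.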

Editing all 4,500 or so genes in E. coli would be a monumental task, and that many changes at once would almost certainly kill it, so the researchers started out with much smaller tests. To begin with, they took a set of 36 essential genes, replaced every isoleucine in them with valine, a similar amino acid, and then put the edited genes back into the genome. For 19 of the genes, doing so killed the cells. But that does indicate that 17 of them got by OK without isoleucine, including one where it was swapped out at 45 different positions along the amino acid chain.

Notably, even in cases where cells tolerated the change, their growth often slowed compared to the unedited cells. That will become a recurring theme.
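At the DNA level, an isoleucine-to-valine swap can be surprisingly simple. Isoleucine is encoded by ATT, ATC, and ATA, while valine is encoded by GTT, GTC, GTA, and GTG, so changing the first base of each isoleucine codon from A to G turns it into a valine codon. The sketch below does exactly that. The article doesn’t say which valine codons the researchers actually used, so treat the one-base mapping and the hypothetical `swap_ile_to_val` helper as assumptions.

```python
ILE_CODONS = {"ATT", "ATC", "ATA"}  # the three isoleucine codons


def swap_ile_to_val(cds: str) -> tuple[str, int]:
    """Return (edited coding sequence, number of isoleucine codons changed).

    Assumes `cds` is an in-frame coding sequence (DNA, 5'->3') starting at
    its start codon. Each isoleucine codon becomes a valine codon by
    changing its first base from A to G (ATT->GTT, ATC->GTC, ATA->GTA).
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = []
    swaps = 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        if codon in ILE_CODONS:
            codon = "G" + codon[1:]
            swaps += 1
        codons.append(codon)
    return "".join(codons), swaps


# Hypothetical mini gene: Met-Ile-Lys-Ile-Ile-stop
edited, n = swap_ile_to_val("ATGATTAAAATCATATAA")
print(edited, n)  # ATGGTTAAAGTCGTATAA 3
```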

Redesigning the ribosome

To give their project a focus, the researchers decided to start engineering an isoleucine-free ribosome. The ribosome is a large complex of proteins and RNAs that translates messenger RNAs into proteins; you can think of it as a bit like one of the hardware components that’s needed to boot a living cell from a genome. Obviously, the ribosome has critical enzymatic activities, most of which are carried out by its RNAs. But bringing that complex together requires that its proteins interact with each other and with those RNAs. So, the ribosome provides a stringent test of whether cells can tolerate engineering out an amino acid.

As a preliminary test, the team did an isoleucine-to-valine swap for 50 different individual genes that contribute proteins to the ribosome. Eighteen of those worked with no obvious problems, another 19 grew more slowly, and the changes were lethal for the remaining 13 genes. The team then focused on the 32 genes with reduced fitness and adapted deep-learning protein-design software to suggest alternative sequences that did not include isoleucine.

Iterative testing using four different software packages produced alternative sequences that eliminated the fitness issues for 27 of these 32 proteins.

For the remaining five, they went back and forced changes at the isoleucine positions. They then let the software design changes in the amino acids that are physically close to them within the three-dimensional structure of the protein, the idea being that the change in amino acid may disrupt the protein’s structure in a way that other changes in nearby amino acids could compensate for. This led to successful redesigns for four of the five problem proteins.
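Picking the amino acids that are “physically close” to a position is a geometric question, not a sequence one. Here’s a minimal sketch that assumes you have one 3D coordinate per residue (say, its alpha carbon, in ångströms) and selects everything within a distance cutoff. The 8-ångström cutoff, the toy coordinates, and the `neighbors_within` helper are all hypothetical; the article doesn’t say how the team defined “nearby.” Note how residue 40, far away in the sequence, still makes the cut.

```python
import math

Coord = tuple[float, float, float]


def neighbors_within(coords: dict[int, Coord], target: int, cutoff: float = 8.0) -> list[int]:
    """Residue numbers whose coordinates lie within `cutoff` (same units as
    the coordinates, e.g. angstroms) of the target residue, excluding it."""
    center = coords[target]
    return sorted(
        res for res, xyz in coords.items()
        if res != target and math.dist(center, xyz) <= cutoff
    )


# Hypothetical alpha-carbon positions (angstroms); residue 10 is the isoleucine.
coords = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    12: (7.6, 0.0, 0.0),
    40: (0.0, 5.0, 0.0),   # far away in sequence, close in space
    75: (20.0, 0.0, 0.0),
}
print(neighbors_within(coords, target=10))  # [11, 12, 40]
```

The residues this returns are the ones a design tool would then be allowed to change while the forced substitution stays fixed.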

While these are impressive achievements, testing them individually doesn’t really give the full picture of whether these redesigned proteins can put together a functionally equivalent ribosome. To do that, the researchers decided to remove isoleucine from all of the proteins in the small subunit of the ribosome. This is largely a matter of convenience. The genes for the 21 proteins in the small subunit are all clustered next to each other on a 10,000-base-long stretch of the genome, so the researchers could just replace them all at once.

Thinking small

Using the redesigned proteins from the earlier work, they started replacing ever-larger blocks of genes along this 10,000-base stretch of DNA. Starting from one end, they replaced 10 genes without any trouble. By the time they got to replacing 17 of the 21, the cells were growing more slowly. Replacing 18 genes at once, however, killed the cells entirely.

So, they started working in from the opposite end and found that the changes were tolerated until they hit the same gene that had caused trouble in the first series. That gene, called rplW, seems to be the critical holdup. Replacing 20 of the 21 genes and leaving rplW untouched led to cells that not only survived, but grew at about 70 percent the rate of an unmodified E. coli cell.
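The logic of attacking the gene cluster from both ends can be sketched as a simple search. In the toy model below, the viability test is a hypothetical stand-in for actually growing cells: it treats the rplW redesign as fine on its own but lethal alongside any other replacement. The placement of rplW as the 18th of 21 genes is inferred from the counts above, and every gene name other than rplW is a placeholder.

```python
from typing import Callable, Iterable, Optional

Viability = Callable[[frozenset], bool]


def first_lethal(order: Iterable[str], is_viable: Viability) -> Optional[str]:
    """Replace genes cumulatively in the given order and return the gene whose
    addition first makes the strain non-viable (None if all steps survive)."""
    replaced: set[str] = set()
    for gene in order:
        replaced.add(gene)
        if not is_viable(frozenset(replaced)):
            return gene
    return None


def locate_culprit(genes: list[str], is_viable: Viability) -> Optional[str]:
    """Scan from both ends; report a gene only if both scans blame the same one."""
    from_left = first_lethal(genes, is_viable)
    from_right = first_lethal(reversed(genes), is_viable)
    return from_left if from_left == from_right else None


# Hypothetical cluster of 21 genes with rplW assumed to sit at position 18.
genes = [f"gene{i}" for i in range(1, 22)]
genes[17] = "rplW"


def toy_is_viable(replaced: frozenset) -> bool:
    # Toy rule (an assumption): the rplW redesign works on its own but is
    # lethal once any other gene in the cluster has also been replaced.
    return not ("rplW" in replaced and len(replaced) > 1)


print(locate_culprit(genes, toy_is_viable))                       # rplW
print(toy_is_viable(frozenset(g for g in genes if g != "rplW")))  # True
```

The real experiment was messier (growth slowed before it stopped), but the two-ended scan is what lets a single problem gene stand out.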

So, they took a careful look at the changes the software had suggested for rplW. It turns out that the software had compensated for the changes to isoleucines by deleting some small stretches of amino acids nearby. While that apparently worked to get a functional protein, it differed enough that it wouldn’t work in combination with all the other changes.

At this point, the team just brute-forced the issue. They had software packages suggest a number of alternative amino acids for each of the four isoleucine positions in rplW and tested every possible combination of them (16 designs in total). One of these designs was able to complete the isoleucine-free small subunit, with the resulting strain growing about 60 percent as fast as the unedited ones. The cells were grown for 400 generations and typically picked up 20–30 mutations, but none of those restored an isoleucine to any of the ribosomal proteins.
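Sixteen designs for four positions is what you’d get from two candidate amino acids per position (2 × 2 × 2 × 2 = 16), though other splits, such as 4 × 4 × 1 × 1, would also give 16. The sketch below assumes two candidates per site; the site labels and the amino acids listed are hypothetical placeholders, not the ones the researchers tested. Python’s `itertools.product` generates every combination.

```python
from itertools import product

# Hypothetical candidates: two per isoleucine site (sites and residues are
# illustrative placeholders, not the ones used in the paper).
candidates = {
    "site_1": ["V", "L"],
    "site_2": ["V", "T"],
    "site_3": ["L", "M"],
    "site_4": ["V", "F"],
}


def enumerate_designs(options: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination that picks one candidate residue per site."""
    sites = list(options)
    return [dict(zip(sites, combo)) for combo in product(*(options[s] for s in sites))]


designs = enumerate_designs(candidates)
print(len(designs))  # 16
print(designs[0])    # {'site_1': 'V', 'site_2': 'V', 'site_3': 'L', 'site_4': 'V'}
```

Brute force works here because the space is tiny; with more candidates per site the count grows multiplicatively, which is why exhaustive testing only makes sense for a handful of positions.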

Notably, if you just put this version of rplW back into the genome on its own, the cells die. It’s only tolerated in the context of all the other changes to the ribosome caused by the other redesigned proteins.

Some notes about the AI use

It’s unclear that any of this would have been possible without the heavy use of AI tools. All of the protein design tools were AI-based, and their outputs were checked using AlphaFold 2, the Nobel-winning AI protein structure software. And the authors of the paper highlight a number of cases where the AI software made suggestions that most biologists would have shied away from. These include replacing the structurally flexible, neutral isoleucine with either a charged amino acid or one that’s locked into a rigid structure.

That said, the results also show the limits of working with current AI models, largely because, unlike a human, they can’t really explain the process by which they’re making decisions. For example, some of the models made very different suggestions from each other, which the researchers say implies that they are exploring different regions of the space of possible sequences. But we don’t actually know whether that’s the case, or whether each model simply had mathematical reasons for disliking the others’ suggestions.

That’s one of a number of cases in the paper where the researchers tried to reason backward about what the model was doing based on its output. In at least one case, the software redesigned the entire structural element (an alpha helix) containing the isoleucine it changed, and the researchers don’t even hazard a guess as to why.

It’s a good reminder that, at the moment, these software packages are tools: they let us do things that would otherwise not be possible, but they don’t actually help us understand all that much. We’re still left to reason through phenomena using the neural networks inside our skulls.

This doesn’t necessarily have to be the case; we could put more emphasis on exposing the inner workings of this software when developing it in order to get some insights into its decision-making process. But for now, I think the emphasis has been (quite reasonably) on getting something that works.

An amazing achievement, but is it useful?

Overall, this is astonishing work. These proteins have to interact with each other and with ribosomal RNAs, transfer RNAs, messenger RNAs, and the growing proteins the ribosome makes, plus all the normal proteins over on the large subunit. All of those components have had billions of years to evolve to work together. The fact that we could make such radical changes to the system over the course of a couple of years is just mind-blowing.

We still don’t know what’s slowing these cells down. It’s possible that the revised ribosome is less accurate, making more defective proteins by putting together amino acid chains with more frequent errors. Or it could be slower catalytically, becoming a bottleneck for cell growth. That’s something we could definitely experiment with, and giving the strain time to evolve might bring its growth rate back up a bit.

Can we use it as a starting place to get to an isoleucine-free genome? I’d rate that as still in the “maybe” category. There are lots of other large protein complexes in the cell, and there may be some that the AI tools struggle with. We’ll see if these labs have time and funding to continue down this path. Still, I’m skeptical that it will tell us much about life before the universal common ancestor, given how much about the rest of the cell has changed in the meantime.

It may still prove useful in that regard, though, by inspiring other scientists to think about experiments that could give us a better picture of what cells with a limited genetic code might look like.

Science, 2026. DOI: 10.1126/science.aeb5171

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.
