To the molecular biologists:
Optimize ye codons while ye may
For time is a-flying
And this clone you have in R & D today
Tomorrow will be … in manufacturing- and it’s just impossible to change anything at that point, so forget it.
Codon Usage Bias – Part I
I read an article yesterday about codon bias that has been stuck in my head ever since. The article appeared as a ‘Perspectives’ piece in the 13 Dec 2014 issue of Science, with the title ‘The Hidden Codes That Shape Protein Evolution.’
This article addresses some details not often considered in how DNA directs the synthesis of proteins.
I spend a lot of time in my classes discussing the basic flow of information from DNA –> RNA –> Protein, known as the Central Dogma. A lot gets left out of these lectures in order to keep it simple, which sometimes keeps my own thinking about the flow of information pretty simplified as well.
Fortunately, this article rattled my cage enough to open my mind to the myriad influences that go into the stuff of life. Here, Weatheritt and Babu look at how DNA sequences may be under selective pressures independent of just the proteins they encode.
I’ve done a fair amount of molecular biology in my life, including cloning genes and moving them into other organisms for expression as drugs or drug components. One example of this was in a lab where we used live vectors as immunogens in order to take advantage of the uniquely broad immune response this single-celled pathogen elicits. The immune responses we wanted to trigger or amplify were typically against human tumor proteins or the products of human viruses (e.g. HIV, HPV); however, the organism we were using as a vaccine was a bacterium.
As I said above, I usually teach the Central Dogma in a way that omits many of the complications seen in the real world. So, when we look at a codon chart, we point to its redundancy (multiple codons encode the same amino acid) to illustrate how a change in the DNA sequence can often fail to change the protein sequence at all. These changes are called ‘silent mutations’.
The way these codon charts work is by triangulating a position in the middle of the chart using the bases depicted along three of the four edges. For example, the codon AUG is read by locating the ‘A’ on the left margin, the ‘U’ on the top margin, and the ‘G’ on the right margin. The location this identifies is an amino acid called Methionine (abbreviated as met) on this chart.
Notice that if the first two bases in a codon are CU, then it does not matter what the third base is: this codon will always call for a Leucine (leu). This means that if the sequence of RNA is originally CUU but mutates to CUC, there will be no change in the protein.
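For anyone who would rather see the chart as code, here is a minimal Python sketch of the standard genetic code as a lookup table. The names `CODON_TABLE` and `is_silent` are my own inventions for illustration, and the table uses one-letter amino acid codes (M for Methionine, L for Leucine, * for stop) rather than the three-letter abbreviations on the chart.

```python
from itertools import product

# Standard genetic code, with bases ordered U, C, A, G at each position.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons whose first base is U
    "LLLLPPPPHHQQRRRR"  # codons whose first base is C
    "IIIMTTTTNNKKSSRR"  # codons whose first base is A
    "VVVVAAAADDEEGGGG"  # codons whose first base is G
)

# Map each of the 64 RNA codons to its amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(codon_before, codon_after):
    """Return True if swapping one codon for the other leaves the amino acid unchanged."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


print(CODON_TABLE["AUG"])        # Methionine, one-letter code M
print(is_silent("CUU", "CUC"))   # both codons are Leucine
print(is_silent("CUU", "CCU"))   # Leucine versus Proline
```

Looking up any codon that starts with CU gives Leucine, which is the same thing the chart shows, just in dictionary form.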
What codon optimization addresses is the fact that different organisms tend to prefer some codons over others, even if they encode the same amino acid. This has been appreciated for many years now, so when a molecular biologist takes a protein (e.g. from a human tumor) that they want made by bacteria, they redesign the DNA sequence so that it uses the codons preferred by the organism that will express the protein.
This figure examines the percentage of the time that genes use a particular codon to make Leucine. In the bacterium E. coli, CTG is used nearly 50% of the time. Meanwhile, in the yeast S. cerevisiae, TTA and TTG are preferred.
(Note that T in DNA = U in RNA)
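To make the idea of "percentage of usage" concrete, here is a rough Python sketch that tallies how often each of the six Leucine codons appears in a coding sequence. The function name `leucine_codon_usage` and the toy sequence are made up for illustration; real usage tables are computed over many genes from a genome, not one short string.

```python
from collections import Counter

# The six DNA codons that encode Leucine in the standard genetic code.
LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def leucine_codon_usage(coding_dna):
    """Return the percentage of Leucine codons accounted for by each Leucine codon.

    Assumes the sequence starts in frame. RNA input is accepted by
    converting U to T first.
    """
    seq = coding_dna.upper().replace("U", "T")
    usable_length = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable_length, 3)]
    counts = Counter(codon for codon in codons if codon in LEUCINE_CODONS)
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in LEUCINE_CODONS}
    return {codon: 100.0 * counts[codon] / total for codon in LEUCINE_CODONS}


# Toy sequence (not a real gene): Met, Leu, Leu, Leu, Leu, stop.
usage = leucine_codon_usage("ATGCTGCTGTTACTGTAG")
for codon, percent in usage.items():
    print(codon, percent)
```

Run over a whole genome's worth of genes, a tally like this is what produces the kind of bar chart described above.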
So, what does this mean? Consider a simplified example…
I want to clone this protein from a yeast and grow it up in bacteria:
Met – Leu – Leu [stop]
ATG – TTA – TTG – TAG
We would take this DNA from the yeast and then modify the sequence by changing the two Leucine codons into the preferred sequence in bacteria (CTG):
Met – Leu – Leu [stop]
ATG – CTG – CTG – TAG
The result should be a sequence of DNA that the bacteria will be able to optimally translate into protein.
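The swap above can be sketched in a few lines of Python. This is only a toy version under big assumptions: it touches Leucine codons alone, it always picks CTG (the E. coli favorite mentioned earlier), and the function name `recode_leucines` is hypothetical. Real codon optimization tools consider every amino acid and a number of other factors.

```python
# The six DNA codons that encode Leucine in the standard genetic code.
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def recode_leucines(coding_dna, preferred="CTG"):
    """Replace every Leucine codon with the host's preferred Leucine codon.

    Assumes the sequence starts in frame and its length is a multiple of 3.
    Because only synonymous codons are swapped, the encoded protein is unchanged.
    """
    if preferred not in LEUCINE_CODONS:
        raise ValueError("preferred codon must encode Leucine")
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    recoded = [preferred if codon in LEUCINE_CODONS else codon for codon in codons]
    return "".join(recoded)


yeast_version = "ATGTTATTGTAG"  # Met, Leu, Leu, stop
print(recode_leucines(yeast_version))
```

For the yeast sequence in the example, the two Leucine codons come back as CTG while the start and stop codons are left alone.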
This has worked out to be much longer and more technical than I intended – and I haven’t even addressed the new ideas brought up in the Science article.
Therefore, I’m going to stop here and continue tomorrow with Part II.