Yesterday I got caught up writing about the way that biologists use codon usage bias to optimize cloned genes for expression. Today, I want to finish this up by discussing two ideas about why codon optimization occurs.
#1: Through some stochastic (random) process, the genes for one or more tRNAs corresponding to certain codons were amplified by gene duplication, resulting in more efficient translation when these codons are used. This implies that efficiency acts as a selective pressure that reinforces the once-random preference.
Evidence for this model lies in the observed relationship between tRNA genes and codon usage. Work done at the Institut Pasteur shows that, “By analyzing 102 genomes we showed that as minimal generation times get shorter, the genomes contain more tRNA genes, but fewer anticodon species.”
#2: Weatheritt and Babu suggest that there are additional codes overlaying the DNA->RNA->Protein code. These codes are the result of proteins binding to DNA, RNA looping, or micro-RNA binding, each of which may impose its own restrictions on sequence.
The original paper, by Stergachis et al. of the Stamatoyannopoulos laboratory at the University of Washington, used DNase footprinting to determine the areas of DNA that were bound by proteins.
Imagine DNA as a long clothesline. In some locations socks hang from the clothesline, covering up small areas of the string. DNase is an enzyme that can chew up open DNA, but it is not capable of displacing proteins to chew up the sequences they bind. That is, wherever the clothesline is empty, it is gobbled up; wherever a sock hangs, the region is protected and we can go back to see what it is.
Stergachis et al. decided to look at these protected sequences to determine whether any of them correlated with the preferred codons for several amino acids that have a number of possible codon alternatives.
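To make that comparison concrete, here is a minimal sketch of the kind of tally involved. It is not the authors' pipeline; the function name, the toy sequence, and the footprint coordinates are all made up for illustration. It counts how often each leucine codon is used inside versus outside a set of protected intervals in a single in-frame coding sequence.

```python
from collections import Counter

# The six codons that encode leucine in the standard genetic code.
LEUCINE_CODONS = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}


def leucine_codon_fractions(cds, footprints):
    """Return (inside, outside) dicts mapping leucine codon -> fraction of use.

    cds        -- in-frame coding sequence (hypothetical input)
    footprints -- list of half-open (start, end) nucleotide intervals that
                  stand in for protein-protected regions (hypothetical input)
    """
    inside, outside = Counter(), Counter()
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon not in LEUCINE_CODONS:
            continue
        # Count a codon as protected only if all three bases sit in one footprint.
        protected = any(start <= i and i + 3 <= end for start, end in footprints)
        (inside if protected else outside)[codon] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()} if total else {}

    return normalize(inside), normalize(outside)


# Toy example: codons are ATG CTG CTG TTA CTC CTG TAA, with bases 3-8 protected.
toy_cds = "ATGCTGCTGTTACTCCTGTAA"
toy_footprints = [(3, 9)]
inside, outside = leucine_codon_fractions(toy_cds, toy_footprints)
print("inside footprints: ", inside)
print("outside footprints:", outside)
```

In this toy case every protected leucine codon is CTG, while the unprotected ones are split evenly among CTG, TTA, and CTC. A real analysis would pool counts over many genes and test whether the difference is larger than chance.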
What they found actually does account for some of the observed codon bias. In the figure below, taken from their paper, note how the preference for the codon CTG differs depending on whether the codon appears in an area where proteins bind to the DNA. The paper does not specifically identify the proteins that bind the DNA at any given location, but it is clear that this sequence is vital to two distinct functions: encoding an amino acid and binding a protein.
Because CTG remains a preferred codon even in the absence of protein binding, it is reasonable to think that both models may be correct. That is, protein binding may have tipped the balance in favor of certain codons, setting up an environment in which multiple tRNA genes for those specific codons, rather than for others coding for the same amino acid, are favored.
Lastly, I was alerted to the following video blog addressing a different interpretation of the same data: