Showing posts with label nucleic acid. Show all posts

Wednesday, November 13, 2013

Covering All Our Bases

Biology concepts – nucleoside, tRNA, RNA editing, nonstandard bases, DNA oxidation


Specialized pieces are needed to best build special Lincoln
Log structures, like this castle. This is much like how
specialized nucleosides are needed to carry out special
functions of RNAs. Really – a log castle? Wouldn’t the
Black Knight just burn it?
Last week, we used Lincoln Logs as a model for the different nucleic acids. The small logs mean little until you put them together in an order that makes something – a cabin, for example. This week we can take the analogy a little further.

Some editions of Lincoln Logs have specialized pieces for building special buildings. These buildings have different purposes, like a sawmill or a bank, and the specialized pieces help them carry out their function of being that building.

Lo and behold, there are special building blocks for building specialized nucleic acid structures; usually these are RNAs for which the usual building blocks just won’t do. These are the exceptions to the nucleotide rules of A, C, G, and T for DNA and A, C, G, and U for RNA.

There are a few different nucleotides located in DNA molecules, but to date all these have been found to be damaged bases. Oxidized guanosine bases have been the most commonly identified mutations, because guanine is more susceptible to oxidation than the other bases. However, a recent study has identified a 6-oxothymidine in the placental DNA of a smoker.  

More than 20 oxidized DNA bases have been found at one time or another. Their importance lies in their inability to direct correct base pairing in a replicating DNA or a transcribed RNA. In particular, 8-oxoguanosine in a DNA molecule often base pairs with A instead of C, while an oxidized 8-oxoguanosine nucleotide (damaged before it is incorporated into a DNA) will often be put in where a T should rightfully have been placed.
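The mispairing is easier to see if you follow a damaged base through two rounds of copying. Here is a minimal Python sketch, assuming a toy six-base sequence and using the letter "O" (my own notation, not a standard symbol) to mark an 8-oxoguanine in the template. It only models the first case above, a damaged base already sitting in the DNA, and it ignores strand direction and repair.

```python
# Toy model: "O" marks an 8-oxoguanine on the template strand (made-up notation).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def copy_strand(template):
    """Build the partner strand, base by base (strand direction is ignored)."""
    partner = []
    for base in template:
        if base == "O":
            partner.append("A")   # the damaged G mispairs with A instead of C
        else:
            partner.append(PAIR[base])
    return "".join(partner)

original = "ATGGCA"
damaged = "ATGOCA"                      # the second G has been oxidized

first_copy = copy_strand(damaged)       # partner of the damaged strand
second_copy = copy_strand(first_copy)   # partner of that partner

print(original)      # ATGGCA
print(second_copy)   # ATGTCA
```

After two rounds, the position that started as G reads as T, so one oxidized base has become a permanent G to T change in that lineage of DNA.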

Both of these problems would lead to mistakes in replication or transcription. Some of these mistakes could be in places that matter. If they change a codon, they might cause the wrong amino acid to be incorporated and the resulting protein might be nonfunctional. Or they could create or destroy a stop codon or a splice site. These would definitely alter the resulting protein. Mistakes like this spell disease or cancer.
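To make those outcomes concrete, here is a small Python sketch that sorts a single codon change into one of the cases just described. The genetic code table is the standard one; the function name `classify_change` and the example codons are made up for illustration, and splice-site changes are not modeled.

```python
from itertools import product

# Standard genetic code, DNA alphabet, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_change(before, after):
    """Describe what swapping one codon for another does to the protein."""
    old, new = CODON_TABLE[before], CODON_TABLE[after]
    if old == new:
        return "silent"      # same amino acid, no change in the protein
    if new == "*":
        return "nonsense"    # a stop codon was created, protein is cut short
    if old == "*":
        return "stop lost"   # a stop codon was destroyed, protein runs on
    return "missense"        # a different amino acid is inserted

print(classify_change("GGA", "GTA"))   # missense (Gly to Val)
print(classify_change("GAA", "TAA"))   # nonsense (Glu to stop)
print(classify_change("GGG", "GGT"))   # silent (Gly to Gly)
```

All three examples are G to T changes, the kind an unrepaired 8-oxoguanosine can produce, yet their consequences range from nothing at all to a truncated protein.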

The top left image shows how 8-oxoguanine is produced by
oxidative damage or radiation. The bottom left shows its
effects on DNA. There can be a mismatched base pairing
between G and A instead of G and C when the G is damaged.
One possible result is shown on the right. Huntington’s
disease may involve the mismatching of unrepaired
8-oxoguanosines with adenosines. As a result, areas of the
brain are lost and the fluid-filled ventricles are enlarged.

Oxoguanosine has been the most studied of the oxidized bases, and several diseases have been linked to this mutation. Many cancers have shown this mutation – leukemias, breast cancer, colorectal cancer, etc. But in addition, things like Parkinson’s disease, Huntington’s disease, Lou Gehrig’s disease (ALS), and cystic fibrosis have been correlated with 8-oxoguanosine.

Don’t make the mistake of assuming that an 8-oxoguanosine is the cause of any or all of these diseases; most have many potential causes. The point is that this mutation may contribute to these diseases in some cases, so the goal is to find out how to better prevent or repair it. However, your body is pretty good at doing this itself – if everything is behaving normally.

There are specific repair pathways dedicated to removing and replacing oxidized bases (base excision repair or BER) or for nucleotides that contain oxidized bases (nucleotide excision repair or NER) in DNA. In RNA, the major process to deal with 8-oxoguanosine is to destroy the damaged RNA. There are actually several overlapping and redundant repair pathways for 8-oxoguanosine, suggesting that this mutation is particularly damaging and must be dealt with for proper cell function.

It is when the body’s sensing and repair mechanisms don’t work that the problems begin. Therefore, science needs to find better ways to tell when the natural processes aren’t working and develop artificial ways to reverse the damage. A 2013 review points the way toward detecting mutated guanines in bodily fluids and tissues.

Specifically, this study looked at methods of detecting 8-oxoguanosine levels in plasma, urine, and cerebrospinal fluid and what those changes might mean. The levels found represent a balance between the production and repair of the mutations, so an increase means that more mistakes are being made, or fewer are being repaired. Either way, it means that something must be done.


This is a cartoon showing RNA processing. IT IS NOT TO BE
CONFUSED WITH RNA EDITING!! In processing of eukaryotic
mRNAs, the front end (5’ terminus) is capped so it will last
longer. Then the end is augmented with a bunch of A’s, called
the poly-A tail. Finally, the introns are removed and the
exons (the parts that code for a protein) end up in a
continuous sequence.
But what about nonstandard bases that are actually supposed to be in nucleic acids? The vast majority of these are found in the RNAs and help to point out yet another exception. You think that the RNA transcribed from DNA is the same RNA that functions or is translated to protein? Not always.

RNA editing takes place all the time, where RNA bases are changed after the RNA is transcribed from DNA. In the majority of cases, the RNA editing changes one standard nucleoside to another standard nucleoside, or adds or removes nucleotides.

Insertion/deletion edits for uracils can increase or decrease the length of the transcript. The mRNA is paired with a guide RNA (gRNA) and base-pairing takes place. For insertion, when there is a mismatch between the mRNA and the gRNA, the editosome inserts a U, so the mRNA transcript gets longer. In deletion editing, if there is an unpaired U in the mRNA, it gets cut out, so the transcript gets shorter.

This was first discovered in a parasite called Trypanosoma brucei, the causative agent of African Sleeping Sickness. There are so many positions at which these insertions/deletions take place that it has come to be known as pan-editing.

In other cases, the editing takes the form of C being replaced by a U. In some cases this results in a protein sequence different than that coded for by the DNA - on purpose!! If that isn’t an exception, I don’t know what is. Other times, the changing of a C to a U creates a stop codon.

In the human apolipoprotein B transcript, the intestinal version undergoes the C to U editing and creates a stop codon, so the apolipoprotein B is only 48% of the full-length protein (B48). In the liver, no editing takes place, so the protein is much larger (B100).
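A short Python sketch shows how a single C to U edit can shorten a protein. The transcript below is a made-up 15-base toy, not the real apolipoprotein B message, and the function names are my own; the only real ingredients are the standard genetic code and the fact that changing CAA to UAA turns a glutamine codon into a stop.

```python
from itertools import product

# Standard genetic code, RNA alphabet, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def c_to_u_edit(mrna, position):
    """Return the transcript with the C at `position` changed to U."""
    if mrna[position] != "C":
        raise ValueError("only a C can be edited to a U")
    return mrna[:position] + "U" + mrna[position + 1:]

def translate(mrna):
    """Read codons from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

liver_like = "AUGGCUCAAUUUGGA"                # toy transcript, left unedited
intestine_like = c_to_u_edit(liver_like, 6)   # CAA becomes UAA

print(translate(liver_like))       # MAQFG
print(translate(intestine_like))   # MA
```

Same gene, same transcript, two different proteins, and the only difference is one edited base.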


Here are two examples of RNA editing. The top image
shows the insertion/deletion mechanism, where a guide
RNA binds to the mRNA and where there are mismatches
a U is inserted and where there are unmatched U’s, they
are removed. The bottom example is an example where
a base is changed, and this changes the codon, so a
different amino acid is inserted when translated.
There is a lot of C to U editing in plants – I mean, a lot. So much editing goes on that there is now a 2013 database and algorithm to do nothing but predict C to U and U to C edits. Yes, there are U to C edits as well, but only in plant mitochondria and plastids. As far as is known, U to C edits work to destroy stop codons.

Then there is A to I editing. Wait you say, there’s no I in nucleic acids (well, there are actually two “i”s, but you know what I mean). “I” stands for inosine, the first specialized Lincoln Log and our first nonstandard nucleoside. Adenosine (A) is deaminated to form an inosine (I).

There are many functions for inosine editing. Changes from A to I in mRNA alter the protein made since the inosines get read as G’s. Genomically coded A’s end up being read as G’s in the mRNA, and this changes the gene product! We have many more inosine changes than other primates do. Many of these A to I edits in humans are related to brain development and may be a big reason why we are smarter than chimps.
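Here is a minimal Python sketch of what "read as G" means for the protein. The nine-base transcript and the function names are invented for illustration, and the codon table is trimmed down to the four codons the example needs.

```python
# Tiny slice of the standard genetic code, just the codons used below.
CODONS = {"AUG": "M", "CAG": "Q", "CGG": "R", "UAA": "*"}

def a_to_i_edit(mrna, position):
    """Return the transcript with the A at `position` deaminated to I."""
    if mrna[position] != "A":
        raise ValueError("only an A can be edited to an I")
    return mrna[:position] + "I" + mrna[position + 1:]

def translate(mrna):
    """Translate, treating every inosine as if it were a guanosine."""
    readable = mrna.replace("I", "G")
    protein = ""
    for i in range(0, len(readable) - 2, 3):
        amino_acid = CODONS[readable[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein

transcript = "AUGCAGUAA"               # toy message: Met, Gln, stop
edited = a_to_i_edit(transcript, 4)    # the A in CAG becomes I

print(edited)                  # AUGCIGUAA
print(translate(transcript))   # MQ
print(translate(edited))       # MR
```

The DNA still says glutamine (Q), but the edited message is read as arginine (R).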

There is also A to I editing in regulatory RNAs called miRNAs (micro RNA). The miRNAs suppress (prevent) translation of some transcripts, but editing of the pre-miRNA makes it bind less well to protein complexes that process the pre- to mature miRNA. More editing means less binding of miRNAs, which leads to decreased regulation, more transcript translation, and increased protein. This may be one way A to I editing increases human brain power.


Micro RNA is important for controlling the amount of a
transcript that will be translated to protein. The miRNA
can be edited, which will change the amount that is
processed by the protein complex, and therefore changes
the amount that is incorporated into the complex
that will degrade mRNAs.
The search is on to discover what regulates which A’s get turned to I’s in several types of RNAs; the full set is called the inosome (like genome). The inosome is yet another code we haven’t figured out yet. But inosine doesn’t have to be in a nucleic acid to have an effect. Sometimes it functions just by itself.

Inosine and adenosine accumulate extracellularly during hypoxia/ischaemia (lack of oxygen or blood flow) in the brain and may act as neuroprotectants. A new study extends this protective action to the spinal cord in rats in a hypoxic environment. To characterize hypoxia-evoked A and I accumulation, they examined the effect of hypoxia on the extracellular levels of adenosine and inosine in isolated spinal cords from rats. "Isolated" means the rats and their spinal cords were not necessarily in the same room at the time - so it could be a while before this helps humans.

But perhaps the most common use for I is to alter tRNA binding to amino acids and to the target codons. A to I editing can occur in the anticodon, and change which amino acid is placed in the growing peptide. This is especially true in many organisms for the amino acid isoleucine. Many tRNAs will insert an isoleucine into the protein only when the anticodon of the tRNA has been edited to contain an I in the first position (equivalent to the wobble position of the mRNA codon).


This menacing creature is a worm that lives at the bottom
of the ocean in the Sea of Cortez. It thrives in the methane
ice on the ocean floor, making it a psychrophile. It can’t
even survive or reproduce if kept above freezing.
What is more, there are other nonstandard nucleosides that serve similar functions, usually with isoleucine or methionine amino acids. Agmatidine is present in many archaeal anticodons and codes for isoleucine. Agmatidine is also present at other points in the tRNA for isoleucine and is important for adding the isoleucine amino acid to the tRNA.

Other nonstandard (modified) nucleosides also work in tRNAs. Lysidine, dihydrouridine, and pseudouridine are some of the more common specialized Lincoln Logs – or maybe we should stick to calling them nonstandard nucleosides. They can be found in the tRNAs of organisms from each of the three domains of life (archaea, bacteria, and eukaryotes). For example, psychrophiles – organisms that grow at very low temperatures – have 70% more dihydrouridines because they help the tRNAs to flex as they need to, even at subfreezing temperatures.

Found mostly in tRNAs, but not exclusively in tRNAs, there are over 100 non-standard nucleosides. Many times they function to increase tRNA binding to transcripts via the anticodon-codon, or increase the binding of the amino acid to the tRNA. They ultimately work to increase translation efficiency. They are weird and are exceptions, but we can’t live without them.

Next week we can spend some time talking about exceptions in the realm of lipids, the last of our four biomolecules.


Paz-Yaacov N, Levanon EY, Nevo E, Kinar Y, Harmelin A, Jacob-Hirsch J, Amariglio N, Eisenberg E, & Rechavi G (2010). Adenosine-to-inosine RNA editing shapes transcriptome diversity in primates. Proceedings of the National Academy of Sciences of the United States of America, 107 (27), 12174-9 PMID: 20566853

Takahashi T, Otsuguro K, Ohta T, & Ito S (2010). Adenosine and inosine release during hypoxia in the isolated spinal cord of neonatal rats. British journal of pharmacology, 161 (8), 1806-16 PMID: 20735412

Lenz H, & Knoop V (2013). PREPACT 2.0: Predicting C-to-U and U-to-C RNA Editing in Organelle Genome Sequences with Multiple References and Curated RNA Editing Annotation. Bioinformatics and biology insights, 7, 1-19 PMID: 23362369

Poulsen HE, Nadal LL, Broedbaek K, Nielsen PE, & Weimann A (2013). Detection and interpretation of 8-oxodG and 8-oxoGua in urine, plasma and cerebrospinal fluid. Biochimica et biophysica acta PMID: 23791936

Wang P, Fisher D, Rao A, & Giese RW (2012). Nontargeted nucleotide analysis based on benzoylhistamine labeling-MALDI-TOF/TOF-MS: discovery of putative 6-oxo-thymine in DNA. Analytical chemistry, 84 (8), 3811-9 PMID: 22409256



For more information or classroom activities, see:

RNA editing –




Wednesday, October 9, 2013

The Language Of Our DNA

Biology concepts – nitrogenous base, nucleoside, nucleotide, DNA, RNA, second messengers, G protein coupled receptors, cAMP, cGMP, cyclic dinucleotides


Grammar isn’t easy. Small changes can lead to large
differences in meaning. It is like this with the terminology
in molecular biology as well. TK is a tyrosine kinase, while
TK1 is a thymidine kinase. Thymine is a nitrogenous base
of DNA while thiamine is a vitamin. Not knowing the
difference can keep you from that PhD you’ve been wanting.
It may be that English grammar is the only subject that can approach the number of exceptions one finds in biology. When do you use "who" instead of "whom;" “its” has only got an apostrophe when it isn’t possessive; I before E except after C; plural nouns add “s” but you take away the “s” to make a plural verb; their vs. there vs. they’re. It’s exasperating – English grammar should be taught only to those over 25 years of age, when one is mature enough to handle the stress.

Today we are going to talk about the building blocks of DNA and RNA – they can be as confusing as grammar. Terms and structures will look and sound similar, but their functions are very different. We’ll try to minimize the confusing details and maximize the amazing differences.

The basic building block of a nucleic acid is the nucleotide. This is a complex molecule made up of one or more phosphate groups, a ribose or deoxyribose sugar, and one of five nitrogenous bases (A, C, G, T, or U – those are the 5 – for now). Already it's a little confusing, but we can add more complexity; if you have just the base and sugar, it is called a nucleoside, not a nucleotide. Let’s use one base as an example.

Adenine (A) is the name of one nitrogenous base. If it is bound to a ribose, it is called adenosine (A); if it is bound to deoxyribose, it is called deoxyadenosine (dA). If you add a phosphate, you get the nucleotide, but the name depends on how many phosphates; one phosphate = adenosine monophosphate (AMP) or deoxyadenosine monophosphate (dAMP), 2 phosphates = adenosine or deoxyadenosine diphosphate (ADP or dADP), 3 phosphates = the triphosphate (ATP or dATP).

The other nitrogenous bases use the same system – mostly. Cytosine (C) and guanine (G) form cytidine or guanosine nucleosides or nucleotides. The exceptions are thymine (T) and uracil (U). T is formed from dUMP by adding a methyl (-CH3) group, but not from UMP. Therefore, you don’t really find thymidine, only deoxythymidine. Since they know it only comes in one form, scientists go ahead and call it thymidine - thanks a lot.
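If the naming scheme feels like a grammar rule, it can be written as one. This Python sketch builds the name and abbreviation from the three choices described above: which base, which sugar, and how many phosphates. The function name is my own, and the thymine exception is deliberately left out since it doesn’t follow the regular pattern.

```python
# Nucleoside names for the bases that follow the regular scheme.
NUCLEOSIDE = {"A": "adenosine", "C": "cytidine", "G": "guanosine", "U": "uridine"}
PHOSPHATE_WORD = {1: "monophosphate", 2: "diphosphate", 3: "triphosphate"}
PHOSPHATE_LETTER = {1: "M", 2: "D", 3: "T"}

def name_it(base, deoxy=False, phosphates=0):
    """Return (full name, abbreviation) for a nucleoside or nucleotide."""
    full = ("deoxy" if deoxy else "") + NUCLEOSIDE[base]
    short = ("d" if deoxy else "") + base
    if phosphates:   # zero phosphates means it is still just a nucleoside
        full += " " + PHOSPHATE_WORD[phosphates]
        short += PHOSPHATE_LETTER[phosphates] + "P"
    return full, short

print(name_it("A"))                              # ('adenosine', 'A')
print(name_it("A", deoxy=True, phosphates=1))    # ('deoxyadenosine monophosphate', 'dAMP')
print(name_it("G", deoxy=True, phosphates=3))    # ('deoxyguanosine triphosphate', 'dGTP')
```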


As alluded to in the text, base plus sugar equals nucleoside. Add
a phosphate, or two, or three and you have nucleotides. The sugar
can be ribose or deoxyribose, the difference being the OH at the
second carbon position. On the right are the possible bases; the purines
have two rings, the pyrimidines have one. Notice how adding a
methyl group to uracil makes thymine or how taking away an
amine group from cytosine makes uracil. These will be important later.
In addition to the modification of U to make T, there is the removal of the 2’-OH to make deoxyribose out of ribose. This removal is made after the nucleoside is formed. Together, the modification of U to dU and the modification of dU to dT are strong evidence that RNA predates DNA and support the RNA world hypothesis that we talked about two weeks ago.

We said above that nucleotides are the building blocks of DNA and RNA. Specifically, it's the triphosphate nucleotides (NTP or dNTP, where N means any of the bases) that are used for incorporation into the growing chains of RNA and DNA. The energy for the bond comes from releasing two of the phosphates, so the nucleotides in DNA and RNA are bonded through one phosphate linkage.

The building of nucleic acids comes from pools of NTPs and dNTPs in the cell. Evidence shows that the pool of dNTPs is about 1/10 that of NTPs. This means that there are only enough dNTPs in the cell to support DNA replication for about 30 seconds. This implies that it's the rate of turning NDPs into dNDPs (then to dNTPs) that controls things like cell cycle and cell division; no replication of DNA, no division.


Ribonucleotide reductase turns NDPs into dNDPs. It is well
controlled. The catalytic site is where the reaction takes
place, so the NDP goes there. The activity site requires an
ATP to activate or a dATP to inactivate the enzyme (this
keeps the dNTP levels in check). The specificity site says
which NDP can be acted on. When dATP or ATP is bound
at the specificity site, the enzyme accepts UDP and CDP
into the catalytic site; if dGTP is bound, ADP can be acted
on; if dTTP is bound in the specificity site, GDP enters the
catalytic site.
The concentration of dT is especially important, since it only comes from modifying dU. If you add some extra thymidine to cells, they will think that they have enough dNTPs. This turns off the enzyme (ribonucleotide reductase) that converts NDPs to dNDPs. As a result, you won’t have enough dNTPs to make DNA and the cell will just stop.
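The control rules in the picture caption amount to a lookup table, so here they are as a small Python sketch. It encodes only what the caption says (which molecule in the activity site switches the enzyme on or off, and which molecule in the specificity site admits which NDP). The function name `rnr_product` is hypothetical, and real enzyme behavior is a matter of rates and concentrations, not simple yes or no answers.

```python
# Activity site: ATP switches the enzyme on, dATP switches it off.
ENZYME_ON = {"ATP": True, "dATP": False}

# Specificity site: which NDPs are allowed into the catalytic site.
ALLOWED_SUBSTRATES = {
    "ATP": {"UDP", "CDP"},
    "dATP": {"UDP", "CDP"},
    "dGTP": {"ADP"},
    "dTTP": {"GDP"},
}

def rnr_product(substrate, activity_site, specificity_site):
    """Return the dNDP that is made, or None if no reaction takes place."""
    if not ENZYME_ON[activity_site]:
        return None                  # dATP in the activity site shuts things down
    if substrate not in ALLOWED_SUBSTRATES[specificity_site]:
        return None                  # wrong NDP for the current specificity setting
    return "d" + substrate           # the 2' OH is removed: NDP becomes dNDP

print(rnr_product("CDP", "ATP", "ATP"))    # dCDP
print(rnr_product("GDP", "ATP", "dTTP"))   # dGDP
print(rnr_product("GDP", "ATP", "dGTP"))   # None
print(rnr_product("CDP", "dATP", "ATP"))   # None
```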

Uses for nucleotides A, G, and C besides inclusion in DNA or RNA are more apparent (nature hates unitaskers). ATP should be near and dear to all our hearts - all our organs for that matter. ATP is the energy currency of the cell. The energy released when two phosphates are lost to incorporate a nucleotide into a growing nucleic acid is the same energy released when ATP is hydrolyzed to ADP during an enzyme reaction or relaxation of a muscle.

An adenosine variant, called cyclic AMP (cAMP) is just as crucial as any other biomolecule you can name. An uncountable number (O.K., I’m sure someone knows) of cellular reactions are regulated by the levels of cAMP in the cell.

Cyclic GMP is a signaling compound similar to cAMP. Each controls a varied number of regulatory pathways and second messengers to convey information in the cell. There are also cyclic dinucleotides. Bacteria use c-di-AMP and c-di-GMP as second messengers. This has been known for some time, but a new study shows that these cyclic dinucleotides stimulate specific inflammation in a mammalian host by triggering production of the proinflammatory molecule IL-1beta. This stimulation occurs via a completely new pathway. These are most definitely important molecules outside the nucleic acids.


cAMP and cGMP are single nucleotides in which the phosphate group
binds to the sugar at two points – it circularizes. Just because they
aren’t shown here, don’t think that cUMP or cCMP don’t exist – they
do, and they are second messengers too. In the case of the cyclic
dinucleotides, the phosphate of each nucleotide is joined to two
different sugar molecules. It is still circular, but in a way that
involves both nucleotides. The cGMP and cAMP are used in higher
organisms, the c-di-GMP and c-di-AMP are used by bacteria for
various operations, everything from gene regulation to virulence.
Cyclic di-GMP may be important for secondary signaling, but GTP and GDP also get into the game. G protein coupled receptors start many of the second messenger systems. There are many types of G protein coupled receptors, but that will have to wait for another day.

CTP can act as an enzyme cofactor, especially in the production of one of the phospholipids that is most important in biological membranes (phosphatidylcholine). A similar reaction using CTP as a cofactor is the focus of a new study because the product of the reaction is important in the life cycle of the parasite that causes malaria (P. falciparum). The new study shows that the levels of CTP and CDP will regulate the efficiency of the enzyme using CTP, so manipulating these levels might be a target for anti-malarial drugs.

Lastly, uridine (U) is important outside of nucleic acids as well. When combined with an adenosine and four (yes, 4) phosphates, it is called uridine adenosine tetraphosphate (Up4A). This dinucleotide has recently been identified as an important controlling molecule in vascular endothelium physiology. It causes a contraction in several types of muscle cells in vessel walls, thereby regulating the tension of the walls, called vascular tone. In this way, Up4A helps manage pressure and its dysfunction is important in many vascular diseases.

As we discussed a couple of weeks ago, DNA is double stranded and the bases are paired - A with T and G with C. Chargaff first showed that the levels of dG and dC and of dA and dT were always the same in a cell. Donohue then showed that they could base pair by hydrogen bonds.


Different amounts of G+C vs. A+T in regions of DNA lead
to different staining of the chromosome regions. GC regions
are more dense, so some stains are excluded and they show
up unstained. This difference in GC content has functional
consequences as well. High GC areas are more gene dense,
and have regulatory regions as well. A new study shows that
in chickens, high GC regions are associated with regulatory
regions of genes – the higher the GC content, the more
expression from that gene.
If you know how much dG is in a cell, then you know how much dC is there. But this doesn’t mean that G+C = A+T. The %GC content is different in different species. P. falciparum is a very low GC organism, only about 20% of the nucleotides of DNA are G or C, while some prokaryotes are up to 78% GC. See the picture caption for more on this subject.
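Both ideas, that G always equals C while G+C need not equal A+T, fall straight out of base pairing. A minimal Python sketch, using a made-up ten-base strand and hypothetical function names, counts the bases on both strands of a double helix:

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def both_strands(top):
    """Count every base in a double strand, given the sequence of one strand."""
    bottom = "".join(PAIR[base] for base in top)
    return Counter(top + bottom)

def gc_fraction(counts):
    """Fraction of all bases that are G or C."""
    return (counts["G"] + counts["C"]) / sum(counts.values())

counts = both_strands("AAGAGTAATC")   # toy strand with unequal base counts

print(counts["A"], counts["T"])   # 7 7
print(counts["G"], counts["C"])   # 3 3
print(gc_fraction(counts))        # 0.3
```

The single strand has five A’s and only two T’s, but once its partner is counted the A’s and T’s match, as do the G’s and C’s. The GC fraction, however, is free to be anything; here it is 30%.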

So DNA has dA, dC, dG, and dT, while RNA uses U instead of T. Why? Such a simple question, but not many people bother to ask. There is more than one reason, but they’re all related to long-term protection of genetic information.

The cytosine base can be deaminated (removal of an amine group) to form uracil. In RNA, this mistaken identity would lead to an incorrect translation or perhaps a loss of function of a structural RNA. Fortunately, these are short-term problems because each RNA is short lived. But if U was used in DNA, then how would the repair enzymes know which U’s were correct and which were actually deaminated C’s?

Since dT is used in DNA instead of dU, any dU must be a deaminated C and should be replaced. If it were allowed to remain, then an incorrect U would be copied as an incorrect A (U is like T because it pairs with A) and this would be forever kept in the DNA - a permanent mistake. Not good.
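The logic of "any U in DNA must be a mistake" can be sketched in a few lines of Python. The sequences and function names are invented for illustration, and the `naive_repair` step is a stand-in for the cell’s repair machinery, not a model of how the enzymes actually work.

```python
def deaminate(strand, position):
    """Turn the C at `position` into a U, as spontaneous deamination would."""
    if strand[position] != "C":
        raise ValueError("only a C can be deaminated to a U")
    return strand[:position] + "U" + strand[position + 1:]

def naive_repair(strand):
    """Assume every U is a damaged C and put the C back."""
    return strand.replace("U", "C")

# Real DNA uses T, so the only U present is the damaged base.
dna = "GATCCA"
print(naive_repair(deaminate(dna, 3)))         # GATCCA, the original is restored

# Imaginary DNA that used U instead of T.
u_dna = "GAUCCA"
print(naive_repair(deaminate(u_dna, 3)))       # GACCCA, a legitimate U was "fixed" too
```

With T in the alphabet the repair rule is unambiguous. With U in the alphabet the same rule rewrites bases that were never damaged, so the information is corrupted by the very act of repairing it.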

Second, uracil forms a stable product when damaged by radiation, while radiation damage to T’s can be detected and replaced by repair enzymes. So again, using dT in DNA leads to a more stable, more protected, long-term storage molecule.

A third reason for dT in DNA is related to base pairing. U pairs best with A, but it can base pair with G, T, or C. This increases the chances of mismatched pairs in the DNA double strand - not good for keeping information pristine in the long run. Protection against damage is also illustrated by the fact that dT is basically methylated U.


This is a cartoon representation of a tRNA that is charged
with a phenylalanine amino acid.  The different loops are
associated with efficiency of action with the template,
binding to the ribosome, and binding of the amino acids.
The T loop actually contains a T base (at grey arrow), it’s
an RNA, but it includes a T – that’s the very definition
of an exception.
Methyl groups have a tendency to protect the bases from enzymes that break down DNA (nucleases). We will talk about this more next week. So again, using dT in DNA is more protective than using uracil.

Whew, good thing we use U for RNA and T for DNA, right? Well….. not always. tRNAs are a huge exception, which we will talk about much more in future posts. Thymidine is found in the T arm or T loop of tRNA; here it is important for binding the tRNA to the ribosome during translation. A DNA nucleotide in an RNA??? What gives?

Remember, T only occurs naturally as dT. T ends up in tRNA by virtue of a modification that methylates a U. Once modified, you can’t tell it from any other T – except that now it is bound to a ribose, not deoxyribose. English grammar seems a lot easier by comparison, doesn’t it?



Abdul-Sater AA, Tattoli I, Jin L, Grajkowski A, Levi A, Koller BH, Allen IC, Beaucage SL, Fitzgerald KA, Ting JP, Cambier JC, Girardin SE, Schindler C. (2013). Cyclic-di-GMP and cyclic-di-AMP activate the NLRP3 inflammasome. EMBO Rep.

Nagy GN, Marton L, Krámos B, Oláh J, Révész Á, Vékey K, Delsuc F, Hunyadi-Gulyás É, Medzihradszky KF, Lavigne M, Vial H, Cerdan R, Vértessy BG. (2013). Evolutionary and mechanistic insights into substrate and product accommodation of CTP:phosphocholine cytidylyltransferase from Plasmodium falciparum FEBS J. DOI: 10.1111/febs.12282

Rao YS, Chai XW, Wang ZF, Nie QH, Zhang XQ. (2013). Impact of GC content on gene expression pattern in chicken Genet Sel Evol. DOI: 10.1186/1297-9686-45-9



For more information and classroom activities, see:

Nucleotide/nucleoside –

Cyclic nucleotides/dinucleotides –

Why thymidine is used in DNA –

Wednesday, September 25, 2013

RNA Takes First Place

Biology concepts – nucleic acids, DNA, RNA, central dogma of molecular biology, ribozyme, RNA world hypothesis


The Library of Congress in Washington DC was designed as a 
showplace as well as a repository. The main reading room looks as
much like a museum or a cathedral as it does a library. If I could
figure out how to get away with it, I would live in the LOC.
Did you know that there are more than 155.3 million informational items (books and such) in the Library of Congress? Established in 1800 with 3000 volumes, the library was originally housed in the Capitol Building. Unfortunately, all the books were lost when the British fired Washington in 1814. No worries, the LOC then purchased Thomas Jefferson’s personal library of over 6500 books and set up shop in a new building, although not the 1892 designed library that exists today (left).

In a way, you can think of the molecular workings of the cell like the Library of Congress. You need information storage – these are the books. Each book (chromosome or part of a chromosome) contains the instructions (genes) needed to make products (proteins) the cell may need.

Each time you want to make a certain molecule, you must consult the book (chromosome) that has the correct instruction page (DNA gene). But you may be making many copies of your product in a short period, so one book might not be enough.

You could keep many copies of each book, maybe thousands, but this would take up too much room. The LOC already covers 2.1 million sq. feet (and that’s just one main building). What if you needed 1500 copies of One Good Turn (an interesting book about the history of the screw and screwdriver) because at some time or another, 1500 people wanted to learn how to build a square screwdriver?

To avoid this need for extra space, you make copies of pages (mRNA) from the books (chromosomes) that can be taken out of the library (nucleus) and used for making the products. Each time you want a product, a translator (tRNA and ribosome) must be used. This converts the copied instructions (mRNA) into a usable product (protein).

When one or several translations have been made, the copied instructions start to tear and get worn, and finally break down. Good thing we still have the original copy of the book stored in the nucleus… I mean library. We can go back and make more copies later if we need them. Humans are amateurs, we only have about 25,000 sets of instructions stored in 46 books, nowhere near the 155.3 million of the LOC.


The central dogma of molecular biology says that DNA is replicated to
DNA, so daughter cells get a full set of instructions. DNA is also
transcribed to mRNA, which is a copied message of the instructions to
build one protein. Finally, the mRNA acts as a code that is translated
into an amino acid polymer – a protein. HIV and other retroviruses
laugh at the central dogma, going the opposite direction, RNA to
DNA. Retrotransposons laugh at HIV, as they can do all that and more.
Cells take this library/nucleic acid analogy further. Sure, they have DNA, mRNA, and tRNA so that they can carry out the central dogma of molecular biology --- DNA goes to mRNA goes to protein (via tRNA and rRNA), but they have so much more. Just as there are many kinds of information storage at the LOC--- books, images, recordings, manuscripts, pamphlets, there are different kinds of nucleic acids as well.
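For readers who like to see the photocopying and translating spelled out, here is a minimal Python sketch of the central dogma on a made-up twelve-base gene. It assumes the DNA string is the coding strand (so the mRNA is the same sequence with U in place of T) and skips everything a real cell adds, such as promoters, processing, and tRNAs. The function names are my own.

```python
from itertools import product

# Standard genetic code, RNA alphabet, codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(coding_strand):
    """Copy the page: the mRNA matches the coding strand, with U instead of T."""
    return coding_strand.replace("T", "U")

def translate(mrna):
    """Convert the copy into product: read codons until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCTTTTTAA"          # the page in the book (toy gene)
message = transcribe(gene)     # the photocopy that leaves the library

print(message)                 # AUGGCUUUUUAA
print(translate(message))      # MAF
```

The original `gene` string is never changed, which is the whole point of working from copies.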

Ever hear of small nuclear RNAs, or micro RNAs, or plasmid DNAs for that matter? We have talked about plasmids as extrachromosomal pieces of DNA that can code for genes, especially antibiotic resistance genes in prokaryotes.

But the list of RNAs is far more impressive. There are regulatory RNAs that control gene expression (whether or not a protein is made from a gene), RNAs that control modification of other RNAs or work in DNA replication. There are even RNAs that are parasitic, like some viral genomes (RNA viruses) and retrotransposons.

Of these, retrotransposons may be the most interesting. A transposon is a piece of DNA that can jump around from place to place in the chromosomes of a cell. Barbara McClintock won a Nobel Prize for identifying that transposable elements were responsible for the different colors of corn kernels in maize.


Ancient viral RNA got inserted into plant and animal genomes. The
retrotransposon can be transcribed to mRNA, and then could be
reverse transcribed back into DNA or translated into protein. The
DNA can then insert itself anywhere in the genome. Since several
mRNA transcripts can be made from one transcribed retrotransposon,
and since several pieces of DNA can be reverse transcribed from just
one mRNA, we have the potential for millions of retrotransposons in
the genome – and that’s exactly what we have found. The bottom
cartoon shows HIV. Since reverse transcription makes more mistakes
than DNA replication, many more mutants can be produced. This is
one reason HIV is so hard to treat – it’s always changing.
Retrotransposons use the library analogy to fill the shelves with hundreds of copies of themselves. If plant nuclei were like libraries, up to 80% of their book pages would be retrotransposons!

In and of themselves, retrotransposons represent an exception in nucleic acids. They are mRNA sequences that can turn back into DNA. Transcription is the process of using DNA to produce an mRNA, so going the opposite direction is called reverse transcription. This is also what retroviruses like HIV do.

In the case of retrotransposons, the chromosome-held copies will be transcribed to an mRNA, and some of those copies might be translated into protein. Other copies will be reverse transcribed back to DNA by an enzyme called reverse transcriptase and will insert themselves somewhere in the genome (see picture).
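This copy-and-paste cycle can be sketched in Python. Everything here is a toy: the six-base "retrotransposon", the tiny genome, the fixed insertion point, and the function names are all made up, and the sketch works with the coding-strand sequence only, so transcription and reverse transcription are just T to U and U back to T.

```python
def transcribe(dna):
    """DNA to mRNA (coding-strand view): T becomes U."""
    return dna.replace("T", "U")

def reverse_transcribe(mrna):
    """mRNA back to DNA, the step reverse transcriptase carries out: U becomes T."""
    return mrna.replace("U", "T")

def insert(genome, piece, position):
    """Paste a piece of DNA into the genome at the given position."""
    return genome[:position] + piece + genome[position:]

element = "ATGCAT"                        # toy retrotransposon
genome = "GGGG" + element + "CCCC"        # one copy to start with

message = transcribe(element)             # the original copy stays where it is
new_copy = reverse_transcribe(message)
genome = insert(genome, new_copy, 12)     # the new copy lands somewhere else

print(genome)                  # GGGGATGCATCCATGCATCC
print(genome.count(element))   # 2
```

Because the original is copied, not cut out, every trip around the loop adds another copy and the genome gets a little longer.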

In this way, retrotransposons can make more copies of themselves and end up all over the chromosomes of the organism. Mutation occurs at a higher rate in reverse transcription than in DNA replication because reverse transcriptase makes more mistakes than replication enzymes. This is one reason HIV is so hard to treat; it mutates so often that drug design can’t keep up with the changes in the viral proteins.

So how can the same mRNA sometimes be translated, and other times end up in a new place on the DNA? A 2013 study has investigated how one type of retrotransposon manages these different outcomes. The BARE retrotransposon of plants has just one coding sequence for a protein, but the study results show that it actually makes three distinct mRNAs from this one piece of DNA.


Sam Kean is the author of The Violinist’s Thumb, a very readable
book on molecular biology. He goes through how fruit flies were
recruited to disprove DNA heredity and ended up as the strongest
evidence for it; how DNA is linked very strongly to linguistics and
math; and how Stalin tried to breed a race of half human - half
chimps. This is in addition to showing how most DNA on Earth is
descended from viruses.
One transcript (mRNA) is modified so it can be translated but cannot be reverse transcribed. The second transcript is packaged in small bundles to be reverse transcribed later back to DNA. The third transcript type is smaller and actually houses the bundles of mRNAs to be reverse transcribed. So this retrotransposon balances itself between making protein and inserting itself into new places in the genome.

If plants have so much nucleic acid in the form of retrotransposons, could these be the remnants of ancient viral infections? You betcha, and it doesn’t stop with plants. In his fascinating book, The Violinist’s Thumb, Sam Kean lays out a compelling argument that most human DNA is actually just viral nucleic acid remnants, much of it being mutated versions of old RNAs.

Old RNA is probably the best way to describe all nucleic acids, because the generally accepted view of the evolution of life on Earth is that everything started with RNA. This is called the RNA world hypothesis, and it proposes that the job that DNA does now was first done by RNA.

The hypothesis also says that what protein enzymes now do - cutting things up, putting things together, and modifying existing structures - was originally done by RNAs as well, called catalytic RNAs.

We have evidence for this hypothesis; specifically, we know of many RNAs that have enzymatic activity. Called ribozymes (a cross between ribo for RNA, and zyme for enzyme), these RNAs carry out enzymatic roles in our cells and in the cells of every eukaryote and prokaryote ever analyzed for their presence.


Ribozymes, a form of catalytic RNA, are present in most cells. They come
in two flavors based on what someone thought their secondary structure
looked like – the hammerhead or the hairpin. Scientists aren’t the most
imaginative when it comes to naming things. They both sit down on an
RNA where they recognize their specific sequence, and make a cut in the
strand. In the cartoon, N stands for any nucleotide, and X stands for
unknown. On the right side is a diagram showing how one ribozyme can
act again and again to cleave RNAs.
So now we are aware of two exceptions when it comes to the central dogma of molecular biology and RNA – 1) RNA can be converted back into DNA and 2) RNA can act like a protein enzyme.

One essential ribozyme function is the synthesis of protein. The ribosome (a ribonucleoprotein, because it is made up of several RNAs and many proteins) translates the codons of mRNA into a sequence of amino acids. It uses its own RNA to link the individual amino acids together via peptide bonds. I’d say that’s essential.

Other ribozymes work on themselves. Many mRNAs, when first copied from DNA, contain sequences that are not used in the final product. These are called intervening sequences (or introns), and they are cut out (spliced) as part of transcript processing. Group I and II introns are self-splicing. They fold over on themselves and cause their own excision from the RNA of which they are part!
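The bookkeeping side of splicing can be sketched in a few lines. This is only a toy: the sequence is invented, the intron boundaries are handed to the function as positions (a hypothetical `splice` helper), and nothing here models the folding that lets a real self-splicing intron find and cut its own ends.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each (start, end) intron, end exclusive, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep whatever follows the last intron
    return "".join(exons)


# Made-up transcript: exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUUUAG" + "UUCUAA"
mature = splice(pre_mrna, [(6, 16)])

print(mature)  # AUGGCCUUCUAA
```

What makes Group I and II introns special is that no outside helper is needed to supply those positions; the intron's own folded shape does the job.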

Group I introns can be found in the mRNAs, rRNAs, and tRNAs of most prokaryotes and lower eukaryotes, but the only places we have found them so far in higher eukaryotes are in plants and in the mitochondrial and chloroplast genomes. Yet more evidence for the plastid endosymbiosis hypothesis.

If the RNA world hypothesis is to be strengthened, we must find a catalytic RNA that can replicate long strings of RNA “genes.” If RNA was both the storage material and the enzymatic material, there must have been an RNA-dependent RNA polymerase that was itself a piece of RNA. Such an RNA replicase has not been found in nature, probably because life moved on to using DNA as the long-term repository of genetic information. But we should be able to make an RNA replicase as a proof of concept.


The RNA world hypothesis is an idea of how early life on Earth transmitted
information and carried out functions. RNA did everything: it stored info,
replicated itself, and carried out enzymatic activity. A – E represent a
possible sequence, although no times can be assigned yet. According to this
theory – the last thing that developed was enzymatic proteins – but new
evidence suggests that proteins were important for the development of
tRNAs so they must have been around earlier. Step B is an area of interest,
as scientists are trying to make an RNA that could replicate any RNA, even itself.
A few ribozymes can polymerize a few nucleotides into short RNAs. The problem is that we need to show that there is an RNA that could replicate long strings of RNA that could then go on to have biological function. Until 2011, the best we’d produced was a ribozyme (called R18) that could polymerize just 14 ribonucleotides.  

Then a study was published showing that a modified version of R18 could synthesize much longer strings and could replicate many different RNA templates. In this publication, the authors synthesized ribonucleic acids of up to 95 bases, roughly half the length of the R18 replicase itself. Another study has shown that some catalytic RNAs can self-replicate at an exponential rate, making thousands of copies of themselves while still having catalytic function.
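To get a feel for what “exponential” means here, the sketch below assumes the simplest possible case: every copy makes one more copy per round, so the population doubles each time. That clean doubling, and the helper names, are assumptions made for illustration, not numbers or methods taken from the study.

```python
def copies_after(doublings: int, start: int = 1) -> int:
    """Number of copies after a given number of doubling rounds."""
    return start * 2 ** doublings


def doublings_needed(target: int, start: int = 1) -> int:
    """Smallest number of doubling rounds needed to reach at least `target` copies."""
    rounds = 0
    copies = start
    while copies < target:
        copies *= 2
        rounds += 1
    return rounds


print(copies_after(10))            # 1024
print(doublings_needed(1000))      # 10
print(doublings_needed(1_000_000)) # 20
```

Under that assumption, a single self-replicating RNA passes a thousand copies in only ten rounds and a million in twenty, which is why exponential replication matters so much for the hypothesis.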

It seems that the RNA world hypothesis is getting stronger, but there remain some hurdles.

A July 2013 study shows that primitive protein enzymes (called urzymes, where ur = primitive) activate tRNAs much faster than do ribozymes. These primitive proteins date to before the last common ancestor, so they have been around nearly as long as life itself. tRNA urzymes suggest a tRNA-enzyme co-evolution, providing evidence that catalytic proteins and the conventional central dogma were important in early life – a result that does not support the RNA world hypothesis. I’m glad – the hunt goes on.

In the coming weeks, let’s take a look at nucleic acid structures and their building blocks. Think DNA is double stranded? – not always. Think A, C, G, T, and U are the only nucleotides life uses? – not even close.



Chang W, Jääskeläinen M, Li SP, & Schulman AH (2013). BARE Retrotransposons Are Translated and Replicated via Distinct RNA Pools. PLoS ONE, 8 (8) PMID: 23940808

Li L, Francklyn CS, & Carter CW (2013). Aminoacylating Urzymes Challenge the RNA World Hypothesis. The Journal of biological chemistry PMID: 23867455

Ferretti AC, & Joyce GF (2013). Kinetic properties of an RNA enzyme that undergoes self-sustained exponential amplification. Biochemistry, 52 (7), 1227-35 PMID: 23384307

