The landscape of evolution

Picture an old record shop.  How do you tell a classic from a dud?  With no way of telling the value of each record, you could just as easily end up with B*Witched as with the Beatles.

Now imagine all of the records in that shop laid out in front of you in the form of a landscape.  Each record forms part of the floor, with different releases of the same album clustered tightly around each other, different albums from the same artist a little further away, artists of the same genre further away still, and so on.  Starting from A Hard Day’s Night, you move through Rubber Soul, Help! and Revolver, before finally ending up at Magical Mystery Tour.  But the landscape formed by these records is not flat; some records are higher than others, making an undulating series of peaks, troughs and plains. And the higher you go, the higher the value of whatever record you are standing on.  Looking back behind you, you see a signed copy of Sgt. Pepper’s Lonely Hearts Club Band at the peak of the Beatles’ hill.

Now imagine we could do the same for genetic sequences.  Starting with a simple case, we will consider the four variants (A, C, G and T) at one position in a sequence.  Given that the rest of the sequence is identical, and the sequences are therefore very similar, this is equivalent to considering just one artist with four different albums.  On the x axis we plot the variant, and on the y axis we show fitness (how well the variant performs):

Much like an artist’s albums, some genetic variants are better than others

Imagine that we are placed on the left end of the line – this is the part of space occupied by A.  One step to the right and we tread on C, another step and we’re on G, and a final step brings us to T on the right.  And as we move from one base to another, we go up and down as the fitness of each sequence increases and decreases.  In this example, G has the highest fitness (it’s the Sgt. Pepper of bases), C and T have intermediate fitness, and A has the lowest fitness.
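As a rough sketch, this one-base landscape can be written down in a few lines of Python.  The fitness numbers are invented purely for illustration; only their ranking (G highest, C and T intermediate, A lowest) comes from the example above.

```python
# Hypothetical fitness values: only the ranking G > C, T > A is taken from the text.
fitness = {"A": 0.2, "C": 0.5, "G": 0.9, "T": 0.6}

def steps_along(order, fitness):
    """Return (from, to, change in fitness) for each step along the x axis."""
    return [(a, b, round(fitness[b] - fitness[a], 2))
            for a, b in zip(order, order[1:])]

steps = steps_along("ACGT", fitness)
for start, end, change in steps:
    direction = "up" if change > 0 else "down"
    print(f"{start} -> {end}: {direction} by {abs(change)}")

# The peak of this tiny landscape (the Sgt. Pepper of bases).
fittest = max(fitness, key=fitness.get)
print("Fittest base:", fittest)
```

Walking from left to right, the first two steps go uphill and the last one goes downhill, which is the shape of the line described above.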

However, this example only encompasses the fitness of the variants at one base.  If we consider two bases, the landscape changes from a simple 2D line to a more complex 3D surface.  And if we consider more bases (or all the records in the shop), we step into a world with dozens or hundreds of dimensions: the landscape pictured (taken from Hayashi et al, 2006) summarises this high dimensionality into three large peaks, with smaller peaks and troughs at the top of each one.

Fitness landscapes show the potential evolutionary paths taken by a sequence
(Credit: Hayashi et al, 2006)
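To get a feel for how quickly the landscape grows as bases are added, we can count the sequences it contains and the single-step moves available from each one.  This is only a minimal sketch: the function names are my own, and it assumes that one step means changing exactly one base.

```python
from itertools import product

BASES = "ACGT"

def count_sequences(length):
    """Number of distinct sequences of this length: four choices per base."""
    return len(BASES) ** length

def neighbours(seq):
    """All sequences that differ from seq at exactly one base."""
    return [seq[:i] + base + seq[i + 1:]
            for i in range(len(seq))
            for base in BASES if base != seq[i]]

# Check the formula against brute-force enumeration for short sequences.
for length in (1, 2, 3):
    enumerated = ["".join(p) for p in product(BASES, repeat=length)]
    assert len(enumerated) == count_sequences(length)

for length in (1, 2, 100):
    start = "A" * length
    print(length, "bases:", count_sequences(length), "sequences,",
          len(neighbours(start)), "single-step moves from each")
```

Each extra base multiplies the number of sequences by four and adds three new directions in which to step.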

Starting at one position we can journey across the landscape, treading the paths along which the sequence can evolve, with every step representing one change in the sequence.  On this journey, we have to remember only one rule: we can never go downhill.  If we go up, it means the sequence is increasing in fitness, and so will survive; however, if we go down it means the sequence is becoming less fit, and will be selected against and removed from the population.

However, this rule soon brings up a problem.  Imagine we walk up a small hill: from here, every direction leads downhill.  If we can never go down, where do we go?  Are we stuck forever on this small hill, at a relatively low fitness, unable to continue evolving?
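The problem can be reproduced with a toy walk over two-base sequences.  Everything here is an assumption made for illustration: the fitness values are invented, and the walker always takes the steepest uphill step, which is just one simple way of applying the "never go downhill" rule.

```python
BASES = "ACGT"

# Hypothetical landscape: a small hill at AC and a much higher peak at GG.
# Any sequence not listed has a low background fitness of 0.1.
FITNESS = {"AA": 0.2, "AC": 0.4, "AG": 0.3, "GG": 0.9}

def fitness(seq):
    return FITNESS.get(seq, 0.1)

def neighbours(seq):
    """All sequences that differ from seq at exactly one base."""
    return [seq[:i] + base + seq[i + 1:]
            for i in range(len(seq))
            for base in BASES if base != seq[i]]

def uphill_walk(start):
    """Keep stepping to the fittest neighbour until no neighbour is fitter."""
    path = [start]
    while True:
        current = path[-1]
        best = max(neighbours(current), key=fitness)
        if fitness(best) <= fitness(current):
            return path  # every direction is level or downhill: we are stuck
        path.append(best)

path = uphill_walk("AA")
print(path)               # ['AA', 'AC']
print(fitness(path[-1]))  # 0.4, well short of the 0.9 available at GG
```

The walker climbs the nearest hill and stops there, even though a different first step (AA to AG) would have led on to the higher peak at GG.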

Luckily, there is a solution: neutral variants.  These variants have different sequences but are as fit as each other, and therefore form ridges in the fitness landscape.  It is along these ridges that we can escape from low hills, into regions of the landscape that have higher peaks.  As the sequence gets longer, the chances of encountering a ridge increase, because each base adds another dimension in which fitness can vary.  As Manfred Eigen puts it in his book Steps Towards Life:

‘raising the number of dimensions increases the number of possible routes’
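The escape along a neutral ridge can be sketched with a similar toy walk.  Again, the fitness values are invented, and the walker is a deliberately simple one of my own devising: it takes the fittest available step, and is optionally allowed to step sideways onto a sequence of equal fitness that it has not visited before.

```python
BASES = "ACGT"

# Hypothetical landscape: AC and GC are equally fit (a neutral ridge),
# and GC sits next to the high peak at GG. Unlisted sequences score 0.1.
FITNESS = {"AA": 0.2, "AC": 0.4, "GC": 0.4, "GG": 0.9}

def fitness(seq):
    return FITNESS.get(seq, 0.1)

def neighbours(seq):
    """All sequences that differ from seq at exactly one base."""
    return [seq[:i] + base + seq[i + 1:]
            for i in range(len(seq))
            for base in BASES if base != seq[i]]

def walk(start, allow_neutral):
    """Never step downhill; optionally allow level steps to unvisited sequences."""
    path = [start]
    while True:
        current = path[-1]
        options = [n for n in neighbours(current)
                   if n not in path
                   and (fitness(n) > fitness(current)
                        or (allow_neutral and fitness(n) == fitness(current)))]
        if not options:
            return path
        path.append(max(options, key=fitness))

strict_path = walk("AA", allow_neutral=False)
ridge_path = walk("AA", allow_neutral=True)
print(strict_path)  # ['AA', 'AC']
print(ridge_path)   # ['AA', 'AC', 'GC', 'GG']
```

Without neutral steps the walk stalls on the small hill at AC; with them, it crosses the ridge to GC and from there climbs to GG.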

Using the fitness landscape, we can visualise paths from one sequence to another, and judge whether these paths will be evolutionarily viable.  By distilling the complexity of genotype and phenotype into one space, the fitness landscape gives an intuitive illustration of the process of molecular evolution.

 

 

References

Hayashi Y., Aita T., Toyota H., Husimi Y., Urabe I., et al (2006) Experimental Rugged Fitness Landscape in Protein Sequence Space. PLoS ONE 1(1): e96

Eigen, M. (1996) Steps Towards Life.  Oxford University Press

Introducing…RNAi

Nucleic acids (DNA and RNA) underpin everything in biology.  All the information about an organism is stored in nucleic acid form; and all the actions that are taken from this information rely (directly or indirectly) on nucleic acids.  So it should be no surprise that a system has evolved that allows one individual directly to manipulate the nucleic acids of another.  But the dual simplicity and intricacy of this system are truly astounding, all the more so as new discoveries are rapidly made.

The system is RNA interference (RNAi).  It is based on two simple concepts: extreme specificity in recognising its RNA targets, and generality in the mechanisms that deal with these targets.  Though it was only discovered (somewhat unwittingly) by Napoli et al in 1990, it has since been comprehensively characterised (resulting in the 2006 Nobel Prize being awarded to Andrew Fire and Craig Mello).  While its mechanisms and functions vary between different groups and even kingdoms, the basic idea is as follows:

 

The generality of the system comes from the large proteins that do the cutting: just a few different proteins can be recruited to degrade a wide array of targets.  And the specificity is conferred by the small RNAs: because they are derived from the target itself, they will necessarily be specific to that target.

It is the fourth step in the mechanism – where RNA derived from the target guides the rest of the machinery to the correct place on the correct target messenger RNA (mRNA) – that makes the system so effective and economical.  Regardless of mutations that occur in the target’s genome, the cell will always be able to deal with it, as it will use this newly mutated sequence to guide its machinery – the target can never escape degradation simply by mutational change.
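The logic of this guiding step can be illustrated with a toy model that considers base pairing and nothing else.  The sequences, the five-base guide length and the function names are all invented for the example (real small RNAs are considerably longer), and none of the protein machinery is represented.

```python
# RNA base pairing: A with U, C with G.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """The strand that would pair with rna, read in the conventional direction."""
    return rna.translate(COMPLEMENT)[::-1]

def make_guides(target, size=5):
    """Chop the target into short pieces and return the strand pairing with each."""
    return [reverse_complement(target[i:i + size])
            for i in range(0, len(target) - size + 1, size)]

def find_sites(mrna, guides):
    """Positions on the mRNA where a guide can pair perfectly."""
    sites = []
    for guide in guides:
        position = mrna.find(reverse_complement(guide))
        if position != -1:
            sites.append(position)
    return sites

original = "AUGGCUAACGGAUCC"   # hypothetical target sequence
mutant = "AUGGCUAUCGGAUCC"     # the same target with one base changed

old_guides = make_guides(original)
new_guides = make_guides(mutant)

print(find_sites(original, old_guides))  # [0, 5, 10]
print(find_sites(mutant, old_guides))    # [0, 10]: one old guide no longer pairs
print(find_sites(mutant, new_guides))    # [0, 5, 10]: guides made from the mutant do
```

Because the guides are always made from whatever sequence is present, a mutation in the target simply produces a correspondingly updated guide.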

These small RNAs can be broadly classified into three types.  Small interfering RNA (siRNA) targets parasites that invade from outside, such as viruses, and hence is integral to the immune system.  Piwi-interacting RNA (piRNA) targets parasites within the genome, like transposable elements (sections of the host’s own genome which can cut themselves out, or copy their sequence, and paste this back into the host’s genome), and is thought to be expressed exclusively in the germline (reproductive cells) where these elements are most active.  And microRNA (miRNA) targets the host’s own gene transcripts, allowing it to fine-tune gene expression.

These different small RNA types mean that the RNAi pathway has a dizzying array of functions in different organisms: stem cell maintenance, chromatin formation and upkeep of fertility to name just a few, as well as the antiparasite role mentioned earlier.  RNAi has also become a standard tool for investigations into gene function and expression: by injecting the right double-stranded RNA sequence into a cell, one can knock down a specific gene while keeping everything else constant.  By seeing what the cell can no longer do, or what it starts to do, reliable inferences can then be made as to this gene’s function.

However, recent work is calling into question the strict delineation of function between the different small RNA types.  Interactions between pathways and crossover in their functions are blurring the boundaries between siRNA, miRNA and piRNA.  Most excitingly, new work is building the case for RNAi occurring in humans: future discoveries, and applications of these discoveries, will be truly fascinating.

References

Napoli, C., Lemieux, C. and Jorgensen, R. (1990) Introduction of a chimeric chalcone synthase gene into petunia results in reversible co-suppression of homologous genes in trans.  The Plant Cell 2: 279-289

Fire, A., Xu, S., Montgomery, M.K., Kostas, S.A., Driver, S.E. and Mello, C.C. (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans.  Nature 391: 806-811

Okamura, K. (2011) Diversity of animal small RNA pathways and their biological utility.  WIREs doi: 10.1002/wrna.113

DNA is a recipe, NOT a blueprint

The recent media coverage of the naked mole rat genome sequence prompted an alarming number of declarations such as “Naked mole rat’s genome ‘blueprint’ revealed”, doubtless encouraged by an author of the sequencing study using the word “blueprint” twice during a statement that was widely repeated.

But the analogy of the genome as a blueprint is misleading for two reasons:

  1. There is no one-to-one mapping between a part of the genome and a part of the organism
  2. Changing one part of the genome does not change one particular part of the organism; instead, it can have many effects on different parts and processes

Richard Dawkins details a much better analogy in many of his books: the genome as a recipe, or set of instructions for making the organism.  This is much more accurate because:

  1. There is no way to map one crumb of a cake to one part of the recipe (in the same way as one part of the organism cannot be mapped to one part of the genome).
  2. A discrete change in the recipe (or genome) can affect the cake (or organism) as a whole, e.g. substituting yeast for baking powder makes the whole cake more bready

Dawkins extends this second point in The Blind Watchmaker, to clarify an apparent paradox inherent in this recipe description: if parts of the genome do not map to discrete parts of the organism, how can we talk of a gene for a trait such as eye colour?  Considering the cake again, we can trace the difference between a fluffy and a bready cake to the replacement of baking powder with yeast.  In this sense, yeast is the instruction “for” breadiness, even though the cake would not be bready without other ingredients like flour etc.  In the same way, genes acquire functions because of the difference they make to the phenotype (trait): the name “blue-eyes gene” is still valid, because all other things being equal, an organism with the “blue-eyes gene” instead of the “brown-eyes gene” will have blue eyes instead of brown.

This analogy is also usefully extended in The Extended Phenotype to provide a logical argument against the inheritance of acquired characteristics.  Consider a man who loses an arm in a threshing accident.  If his genome were a blueprint, which maps to his body in a reversible, one-to-one manner, one would expect part of his genome to be lost as well, just as part of a blueprint is removed when the wing of a house is demolished.  By extension, he should produce only one-armed children.  Clearly, this is not what we see.

One final argument against the blueprint analogy is featured in Dawkins’ The Ancestor’s Tale: genome size does not increase as organism size increases.  If we considered the blueprints of a semi-detached house and Blenheim Palace, both drawn to the same scale, we would expect the blueprint of Blenheim Palace to be much bigger.  If the genome (which is at the same “scale” in all organisms) were a blueprint, we would expect a human to have a much larger genome than a naked mole rat: however, as the recent study revealed, the two genomes contain very similar numbers of genes (22,389 compared to 22,561).  And under the blueprint analogy we would definitely expect the tiny water flea Daphnia pulex to have fewer genes: but with 30,907, it has the highest gene count of any animal sequenced so far.

Describing the genome as a blueprint is a recipe for disaster.

References

Colbourne, J.K. et al (2011) The ecoresponsive genome of Daphnia pulex.  Science 331: 555-561

Dawkins, R. (2006) The Blind Watchmaker pp. 295-297.  Penguin Books

Dawkins, R. (2005) The Ancestor’s Tale p. 190.  Phoenix, London

Dawkins, R. (2008) The Extended Phenotype pp. 174-176.  Oxford University Press

Kim et al (2011) Genome sequencing reveals insights into physiology and longevity of the naked mole rat.  Nature 000:1-5

Genetics jargon explained

In the interests of accessibility, I’m starting with a simple but brilliant metaphor for the jargon of genetics, taken from the Preface of Genome by Matt Ridley (review to come).

Genetics is written in an alphabet of four letters, called bases (the famous A, C, G and T).

These letters spell out 64 three-letter words, called codons.
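The figure of 64 follows from having four choices of letter at each of three positions (4 x 4 x 4).  As a quick check, the sketch below lists every possible word and then reads a short sequence three letters at a time; the example sequence and the function name are made up for illustration.

```python
from itertools import product

BASES = "ACGT"

# Every possible three-letter word in the four-letter alphabet.
codons = ["".join(letters) for letters in product(BASES, repeat=3)]
print(len(codons))  # 64

def read_codons(sequence):
    """Split a sequence into consecutive three-letter words, ignoring leftover bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]

words = read_codons("ATGGCCTAA")  # hypothetical nine-base sequence
print(words)  # ['ATG', 'GCC', 'TAA']
```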

The words make up paragraphs, called exons.

However, the paragraphs are interrupted by adverts, called introns.

One collection of paragraphs tells a story, called a gene.

The stories are collected into chapters, called chromosomes.

These chapters are gathered together in one book, called the genome.

Genetics term    Metaphor
Genome           Book
Chromosome       Chapter
Gene             Story
Exon             Paragraph
Intron           Advert
Codon            Word
Base             Letter

So that it is easy to refer back to, this post will be listed in the Fundamentals category.

References

Ridley, M. (2000) Genome: the autobiography of a species in 23 chapters, p.6.  4th Estate, London