Threespine sticklebacks (Gasterosteus aculeatus) have shown a recurrent and relatively predictable pattern of evolution. Their ancestral home is marine, where they have long spines and a heavily armoured body; however, on numerous occasions they have moved into freshwater, and every time they do they lose their spines and armour. This is due to a difference in predation pressure. When in marine habitats, the main predators of the stickleback are birds and fish, which can only eat them if they can swallow them whole. Long spines make it more difficult for something to swallow you, and armour makes it more likely you will survive a spell in the beak of a bird before being spat out. But when in freshwater, the main predators of sticklebacks are insects, which grab hold of their prey instead of swallowing them in one. With this kind of predation, spines and armour are a twofold disadvantage: they give your predator something to grab hold of, and they cost energy to produce, inhibiting body growth and therefore increasing predation risk.
The consistency of stickleback evolution is truly remarkable. There are many instances worldwide of them moving from marine to freshwater environments, and in every one the same reduction in spines and armour is seen. However, questions still remain regarding the genetic basis of this recurrent evolution. How many genes are involved? Is it mainly changes to protein-coding sequence or to regulatory sequence that enable this recurrent evolution? And is there a set of “freshwater” genes in the genomes of all populations of sticklebacks, which are repeatedly selected for when they move from marine waters, or are different mutants selected for each time? These are the questions that a recent open-access Nature paper by Jones et al (2012) set out to answer.
The authors took a whole genome approach to the problem, sequencing many different marine-freshwater pairs from across the globe and comparing them to a stickleback reference genome. This avoided two limitations of previous studies. The authors could look at every stickleback gene and assess its contribution, rather than deciding a priori to focus on a single gene. And they stood more chance of detecting patterns of interaction between genes (epistasis), and whether a single trait was being affected by more than one gene (a polygenic trait).
First they generated a reference genome by sequencing one freshwater female, giving them a standard against which to compare the rest of their genomes. They then chose 10 sites which showed the two characteristic morphs (assessed by morphometric analysis), encompassing both the Pacific and Atlantic Oceans, and sequenced one freshwater (spineless, no armour) and one marine (spiny, armoured) stickleback from each site. To identify genomic regions under positive selection which would have driven the divergence between the two morphs, they used two methods. The first was a Hidden Markov Model: this splits the genome into regions, calculates a phylogenetic tree for the 21 individuals at each of these regions, and groups these trees according to similarity. This method identified 215 regions (90 after filtering) which separated marine individuals from their freshwater counterparts; the authors inferred that these were the regions likely to be under different selection pressures in the two habitats. The second method was a genetic distance approach: this again splits the genome into regions, and calculates for each region a cluster separation score (CSS), to quantify the level of marine-freshwater divergence at that region. The number of divergent regions recovered by this method was 174 with a 5% false discovery rate (FDR, the expected proportion of false positives among the regions called divergent, which is related to but not the same as a p value threshold of 0.05), and 84 with a 2% FDR. Without being overly clear about which filtering they are accepting, the authors conclude that 242 regions (0.5% of the genome) have been identified by either method, and 147 regions (0.2% of the genome) have been identified by both.
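The totals quoted above can at least be checked for internal consistency. The sketch below is my own arithmetic, not anything taken from the paper: it assumes the "either" and "both" figures are a simple union and intersection of the two region sets, and asks which combination of thresholds reproduces 242 via |A ∪ B| = |A| + |B| - |A ∩ B|. The dictionary labels and the function name are illustrative.

```python
from itertools import product

# Region counts as quoted in the text above.
hmm_counts = {"HMM, unfiltered": 215, "HMM, filtered": 90}
css_counts = {"CSS, 5% FDR": 174, "CSS, 2% FDR": 84}
found_by_both = 147
found_by_either = 242


def consistent_pairs():
    """Return the (HMM, CSS) threshold pairs whose counts reproduce the reported union."""
    matches = []
    for (hmm_label, n_hmm), (css_label, n_css) in product(
        hmm_counts.items(), css_counts.items()
    ):
        # The overlap cannot be larger than either set on its own.
        overlap_possible = found_by_both <= min(n_hmm, n_css)
        # Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
        union_matches = n_hmm + n_css - found_by_both == found_by_either
        if overlap_possible and union_matches:
            matches.append((hmm_label, css_label))
    return matches


print(consistent_pairs())
```

Under that assumption only one pairing fits: the 215 unfiltered Hidden Markov Model regions together with the 174 CSS regions at 5% FDR (215 + 174 - 147 = 242). This suggests, but does not prove, that the headline figures use the more permissive thresholds.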
The authors therefore regard as settled the question of whether the same variation is reused, or new variation is continually produced: 0.5% of the genome is an incredibly small proportion for recurrent evolution on this scale, and can only be explained by these relatively few genomic regions being used again and again to produce the same evolutionary pattern.
They next looked for what these regions did, by analysing 64 of the most divergent regions. 41% of these regions were non-coding and therefore presumably regulatory, whereas only 17% were coding and showed non-synonymous differences (i.e. produced different protein products in the different environments). The other 42% were either coding or non-coding, but did not show any non-synonymous differences between the environments, so any adaptive effect they have is also likely to be regulatory. Taken together, up to 83% of the regions could be acting through regulation, and on that basis the authors concluded that regulatory changes account for a large majority of adaptive change.
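The bounds on the regulatory share follow from simple addition. The sketch below uses only the three percentages quoted above; the category labels and the function name are mine, and treating the regions without non-synonymous differences as "possibly regulatory" is an assumption made for illustration.

```python
# Percentages of the 64 most divergent regions, as quoted in the text.
shares = {
    "non-coding": 41,
    "coding, non-synonymous": 17,
    "no non-synonymous differences": 42,
}


def regulatory_bounds(shares):
    """Return (lower, upper) bounds, in percent, on the share of regulatory regions."""
    # Lower bound: only the clearly non-coding regions are counted.
    lower = shares["non-coding"]
    # Upper bound: regions lacking protein changes are counted as regulatory too.
    upper = lower + shares["no non-synonymous differences"]
    return lower, upper


print(sum(shares.values()))
print(regulatory_bounds(shares))
```

The three categories account for all of the regions, and the regulatory share lies somewhere between 41% and 83% depending on how the ambiguous regions are classified. Only the upper end of that range would count as a large majority.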
And here I must confess some puzzlement with the execution of their next step. The authors chose to test how many regulatory differences existed between the two morphs by sequencing RNA (no problems so far) from a marine and a freshwater morph “born and raised under identical laboratory conditions.” IDENTICAL LABORATORY CONDITIONS. They found significant differences in the expression of 2,817 genes out of a total of 12,594 (around 22%). One wonders how many differences would have been seen had they also compared RNA from fish in their natural habitats, the environments under which those regulatory differences have evolved.
So they had found the portions of the genome that differ between freshwater and marine sticklebacks, and had an idea what their function was. However, answering this question only raises another: if these genes are constantly used during freshwater-marine divergence, how do they avoid being recombined during sex, which would produce an individual with some “freshwater” variants, and some “marine” variants? As the authors say, “When adaptive divergence occurs in hybridizing systems, theory predicts that selection can favour molecular mechanisms that suppress recombination between independent adaptive loci” (Jones et al, 2012). So these mechanisms are what they looked for next.
To do this they sequenced the genome of a marine and a freshwater morph in a hybrid zone in the River Tyne in Scotland. Here, even though the two morphs are recombining their genes during mating, only the two distinct morphs survive, with any intermediates selected against. They then looked for regions which had high CSS values, and sharp transitions in CSS at their boundaries. This would act as a signature of an inverted region, which doesn’t undergo recombination and so passes through the generations as either a “freshwater” or “marine” complex. They found three such regions, on chromosomes I, XI and XXI. They then cloned these regions into bacteria, to compare them more reliably with the reference genome, and to sequence their surrounding regions more easily. When clones were compared with the reference, only chromosomes I, XI and XXI were anomalous, further confirming their status as inversions. Inverted repeats were also found in the sequence of their surrounding regions, a signature of inversion generation. Cluster separation scores for the regions confirmed that marine and freshwater sticklebacks carry different forms of the inversions. Finally, they looked for functional significance of these regions. They found that the inversion on chromosome XXI contains “separate QTLs controlling armour plate number and body shape, traits that differ between marine and freshwater fish” (Jones et al, 2012).
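The idea of a block of high divergence with sharp edges can be illustrated with a toy scan. This is a minimal sketch of the general idea, not the authors' procedure: the function name, the thresholds (`high`, `min_jump`, `min_len`) and the score values are all invented for illustration, and real CSS values would come from sliding windows along a chromosome.

```python
def find_blocks(scores, high=0.8, min_jump=0.5, min_len=3):
    """Return (start, end) index pairs of runs of high scores with sharp edges."""
    blocks = []
    i = 0
    n = len(scores)
    while i < n:
        if scores[i] >= high:
            # Extend the run for as long as the score stays high.
            j = i
            while j + 1 < n and scores[j + 1] >= high:
                j += 1
            # Runs touching either end have no flanking window to compare with.
            has_flanks = i > 0 and j < n - 1
            if has_flanks and j - i + 1 >= min_len:
                jump_in = scores[i] - scores[i - 1]
                jump_out = scores[j] - scores[j + 1]
                # Keep the run only if the score changes abruptly at both edges.
                if jump_in >= min_jump and jump_out >= min_jump:
                    blocks.append((i, j))
            i = j + 1
        else:
            i += 1
    return blocks


# Invented window scores: one sharp plateau, then a gradual single-window peak.
toy_scores = [0.1, 0.1, 0.2, 0.9, 1.0, 0.9, 0.95, 0.1, 0.2, 0.4, 0.6, 0.85, 0.6, 0.4]
print(find_blocks(toy_scores))
```

On this toy input the scan keeps the plateau spanning windows 3 to 6 and discards the later peak, which rises and falls gradually. That gradual shape is what one would expect where recombination erodes divergence on either side of a selected site, whereas an inversion keeps the whole block intact up to its breakpoints.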
So how successful have the authors been in answering their questions? The first has been an undeniable success: that such a small fraction of the genome is consistently found to produce such large phenotypic changes is convincing evidence that the same genes are used repeatedly, rather than new mutations being required every time freshwater is invaded. However, regarding the function of genes and the relative importance of coding and regulatory change, valuable initial data have been produced here, but no strong conclusions can be drawn from them. The data here allow hypotheses to be made and candidate genes to be identified; however, experimental manipulations and data from multiple generations will be needed before conclusions can be drawn with any validity.
P.S. This paper was published as an open-access article, meaning an institutional login or massive payout is not required to read it, and that figures and content can be reproduced with a citation. Let’s hope this soon becomes the norm (and that in the future, authors are not required to foot the bill to make their work open access).