Sunday, December 28, 2008

Gene Conversion and Drosophila Duplications

The study of duplicated genes is an important topic in evolution and population genetics. A recent paper in PLoS Genetics by Osada and Innan (Duplication and Gene Conversion in the Drosophila melanogaster Genome) makes some interesting observations about the role of gene conversion in shaping patterns of sequence evolution in Drosophila segmental duplications. Osada and Innan discuss other topics in their paper, but I mostly want to use it as an opportunity to lay out some comments on gene conversion.

Gene conversion is one of the potential ways of resolving the Holliday junction intermediate (Liu and West 2004) that forms between two DNA molecules during recombination or other repair processes involving double-strand breaks. Many of the molecular players involved are known; if you have access, I recommend the recent review by Chen et al. (2007).

Instead of considering gene conversion mechanistically, I am going to describe its consequences using a simple genetic example. Consider a diploid eukaryote in which all four products of a single meiosis can be observed directly (e.g., a fungus such as Saccharomyces cerevisiae). For a heterozygous locus with the two alleles A and a, the expectation is that two of the gametes will carry the A allele and the remaining two will carry the a allele. This is depicted below; it is the situation taught in introductory textbooks, and it is what is most often observed.

Meiosis without gene conversion:

Sometimes, though, an unexpected outcome is observed: instead of the expected 2:2 ratio, three gametes carry the A allele and only one carries the a allele (a 3:1 ratio).

Meiosis with gene conversion:

What happened? At some point following replication (so that each homolog is represented by two sister chromatids), there was a transfer of sequence information from one of the A chromatids to one of the chromatids that carried an a. This transfer was non-reciprocal and unidirectional. The end result is that the sequence of an a allele was converted to match that of an A.
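To make the bookkeeping concrete, here is a minimal toy sketch, assuming we can track the four chromatids of one A/a meiosis as simple allele labels. The function names (`tetrad`, `segregation_ratio`) and the choice of which chromatid donates and which receives are illustrative only; the point is that a single non-reciprocal copy from an A chromatid onto an a chromatid turns the textbook 2:2 outcome into 3:1.

```python
from collections import Counter


def tetrad(convert: bool = False) -> list[str]:
    """Toy model of the four meiotic products of an A/a heterozygote."""
    # After replication, each homolog consists of two sister chromatids.
    chromatids = ["A", "A", "a", "a"]
    if convert:
        # Non-reciprocal, unidirectional transfer: the sequence of one A
        # chromatid (index 0) is copied over one a chromatid (index 2).
        # The donor chromatid itself is left unchanged.
        chromatids[2] = chromatids[0]
    # Each chromatid ends up in one of the four meiotic products.
    return chromatids


def segregation_ratio(products: list[str]) -> str:
    """Format the A:a ratio among the meiotic products, e.g. '2:2'."""
    counts = Counter(products)
    return f"{counts['A']}:{counts['a']}"


print(segregation_ratio(tetrad(convert=False)))  # 2:2
print(segregation_ratio(tetrad(convert=True)))   # 3:1
```

Note that the total number of products stays at four; the conversion only changes which allele one of them carries.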

It turns out that gene conversion can occur whenever two stretches of DNA share sufficient sequence homology. (NB: although it is named gene conversion, this process can alter any sequences that are sufficiently similar to each other; the altered regions do not have to be "genes".) This is clearly the case for corresponding loci on homologous chromosomes, but it is also the case for paralogous sequences. Paralogs are simply the individual copies of a duplicated stretch of DNA that is found at multiple locations in the haploid genome. How this can work is illustrated below, where the flowchart schematically follows a single chromosome over many generations.

Gene Conversion and Duplicated Sequences:

The diagram starts by considering a single stretch of sequence on the p-arm (short arm) of the depicted chromosome. This stretch is indicated by the red box and is, at this point, unique within this particular genome. At some point, an intrachromosomal duplication event copies the sequence to another location on the same chromosome (in this example). The red box is now duplicated, so the sequence is present twice in the haploid genome.

From here, things may unfold in several different ways. For this example, I'll posit that both copies of the duplication are retained and that each independently accrues sequence differences (mutations). This is indicated by the change in the color of the segments: the two segments start out identical, but over time one becomes orangish and the other yellowish. Now, if we knew how fast sequence differences (here, changes in color) accumulate, we could compare the two sequences, estimate how long ago they were identical, and thus estimate when the duplication event occurred. However, this estimate would be badly off if a gene conversion event occurs, especially if that possibility is not incorporated into the analysis.
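The dating logic can be sketched as a simple molecular-clock calculation. This is a minimal sketch, assuming the two paralogs are already aligned, that a strict clock holds, and that a Jukes-Cantor correction for multiple hits is adequate; the substitution rate `mu` and the example sequences are hypothetical. Because both copies accumulate changes independently, the pair diverges at twice the per-copy rate, so the age is the corrected distance divided by `2 * mu`.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    diffs = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return diffs / len(seq1)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple substitutions at the same site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def duplication_age(seq1: str, seq2: str, mu: float) -> float:
    """Years since two paralogs were identical, under a strict molecular clock.

    mu is the (hypothetical) substitution rate per site per year in EACH copy;
    since both copies change independently, they diverge from each other at 2 * mu.
    """
    return jukes_cantor(p_distance(seq1, seq2)) / (2 * mu)


# Hypothetical example: 1,000-bp paralogs differing at 20 sites, mu = 1e-8.
copy1 = "A" * 1000
copy2 = "T" * 20 + "A" * 980
print(f"{duplication_age(copy1, copy2, mu=1e-8):,.0f} years")
```

With these invented inputs the estimate comes out at roughly one million years. Everything hinges on the observed distance, which is exactly the quantity gene conversion disturbs.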

The putative conversion event in the schematic homogenizes the two copies: the "yellow" segment is converted to have the same sequence as the orange segment. Without any other information, one might observe that the two paralogs are nearly identical in sequence and erroneously conclude that the duplication occurred recently, or that some other force (selection?) has acted to maintain the sequence similarity between the two copies.
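A toy illustration of the homogenizing effect, assuming two hypothetical aligned paralogs that differ at 4 of 20 sites. The helper `gene_conversion` is an illustrative name, not anything from the paper: it copies a stretch of the donor ("orange") over the recipient ("yellow") without changing the donor. Converting half of the segment halves the observed divergence, and converting all of it makes the copies identical, so any clock-based age estimate shrinks accordingly.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gene_conversion(recipient: str, donor: str, start: int, end: int) -> str:
    """Overwrite recipient[start:end] with the donor's sequence (non-reciprocal)."""
    return recipient[:start] + donor[start:end] + recipient[end:]


# Hypothetical paralogs that have diverged at 4 of 20 sites.
orange = "ACGTACGTACGTACGTACGT"
yellow = "ACTTACGAACGTTCGTACCT"

print(p_distance(orange, yellow))                                           # 0.2
print(p_distance(orange, gene_conversion(yellow, orange, 0, 10)))           # 0.1
print(p_distance(orange, gene_conversion(yellow, orange, 0, len(yellow))))  # 0.0
```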

In their paper, Osada and Innan try to get around this dating problem by using information from closely related Drosophila species. Essentially, they count how many regions are present in two copies in the D. melanogaster genome but in only a single copy in D. simulans or D. sechellia. Knowing how long ago the melanogaster lineage separated from the others, the authors can then calculate the rate of duplication directly. Focusing on genes, they estimate that a duplication of a single-copy gene (one copy to two copies) occurs, and survives to fixation, once every 75 thousand years.
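The arithmetic behind a lineage-specific rate is simple enough to sketch. This assumes duplications arise at a constant rate along the branch (a Poisson process), so the rate is the count divided by the branch length and the mean waiting time between events is its reciprocal. The counts and branch length below are invented for illustration and are not the values used by Osada and Innan; for reference, their waiting time of 75,000 years corresponds to a rate of about 1/75,000, or roughly 1.3 × 10^-5 fixed duplications per year.

```python
def duplication_rate(n_lineage_specific: int, branch_length_years: float) -> float:
    """Duplications per year that arose (and were fixed) along one lineage."""
    return n_lineage_specific / branch_length_years


def mean_waiting_time(rate_per_year: float) -> float:
    """Expected years between events for a constant-rate (Poisson) process."""
    return 1.0 / rate_per_year


# Invented inputs for illustration only (not the paper's data).
rate = duplication_rate(n_lineage_specific=10, branch_length_years=2_000_000)
print(rate, mean_waiting_time(rate))
```

With these made-up numbers the waiting time is 200,000 years; the real estimate depends on the counts and divergence times used in the paper.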

Based on sequence comparisons between regions duplicated in both melanogaster and another species, the authors also search for evidence of gene conversion. Using a gene-tree-based approach, they find evidence for gene conversion in most (24 out of 28) of the duplicated segments they examined. This is an interesting finding and, if true, means that conversion is extremely important for the evolutionary trajectory of duplicated genes.
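The intuition behind a gene-tree test can be sketched with a crude distance rule. This is a simplified heuristic, not the authors' method: for a duplication that predates the species split, each melanogaster paralog should be closer to its ortholog in the other species than to its own paralog. If instead the two melanogaster copies are closer to each other than either is to its ortholog, the gene tree groups the paralogs together, which is the pattern conversion produces. Function names and the toy sequences are hypothetical; real analyses build trees with proper substitution models.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def paralogs_cluster_together(mel_1: str, mel_2: str,
                              other_1: str, other_2: str) -> bool:
    """Crude distance check for a conversion signal.

    mel_1, mel_2: the two paralogs in D. melanogaster.
    other_1, other_2: their putative orthologs in a second species, for a
    duplication thought to predate the species split.
    Returns True if the melanogaster paralogs are closer to each other than
    either is to its ortholog.
    """
    d_paralogs = p_distance(mel_1, mel_2)
    d_orthologs = min(p_distance(mel_1, other_1), p_distance(mel_2, other_2))
    return d_paralogs < d_orthologs


# Toy sequences: each melanogaster copy differs from its ortholog at one site.
other_1, other_2 = "AAAAAAAAAA", "CCCCCCCCCC"
mel_1, mel_2 = "AAAAAAAAAT", "CCCCCCCCCG"

print(paralogs_cluster_together(mel_1, mel_2, other_1, other_2))  # False
# Suppose mel_2 has been completely converted to the sequence of mel_1:
print(paralogs_cluster_together(mel_1, mel_1, other_1, other_2))  # True
```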

The authors carry out additional analyses, but I'll end this post with another thought I had on this paper. For technical reasons, working with duplications can be tricky. In this study, all of the duplications were identified from the genome assemblies. Although the D. melanogaster assembly is good, the others are essentially draft whole-genome shotgun (WGS) assemblies. This means that duplications, particularly those with high sequence identity or those in a tandem configuration (duplicates adjacent to each other), may be missed because they are collapsed into a single copy. This could lead to the incorrect conclusion that a duplication occurred after the melanogaster lineage split from the others. This should inflate the estimated rate of duplication specifically along the melanogaster lineage, but I'm not sure how it would bias the other analyses in the paper.
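The direction of that bias can be sketched with a toy model, assuming a fraction of duplications actually shared by both species are collapsed in the other species' draft assembly and therefore look melanogaster-specific. All numbers and the function name are hypothetical; the sketch only shows that any nonzero miss rate adds spurious events to the melanogaster branch and inflates the inferred rate.

```python
def apparent_lineage_rate(n_true_specific: int, n_shared: int,
                          miss_fraction: float, branch_length_years: float) -> float:
    """Duplication rate inferred for the melanogaster lineage when some shared
    duplications are collapsed (missed) in the other species' draft assembly.

    Missed shared duplications look melanogaster-specific, so they are added
    to the lineage-specific count.
    """
    n_apparent = n_true_specific + miss_fraction * n_shared
    return n_apparent / branch_length_years


# Invented numbers: 10 truly lineage-specific and 40 shared duplications.
true_rate = apparent_lineage_rate(10, 40, miss_fraction=0.0, branch_length_years=2e6)
biased_rate = apparent_lineage_rate(10, 40, miss_fraction=0.25, branch_length_years=2e6)
print(biased_rate / true_rate)
```

Here missing a quarter of the shared duplications doubles the apparent rate; the size of the real effect would depend on how many duplications the draft assemblies actually collapse.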

Osada N, Innan H (2008) Duplication and Gene Conversion in the Drosophila melanogaster Genome. PLoS Genetics 4(12): e1000305. doi:10.1371/journal.pgen.1000305

Liu Y, West SC (2004) Happy Hollidays: 40th anniversary of the Holliday junction. Nature Reviews Molecular Cell Biology 5: 937-944. doi:10.1038/nrm1502

Chen JM, Cooper DN, Chuzhanova N, Férec C, Patrinos GP (2007) Gene conversion: mechanisms, evolution and human disease. Nature Reviews Genetics 8(10): 762-775. doi:10.1038/nrg2193
