Genome of cotton

A public effort to sequence the cotton genome was started in 2007 by a consortium of public researchers. Their aim is to sequence the genome of cultivated, tetraploid cotton. "Tetraploid" means that its nucleus contains two distinct genomes, called A and D. The consortium agreed to first sequence the D-genome wild relative of cultivated cotton (Gossypium raimondii, a South American species native to Peru) because its genome is small and has few repetitive elements. It has nearly one-third as many bases as tetraploid cotton, and each chromosome occurs only once. Next, the A genome of G. arboreum would be sequenced; it is roughly twice the size of the G. raimondii genome. Part of the difference in size is due to the amplification of retrotransposons (GORGE). Once both diploid genomes are assembled, they would serve as models for sequencing the genomes of tetraploid cultivated species. Without the diploid genomes as references, the euchromatic DNA sequences of the A and D genomes would co-assemble, while their repetitive elements would assemble independently into A and D sequences, respectively. There would be no way to untangle the resulting mix of AD sequences without comparing them to their diploid counterparts.
The public-sector effort continues with the goal of creating a high-quality draft genome sequence from reads generated by all sources. The effort has generated Sanger reads of BACs, fosmids, and plasmids, as well as 454 reads. These latter types of reads will be instrumental in assembling an initial draft of the D genome. In 2010, the companies Monsanto and Illumina completed enough Illumina sequencing to cover the D genome of G. raimondii about 50x. They announced that they would donate their raw reads to the public, a public relations effort that gave them some recognition for sequencing the cotton genome. Once the D genome is assembled from all of this raw material, it should assist in the assembly of the AD genomes of cultivated cotton varieties, but much work remains.
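The "about 50x" figure above refers to average sequencing coverage, which is conventionally computed as the number of reads times the read length, divided by the genome size. The following is a minimal sketch of that arithmetic in Python. The genome size (880 Mb) and read length (100 bp) are illustrative assumptions and are not stated in this article; the function names are hypothetical.

```python
import math


def average_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average coverage C = N * L / G (reads x read length / genome size)."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(genome_size_bp: int, coverage: int, read_length_bp: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    total_bases = genome_size_bp * coverage
    return math.ceil(total_bases / read_length_bp)


# Assumed, illustrative values (not taken from the article):
GENOME_SIZE_BP = 880_000_000  # placeholder size for a D-genome diploid
READ_LENGTH_BP = 100          # placeholder short-read length

n = reads_needed(GENOME_SIZE_BP, 50, READ_LENGTH_BP)
print(n)
print(average_coverage(n, READ_LENGTH_BP, GENOME_SIZE_BP))
```

Under these assumptions, 50x coverage corresponds to 44 billion sequenced bases. Average coverage says nothing about how evenly reads are distributed, which is one reason repetitive regions remain hard to assemble even at high depth.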