Genomes and puzzles: a vision on genome assembly

Authors

  • Aureliano Bombarely Gómez, United States

Keywords:

Genomes, assembly

Abstract

The development of new sequencing technologies has revolutionized genome analysis. Large sequencing projects have given way to more modest efforts in both personnel and cost. It is now possible to sequence, assemble, and analyze a medium-sized plant genome with limited resources, although we are still far from being able to assemble any genome. Large genomes with a high content of repeats, polyploid genomes, and genomes with high heterozygosity can still be difficult to assemble.

Published

2014-09-20

How to Cite

Bombarely Gómez, A. (2014). Genomes and puzzles: a vision on genome assembly. Encuentros En La Biología, 7(150), 151–1456. Retrieved from https://revistas.uma.es/index.php/enbio/article/view/18150