Genome Sequence Assembly
Although the assembly of bacterial genomes has become a routine task at major sequencing centers, the assembly problem is far from solved. New challenges emerge as scientists tackle diverse new organisms. Furthermore, new sequencing technologies will change the assumptions currently made about the characteristics of the data being assembled.
Current sequencing technologies only allow us to "read" up to 1,000 to 2,000 bases of DNA at a time. To overcome this limitation, entire organisms are sequenced through a process called shotgun sequencing, in which the DNA is sheared into smaller fragments whose ends are then sequenced. The original DNA sequence is reconstructed by specialized computer programs called assemblers. The output of an assembler is a collection of contiguous pieces (contigs); entire chromosomes are rarely reconstructed into a single piece. An additional computer program, the scaffolder, uses the information linking the sequencing reads from the two ends of each fragment to order and orient the contigs with respect to each other along a chromosome.
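To make the idea of merging reads into contigs concrete, here is a minimal Python sketch of a greedy overlap-merge strategy. It is an illustration only, not the method used by any particular assembler: the function names `overlap` and `greedy_assemble` and the `min_overlap` parameter are hypothetical, and the sketch assumes error-free reads that all come from the same strand, with no repeats longer than the minimum overlap. Real assemblers must also handle sequencing errors, reverse complements, and repeats.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_overlap` bases.
    Returns the remaining sequences, i.e. the contigs."""
    # Remove exact duplicates (keeping first occurrences) and reads that
    # are fully contained in another read.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs
               if not any(r != o and r in o for o in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                k = overlap(a, b, min_overlap)
                if k > best_len:
                    best_len, best_i, best_j = k, i, j

        if best_len == 0:
            # No remaining pair overlaps enough: what is left are the contigs.
            return contigs

        # Merge the best pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Four overlapping reads drawn from one short made-up sequence.
reads = ["ATGCGT", "CGTACG", "ACGTTA", "GTTAGC"]
print(greedy_assemble(reads))

# Reads that do not all overlap are left as more than one contig.
print(greedy_assemble(["AAACCC", "CCCGGG", "TTTAAG"]))
```

In the second call the read `TTTAAG` shares no sufficiently long overlap with the others, so it remains a separate contig. Ordering and orienting such pieces is the job of the scaffolder, which relies on the read-pair information that this sketch leaves out.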
For more information about assembly, see our Genome Assembly Primer.
Current assembly research at the CBCB focuses on the following areas:
- Metagenomic assembly
- Assembly assisted by optical mapping data
- Genome assembly validation
- Software engineering
Principal Investigators
Students and Postdoctoral researchers:
This is an NSF project. See more here