How is DNA Sequenced?

By Sanjay K. Singh, Douglas Marshall, Ph.D., Gregory Siragusa, Ph.D.

A guide through the genomics language barrier.

Our objective here is to provide a brief introduction to the technologies used for next-generation sequencing (NGS). A sequencing project on any NGS platform involves three steps:

Library preparation: Generating small pieces of DNA so that they can be read in parallel.
Sequencing and imaging: Determining the sequence of the bases in immobilized DNA molecules in a massively parallel manner.
Data analysis (a.k.a. bioinformatics): Piecing together the short stretches of sequence collected in the second step into one logical, contiguous sequence.
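
To make the third step a little more concrete, here is a minimal Python sketch of the "piecing together" idea: reads are merged wherever the end of one matches the start of another. The function names and the example reads are invented for illustration. Real assemblers are far more elaborate; they must cope with read errors, repeats and reads from both DNA strands, none of which this toy handles.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; stop with several contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    # Three overlapping toy reads taken from the sequence ATGGCGTACGTTAGC
    toy_reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
    print(greedy_assemble(toy_reads))  # ['ATGGCGTACGTTAGC']
```

With only three error-free reads the greedy approach recovers the original sequence; with real data, the same idea needs much higher read depth and error handling to be reliable.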

Before going much further, here is a table of some important terms for your reference (see Table 1).

Term	Brief Definition/Translation
Read Depth (or Sequencing Depth)	Number of times a given stretch of sequence is read for a single sample. A single read can have errors, so multiple reads are desired for data quality.
Read Length	Length (bp) of an individual read
Coverage	A measure to determine the fraction of the total genome represented in the sequence data with a particular level of accuracy.
Library Preparation	The first step in the NGS workflow, which involves fragmenting the target DNA to a size compatible with the NGS platform and prepping the same for sequencing, i.e., by attaching adaptors.
Bp, Kb, Mb	A measure of read size or genome size: Base Pair, Kilobases (1,000 bp), Megabases (1 million bp).
Read Quality	Number of bp read errors in a sequence
FASTA and FASTQ files	Computer files containing the sequence
DNA Extraction	Wet chemistry protocol to isolate high-quality DNA from a specimen
“Quality of DNA”	Indicators of quantity (ng/ml or ng), purity and molecular weight of DNA extracted from a sample
“Just send me your DNA’s”	Refers to mailing or bringing DNA extracted from a sample to the sequencing lab
Table 1. Important NGS Terms
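
As a rough illustration of how several of these terms fit together, average read depth is commonly estimated as the total number of bases sequenced divided by the genome size. The Python sketch below uses that rule of thumb; the function names and the 5 Mb genome in the example are assumptions chosen for illustration, not figures from any particular project.

```python
import math


def average_depth(num_reads, read_length_bp, genome_size_bp):
    """Estimate average depth as total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target average depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Hypothetical example: a 5 Mb bacterial genome and 300 bp reads
    print(average_depth(500_000, 300, 5_000_000))  # 30.0
    print(reads_needed(30, 300, 5_000_000))        # 500000
```

This is only an average: depth at any one position in a real data set varies, which is why coverage is reported separately from depth.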

During library preparation, genomic DNA is randomly broken into pieces typically <1,000 bp long, and adaptors (synthetic double-stranded DNA fragments of known sequence) are then ligated to the ends of the sheared DNA. A common theme across the NGS technologies is that millions of these adaptor-flanked DNA templates are attached to solid supports, although the method of attachment differs. This spatial distribution of immobilized templates allows millions to billions of sequencing reactions to be run simultaneously. For example, the first next-gen sequencers, launched by the company 454 Life Sciences, use tiny beads carrying several DNA strands complementary to a segment of the added-on adaptor, so that one template (the piece of DNA to be sequenced) attaches to each bead. PCR then generates millions of copies of that fragment on the surface of its bead.
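
The shearing and adaptor ligation described above can be mimicked in a few lines of Python. This is a toy simulation only: the adaptor sequences, the size range and the function names are made up for illustration and do not correspond to any commercial kit.

```python
import random

# Hypothetical adaptor sequences, for illustration only
ADAPTOR_LEFT = "ACGTACGTAC"
ADAPTOR_RIGHT = "TGCATGCATG"


def shear(genome, min_len, max_len, rng):
    """Cut a sequence into consecutive pieces of random length.

    Each piece is between min_len and max_len bases long, except the
    last piece, which may be shorter.
    """
    fragments = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        fragments.append(genome[pos:pos + size])
        pos += size
    return fragments


def ligate_adaptors(fragment):
    """Flank a fragment with the known adaptor sequences."""
    return ADAPTOR_LEFT + fragment + ADAPTOR_RIGHT


def build_library(genome, min_len=200, max_len=1000, seed=0):
    """Shear a genome and attach adaptors to every fragment."""
    rng = random.Random(seed)
    return [ligate_adaptors(f) for f in shear(genome, min_len, max_len, rng)]


if __name__ == "__main__":
    rng = random.Random(42)
    toy_genome = "".join(rng.choice("ACGT") for _ in range(10_000))
    library = build_library(toy_genome)
    print(len(library), "adaptor-flanked fragments")
```

Because the adaptor sequences are known, every fragment in the library has the same "handles" at both ends, regardless of what lies between them.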

While different NGS technologies use different chemistries to determine the sequence, all NGS protocols use smaller quantities of reagent per sequencing reaction than Sanger techniques and increase the amount of sequence data collected by several orders of magnitude. Each of these advancements helps lower the cost of sequencing. Since sequencing reactions are performed using immobilized DNA fragments, the features of the recorded signal (typically fluorescence or light emitted during the extension of the primer) are on the scale of microns (i.e., smaller than the thickness of a human hair). Therefore, an image of reasonable surface area can provide information on millions of sequencing reactions being run in parallel. Picture a screen with many different colored dots appearing and disappearing in all parts of the screen, each representing a nucleotide base being detected and recorded into a sequence.

In the case of the 454 Life Sciences sequencers, sequencing is conducted by a process called pyrosequencing, where a clever use of the luciferase enzyme makes every base incorporated give off a burst of light. In a single run, the 454 instrument can obtain around 400,000 reads at lengths of 200 to 400 bp. Several other NGS platforms have since emerged and have further reduced the cost of sequencing a genome (see Table 2).
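
A simplified way to picture pyrosequencing is as a series of "flows": one type of nucleotide is offered at a time, and the light signal reflects how many bases were incorporated in that flow. The Python sketch below idealizes this under stated assumptions: the flow order TACG is assumed for illustration, the signal is treated as an exact whole number (real signals are noisy), and the function names are our own. It also works out the per-run yield implied by the read numbers quoted above.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order, for illustration


def flowgram(read, flow_order=FLOW_ORDER):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    The signal for a flow is the number of consecutive bases of the read
    that match the nucleotide offered in that flow (0 if none match).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    flows = []
    pos = 0
    i = 0
    while pos < len(read):
        base = flow_order[i % len(flow_order)]
        run = 0
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        flows.append((base, run))
        i += 1
    return flows


def decode(flows):
    """Turn a flowgram back into a base sequence."""
    return "".join(base * signal for base, signal in flows)


def run_yield_mb(num_reads, read_length_bp):
    """Total bases per run, in megabases."""
    return num_reads * read_length_bp / 1_000_000


if __name__ == "__main__":
    print(flowgram("TTAGC"))
    # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
    print(run_yield_mb(400_000, 200), run_yield_mb(400_000, 400))  # 80.0 160.0
```

In other words, around 400,000 reads of 200 to 400 bp works out to roughly 80 to 160 Mb of sequence per run.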

Platform	Instruments	Read Lengths (bp)
Illumina	MiniSeq, MiSeq, NextSeq, HiSeq, HiSeqX	125–600
Ion Torrent	Proton, PGM	200–400
Pacific Biosciences	PacBio RS, PacBio RS II	4,600–14,000
Roche 454	GS FLX, GS FLX+	400–700
SOLiD	5500, 5500xl, 5500 W	100
Table 2. NGS Sequencing Platforms

In the end, all of these instruments spit out a result that is generally in the form of a FASTA or FASTQ file (refer to Table 1). These files contain the sequence of A's, T's, C's and G's in a sample and are the start of the bioinformatics process, to be covered in a forthcoming addition to this column.
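
For readers curious about what is inside such a file, a FASTQ record is four lines of text: an identifier starting with "@", the bases, a separator line starting with "+", and one quality character per base. The minimal Python sketch below reads such records and converts the quality characters to Phred scores. It assumes the common "Phred+33" encoding and simple four-line records; older instruments used other offsets, and the function names and example read are ours.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Convert quality characters to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Probability that a base call is wrong: 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIII5II\n")
    for read_id, seq, qual in parse_fastq(example):
        print(read_id, seq, phred_scores(qual))
        # read1 GATTACA [40, 40, 40, 40, 20, 40, 40]
```

A Phred score of 20 corresponds to a 1-in-100 chance that the base is wrong, and a score of 40 to 1 in 10,000, which is how read quality is usually judged before any further analysis.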

For the food safety professional, genomics investigations require accurate sequence information for reliable interpretation. Professionals are urged to consider certified sequencing providers that offer strong customer orientation, impeccable quality, fast service and high reliability. Poor-quality sequence information can lead to poor-quality species assignments in public databases, and faulty assignments lead to wrong bioinformatics interpretations. Recent highly sensationalized food genomics press releases showing the presence of difficult-to-believe contaminants, such as human or rat DNA in highly processed foods, may be due to analysis of poor-quality sequence information. Professionals should also consult organizations with expertise in food science and technology to make sure sequence-based conclusions rest on a foundation of real and sound data.
