Sanjay Singh, Eurofins
Food Genomics

How is DNA Sequenced?

By Sanjay K. Singh, Douglas Marshall, Ph.D., Gregory Siragusa, Ph.D.
No Comments
Sanjay Singh, Eurofins

A guide through the genomics language barrier.

Here is a prediction. Within the next year or years, at some time in your daily work life as a food safety professional you will be called upon to either use genomic tools or to understand and relay information based on genomic tools for making important decisions about food safety and quality. Molecular biologists love to use what often seems like a foreign or secret language. Rest assured dear reader, these are mostly just vernacular and are easily understood once you get comfortable with a bit of the vocabulary. In this the fourth installment of our column we progress to give you another tool for your food genomics tool kit. We have called upon a colleague and sequencing expert, Dr. Sanjay Singh, to be a guest co-author for this topic on sequencing and guide us through the genomics language barrier.

The first report of the annotated (labeled) sequence of the human genome occurred in 2003, 50 years after the discovery of the structure of DNA. In this genome document all the genetic information required to create and sustain a human being was provided. The discovery of the structure of DNA has provided a foundation for a deeper understanding of all life forms, with DNA as a core molecule of genetic information. Of course that includes our food and our tiny friends of the microbial world. Further molecular technological advances in the fields of agriculture, food science, forensics, epidemiology, comparative genomics, medicine, diagnostics and therapeutics are providing stunning examples of the power of genomics in our daily lives.  We are only now beginning to harvest the fruits of sequencing and using that knowledge routinely in our respective professions.

In our first column we wrote, “DNA sequencing can be used to determine the names, types, and proportions of microorganisms, the component species in a food sample, and track foodborne diseases agents.” In this month’s column, we present a basic guide to how DNA sequencing chemistry works.

Image courtesy of US Human Genome Project Knowledge base
Image courtesy of US Human Genome Project Knowledge base

DNA sequencing is the process of determining the precise order of four nucleotide bases, adenine or A, cytosine or C, guanine or G, and thymine or T in a DNA molecule. By knowing the linear sequence of A, C, G, and T in a DNA molecule, the genetic information carried in that particular DNA molecule can be determined.

DNA sequencing happened from the intersections of different fields including biology, chemistry, mathematics, and physics.1,2 The critical breakthrough was provided in 1953 by James Watson, Francis Crick, Maurice Wlkins and Rosalind Franklin when they resolved the now familiar double helix structure of DNA.3 Each helical strand was a polynucleotide, which consists of repeating monomeric units called nucleotides. A nucleotide consists of a sugar (deoxyribose), a phosphate moiety, and one of the four nitrogenous bases—the aforementioned A, C, G, and T. In the double helix, the strands run opposite to each other, commonly referred as anti-parallel. Repeating units of base-pairs (bp), where A always pairs with T and C always pairs with G, are arranged within the double helix so that they are slightly offset from each other like steps in a winding staircase. On a piece of paper, the double helix is often represented by scientists as a flat ladder-like structure, where the base pairs (bp) form the rungs of the ladder while the sugar-phosphate backbone form the antiparallel rails (see Figure 1).

DNA Double Helix
Artistic representation of DNA Double Helix. Source: Eurofins

The two ends of each polynucleotide strand are called 5′ or 3′-end, a nomenclature that represents the chemical structure of the deoxyribose sugar at that terminus. The lengths of a single- or double-stranded DNA are often measured in bases (b) or bases pairs (bp), respectively. The two polynucleotide strands can be readily unzipped by heating, and on cooling, the initial double-helix structure is re-formed or re-annealed. The ability to rezip the initial ladder-like structure can be attributed to the phenomenon of base pairing, which merits repetition—the base A always pairs with T and the base G always with C. This rather innocuous phenomenon of base pairing is the basis for the mechanism by which DNA is copied when cells divide and is also the theoretical basis on which most traditional and modern DNA sequencing methodologies have been developed.

Other biological advancements also paved the way towards the development of sequencing technologies. Prominent amongst these were the discovery of enzymes that allowed a scientist to manipulate the DNA. For example, restriction enzymes that recognize and cleave DNA at specific short nucleotide sequences can be used to fragment a long duplex strand of DNA.4 The DNA polymerase enzyme, in the presence of the deoxyribose nucleotide triphosphates (dNTPs: Chemically reactive forms of the nucleotide monomers), can use a single DNA strand to fill in the complementary bases and extend a shorter rail strand (primer extension) of a partial DNA ladder.5 A critical part of the primer extension is the ‘primer’, which are short single-stranded DNA pieces (15 to 30 bases long) that are complementary to a segment of the target DNA. These primers are made using automated high-throughput synthesizer machines. Today, such primers can be rapidly manufactured and delivered on the following day. When the primer and the target DNA are combined through a process called annealing (heat and then cool), they form a structure that shows a ladder-like head and a long single-stranded tail. In 1983, Kary Mullis developed an enzyme-based process called Polymerase Chain Reaction (PCR). Using this protocol, one can pick a single copy of DNA and amplify the same sequence an enormous number of times. One can think of PCR as molecular photocopier in which a single piece of DNA is amplified up to approximately 30 billion copies!

The other critical event that changed the course of DNA sequencing efforts was the publication of the ‘dideoxy chain termination’ method by Dr. Frederick Sanger in December 1977.6 This marked the beginning of the first generation of DNA sequencing techniques. Most next-generation sequencing methods are refinements of the chain termination, or “Sanger method” of sequencing.

Frederick Sanger chemically modified each base so that when it was incorporated into a growing DNA chain, the chain was forcibly terminated. By setting up a primer extension reaction where in one of the chemically modified ‘inactive’ base in smaller quantity is mixed with four active bases, Sanger obtained a series of DNA strands, which when separated based on their size indicated the positions of that particular base in the DNA sequence. By analyzing the results from four such reactions run in parallel, each containing a different ‘inactive’ base, Sanger could piece together the complete sequence of the DNA. Subsequent modifications to the method allowed for the determination of the sequence using dye-labeled termination bases in a single reaction. Since, a sequence of less than <1000 bases can be determined from a single such reaction, the sequence of longer DNA molecules have to be pieced together from many such reads.

Using technologies available in the mid-1990’s, as many as 1 million bases of sequence could be determined per day. However, at this rate, determining the sequence of the 3 billion bp human genome required years of sequencing work. By analogy, this is equivalent to reading the Sunday issue of The New York Times, about 300,000 words, at a pace of 100 words per day. The cost of sequencing the human genome was a whopping  $70 million. The human genome project clearly brought forth a need for technologies that could deliver fast, inexpensive and accurate genome sequences.  In response, the field initially exploded with modifications to the Sanger method. The impetus for these modifications was provided by advances in enzymology, fluorescent detection dyes and capillary-array electrophoresis. Using the Sanger method of sequencing, one can read up to ~1,000 bp in a single reaction, and either 96 or 384 such reactions (in a 96 or 384 well plate) can be performed in parallel using DNA sequencers. More recently a new wave of technological sequencing advances, termed NGS or next-generation sequencing, have been commercialized. NGS is fast, automated, massively parallel and highly reproducible. NGS platforms can read more than 4 billion DNA strands and generate about a terabyte of sequence data in about six days! The whole 3 billion base pairs of the human genome can be sequenced and annotated in a mere month or less.

Continue to page 2.

About The Author

Sanjay Singh, Eurofins

About The Author

Douglas Marshall, Ph.D., Eurofins
Douglas Marshall, Ph.D.
Chief Scientific Officer

Douglas Marshall is chief scientific officer with Eurofins Microbiology Laboratories, Inc., a division of the global life sciences company Eurofins Scientific. He also is cofounder and director of the Food Safety Institute, LLC, an integrated consulting and analytical services company affiliated with Eurofins. He currently holds adjunct professor positions with Colorado State University and Florida State College. His former positions include associate dean and professor of public health at the College of Natural and Health Sciences, University of Northern Colorado; adjunct professor with the Colorado School of Public Health; professor of food science, nutrition, and health promotion at Mississippi State University; assistant professor of food science at Louisiana State University; contributing editor for the peer-reviewed scientific journal Food Microbiology, and four consecutive terms on the editorial board of the Journal of Food Protection. He is a frequent consultant to NIH, WHO, FAO, USDA and other government agencies and private companies. His research and expertise has been featured in popular press venues such as Consumer’s Reports, Fine Cooking, USA Today, Fitness, Health, Men’s Health, Chemtech, Nature Science Updates, and ASM Journal Highlights. Marshall is a frequently invited speaker and a prolific book chapter writer. With more than 250 publications and more than 150 invited presentations, his scientific research and outreach interests focus on improving the microbiological quality and safety of foods. Among these was the completion of the four-volume Handbook of Food Science, Technology, and Engineering, which he co-edited. He has been the recipient of a number of awards for his scholarly efforts including the Mississippi Chemical Corporation Award of Excellence for Outstanding Work and the International Association for Food Protection Educator Award. He is a fellow of the Institute of Food Technologists, where he has previously served as chair of two divisions and two regional sections, member of the board of directors, an Inaugural member and chair of the International Food Science Certification Commission, and a founding member of the Global Traceability Center.

About The Author

Gregory Siragusa, Eurofins
Gregory Siragusa, Ph.D.
Senior Principal Scientist

Gregory Siragusa is senior principal scientist with Eurofins Microbiology Laboratories, Inc., a division of the global life sciences company Eurofins Scientific. He has held positions with Danisco/DuPont and the USDA. He has been a reviewer for the Journal of Food Protection and Applied Environmental Microbiology.

Siragusa’s research spans fields of microbiology focusing on foods. He has been a speaker on the topics of food genomics and antibiotic alternatives. He obtained the B.S./M.S. in microbiology from Louisiana State University and a Ph.D. from the University of Arkansas. He has authored more than 100 peer-reviewed papers, chapters and abstracts. His latest activities focus on applications of genomics to food microbiology.

Leave a Reply

Your email address will not be published. Required fields are marked *

For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.