
Dr. Douglas Marshall, Chief Scientific Officer – Eurofins Microbiology Laboratories
Food Genomics

Part II: Logistics of GenomeTrakr

By Douglas Marshall, Ph.D., Gregory Siragusa, Ph.D.

Last month in Food Genomics we asked FDA scientists Drs. Marc Allard and Eric Brown to help the readers of Food Safety Tech understand the process used by GenomeTrakr. In Part II, we cover some logistical and more general questions.

Greg Siragusa/Douglas Marshall: Why should a food producer or processor submit its own pathogen isolates to GenomeTrakr? Are there any legal liabilities incurred by doing so?

Eric Brown/Marc Allard: The database is publicly available so that any outside laboratory can rapidly compare its new WGS data to all of the data in the database. Because all of the data are public, food industry members should carefully weigh the strengths and weaknesses of sharing data. The main reason for sharing is that any match would be known immediately, enabling investigation and corrective action. With this knowledge, companies can better understand their risk and exposure to occasional contamination events.

Siragusa/Marshall: Are there private third-party providers who will perform, for private companies, the same sequence analysis methods that GenomeTrakr uses at the FDA?

Brown/Allard: Yes, as all of the FDA methods of data collection and analysis are fully transparent and publicly available, any expert third-party provider could easily set up and reproduce the GenomeTrakr methods. Third-party support may be an excellent mechanism for food industry partners that wish to examine the pathogens they have found connected to their products but do not wish to maintain an active WGS laboratory. An internet and reference search will uncover these private third-party providers, as this is a growing market with a diversity of services provided. The FDA works closely with the Institute for Food Safety and Health (IFSH) to share information that may be valuable to their industry partners.

Siragusa/Marshall: Will the FDA analyze isolates for private parties without making the sequences publicly available?

Brown/Allard: No. While we will sequence relevant strains from many different sources, as a matter of protocol we submit all of these data to the GenomeTrakr database. That is, the FDA currently sequences and uploads all available genomic strain data. All data are made publicly available through GenomeTrakr and the NCBI Pathogen Detection website. The metadata describing each isolate include only the species, date, state location and a general food description, which could include the type of food (e.g., an egg) and/or the type of sample (e.g., environmental swab, surface water, sediment), as well as production date, pH, fat content and water activity. No trade or industry brand names are made publicly available, and location is reported only down to the state level to preserve the anonymity of specific farms or processing centers. For example, a GenomeTrakr metadata record might describe Salmonella isolated from spinach in Washington State in 2015.
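The anonymization described above can be pictured as a projection from a richer internal record onto a few coarse public fields. The sketch below is illustrative only: the record type and every field name are hypothetical, not an actual FDA or NCBI schema, and it shows just the principle that brand, facility and sub-state location never reach the public record.

```python
from dataclasses import dataclass


@dataclass
class InternalIsolateRecord:
    """Hypothetical internal record; field names are illustrative only."""
    species: str
    collection_date: str   # e.g. "2015-06-12"
    state: str             # public: the finest location level released
    food_description: str  # public: general description, e.g. "spinach"
    facility_name: str     # confidential: never released
    brand_name: str        # confidential: never released
    city: str              # confidential: finer than state level


def to_public_metadata(record: InternalIsolateRecord) -> dict[str, str]:
    """Keep only coarse, non-identifying fields for public release."""
    return {
        "species": record.species,
        "collection_date": record.collection_date,
        "state": record.state,
        "food": record.food_description,
    }


internal = InternalIsolateRecord(
    species="Salmonella enterica",
    collection_date="2015-06-12",
    state="Washington",
    food_description="spinach",
    facility_name="Example Packing Co.",  # hypothetical
    brand_name="ExampleBrand",            # hypothetical
    city="Example City",                  # hypothetical
)
print(to_public_metadata(internal))
```

Dropping fields by building a new dictionary from an allow-list, rather than deleting confidential keys, means a newly added internal field stays private by default.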

Siragusa/Marshall: Is the CDC tied into GenomeTrakr and if so, how?

Brown/Allard: CDC labels its clinical WGS data as PulseNet and uploads the data to the NCBI Pathogen Detection website. USDA FSIS also uploads the isolates it has collected and sequenced from the foods it regulates. All of these WGS data are housed in a centralized repository at the NCBI Pathogen Detection website, where NCBI conducts rapid QA/QC analysis. NCBI posts a daily tree for each species with recently uploaded data. In this way, all of the data collected by these federal laboratories and their state and international partners are made publicly available for direct comparison. Numerous other international and academic laboratories also provide data to the NCBI centralized database. When isolates cluster together and appear to be closely related, the FDA works with CDC and USDA FSIS through the normal channels. The benefit of combining food, environmental and clinical isolate genomes in a common database cannot be overstated.
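To illustrate what it means for isolates to "cluster together," the sketch below groups isolates by single-linkage clustering on pairwise SNP distances: any two isolates within a chosen threshold end up in the same cluster, either directly or through a chain of intermediates. This is a simplified stand-in, not NCBI's actual tree-building pipeline; the isolate names, distances and the 5-SNP threshold are hypothetical.

```python
from __future__ import annotations


def snp_clusters(
    isolates: list[str],
    distances: dict[tuple[str, str], int],
    threshold: int,
) -> list[set[str]]:
    """Single-linkage clusters: join any pair within `threshold` SNPs."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Walk up to the cluster root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= threshold:
            parent[find(b)] = find(a)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in isolates:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Hypothetical pairwise SNP distances between four isolates.
isolates = ["clinical_1", "food_1", "environmental_1", "food_2"]
distances = {
    ("clinical_1", "food_1"): 2,
    ("food_1", "environmental_1"): 4,
    ("clinical_1", "environmental_1"): 6,
    ("clinical_1", "food_2"): 120,
    ("food_1", "food_2"): 118,
    ("environmental_1", "food_2"): 121,
}
for cluster in snp_clusters(isolates, distances, threshold=5):
    print(sorted(cluster))
```

Here clinical_1 and environmental_1 are 6 SNPs apart yet share a cluster because each is within 5 SNPs of food_1; this chaining is a known property of single linkage and a reminder that clusters are investigative leads, not proof of a link.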

Siragusa/Marshall: In the event of an outbreak, is it possible to obtain WGS data from a shotgun metagenome of an enrichment (a microbial and organismal profile obtained by sequencing all of the DNA in a sample, not just the DNA of an isolated bacterium), thereby precluding the need for isolation? (Refer to the glossary; see Table 1.)

Brown/Allard: Yes, preliminary research has documented the potential to obtain WGS data from cultural enrichments, saving the time needed for full pure-culture isolation, which could potentially save two to five days depending on the pathogen. Well-characterized draft genomes such as those in the GenomeTrakr database will support rapid characterization from metagenomes after cultural enrichment. A future goal for the FDA is to transform and expand GenomeTrakr into metaGenomeTrakr to support either pure culture or enriched shotgun metagenomic samples.
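One simple way reference genomes can support characterization from an enrichment metagenome is k-mer containment: estimating what fraction of a reference genome's k-mers appear anywhere in the metagenomic reads. The sketch below is a toy version under stated assumptions (exact matching, forward strand only, tiny made-up sequences); it is not the method FDA or NCBI uses, and real tools also handle reverse complements, sequencing errors and vastly larger k-mer sets.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(reference: str, reads: list[str], k: int = 21) -> float:
    """Fraction of the reference's k-mers found in any read."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:
        return 0.0
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(ref_kmers & read_kmers) / len(ref_kmers)


# Toy example with k=4; real analyses use whole genomes and longer k.
reference = "ACGTACGTTGCA"
print(containment(reference, ["ACGTACGT"], k=4))
print(containment(reference, ["TTTTGGGG"], k=4))
```

A containment near 1.0 suggests the reference organism's genome is well represented in the enrichment reads; values near 0 suggest it is absent or too sparse to detect.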

Siragusa/Marshall: Is there any way that associated metadata tied to a strain (and hence its sequence) can be unmasked through legal action?

Brown/Allard: FDA protects confidential metadata collected during inspection just as it has always done with PFGE data. WGS data is protected at the same level as other types of subtyping information.

Siragusa/Marshall: Is the GenomeTrakr database associated with the GMI (Global Microbial Identifier)?

Brown/Allard: The GMI is a consortium of like-minded public health scientists who wish to collaborate to create a harmonized, publicly available global system of DNA genome databases that promotes a One Health approach. GenomeTrakr is one of the databases that make up this larger effort, which includes some data from members of the GMI.

Siragusa/Marshall: This column is meant to keep food safety professionals abreast of the latest knowledge, technology and uses of genomics for food safety and quality. Tell us your vision: which changes in technology (sequencing chemistry, bioinformatics, etc.) are coming down the pike, and how might they impact GenomeTrakr?

Brown/Allard: WGS and sequencing technology have been improving constantly for the last 20 years, and there is no sign of this slowing down. Improvements continue to accrue in chemistry, equipment and analysis software. Likely future improvements will include more turnkey WGS solutions from sample to report. These include both DNA extraction and library preparation for sequencing, as well as data analysis pipelines (the systems that analyze the actual sequence data) that provide rapid, accurate results in plain language. Smaller, mobile WGS devices are starting to show feasibility; they would bring the lab to the samples and decrease the time to an answer (see: https://nanoporetech.com/products/minion). Metagenomic approaches appear to be maturing, with technology improvements moving them out of the research phase and into direct applications. Currently, the MiSeq (a commonly used workhorse nucleic acid sequencer made by Illumina) produces reads on the order of 300 base pairs of nucleotides (i.e., A's, T's, C's and G's). Long-read sequencing technologies, with reads upwards of 1,500 base pairs, may make analysis much easier, so that more assembled and fully finished genomes become available in the databases. Cloud-based data analysis pipelines may provide simple solutions, giving wider access to rapid, validated analysis and results. FDA researchers are working on all of these aspects of WGS technology, as well as expanding the network to more global partners.

Siragusa/Marshall: Sequences deposited into GenBank (as part of GenomeTrakr) are accessible to anyone anywhere. Does this essentially usher in a whole new chapter in food microbiology especially at the pre-harvest level?

Brown/Allard: Yes, well-characterized reference genomes provided by GenomeTrakr partners will support microbial ecology and metagenomics studies. Metagenomic and microbiome studies, which describe which species are present and what they may be doing in their ecosystem, are providing new knowledge across all aspects of the farm-to-fork continuum. As the costs of these services decrease, we are seeing increased use to answer questions that have been impossible or extremely difficult to answer in the past.

Siragusa/Marshall: GenomeTrakr is not a project per se; rather it is a program. How is it funded and will it continue on stable fiscal footing for the foreseeable future?

Brown/Allard: GenomeTrakr started as a research project in the Office of Regulatory Science in CFSAN, but much of its data collection is no longer research. Today, and for some time to come, WGS data at the FDA are collected as fully validated regulatory data to support outbreak and compliance investigations. As such, the FDA is transitioning WGS into a phase of more stable regulatory support. Research and development for future applications and technology exploration will always be part of the FDA portfolio, although typically at lower funding levels than the regulatory offices. Public health funding is generally protected, as everyone wants safe food.

Siragusa/Marshall: Are there any restrictions on isolate source? For instance, can isolates from poultry flocks or even wild birds be deposited?

Brown/Allard: The GenomeTrakr and NCBI Pathogen Detection databases are open to the public, so there are no restrictions as long as the minimal metadata and QA/QC metrics are met. Current GenomeTrakr WGS foodborne pathogen data include samples from both poultry and wild birds, as well as turtles, snakes and frogs. Anyone interested in what is in the database can go to the NCBI Pathogen Detection website and filter on simple words like avian, bird, gull, chicken, wheat, avocado, etc. An example is as follows for a snake.
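The keyword filtering described above amounts to a case-insensitive substring search over an isolation-source field. As a sketch, assuming isolate metadata have already been exported to a local table (the column names "isolate" and "isolation_source" are assumptions, and the rows are made up), the same idea looks like this:

```python
def filter_by_source(
    rows: list[dict[str, str]],
    keywords: list[str],
    field: str = "isolation_source",
) -> list[dict[str, str]]:
    """Return rows whose `field` contains any keyword (case-insensitive)."""
    terms = [word.lower() for word in keywords]
    return [
        row for row in rows
        if any(term in row.get(field, "").lower() for term in terms)
    ]


# Made-up rows for illustration only.
rows = [
    {"isolate": "A", "isolation_source": "Chicken carcass rinse"},
    {"isolate": "B", "isolation_source": "wild bird feces"},
    {"isolate": "C", "isolation_source": "avocado"},
    {"isolate": "D", "isolation_source": "snake"},
]
hits = filter_by_source(rows, ["avian", "bird", "gull", "chicken"])
print([row["isolate"] for row in hits])
```

Substring matching is deliberately loose ("gull" also matches "seagull"), which suits exploratory browsing but can return false hits, so results are worth a quick manual review.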

Siragusa/Marshall: If a company deposits an isolate, will it have access to the GenomeTrakr derived sequence exclusively or at least initially for some period before that information becomes public?

Brown/Allard: No. Currently the FDA does not hold back any WGS data. All data collected by the FDA are uploaded and released publicly, without delay, at the GenomeTrakr BioProjects and the NCBI Pathogen Detection website. Companies that wish to hold data need to look to third-party solutions. GenomeTrakr has been so successful because its information is released in real time and is globally available.


Gregory Siragusa, Eurofins
Food Genomics

GenomeTrakr: What Do You Know and What Should You Know?

By Gregory Siragusa, Douglas Marshall, Ph.D.

This month we are happy to welcome our guest co-authors and interviewees Eric Brown, Ph.D. and Marc Allard, Ph.D. of CFSAN as we explore the FDA’s GenomeTrakr program in a two-part Food Genomics column. Many of our readers have heard of GenomeTrakr, but are likely to have several questions regarding its core purpose and how it will impact food producers and processors in the United States and globally. In Part I we explore some technical aspects of the topic followed by Part II dealing with practical questions.

Part I: The basics of GenomeTrakr

Greg Siragusa/Doug Marshall: Thank you Dr. Allard and Dr. Brown for joining us in our monthly series, Food Genomics, to inform our readers about GenomeTrakr. Will you begin by telling us about yourselves and your team?

Eric Brown/Marc Allard: Hello, I am Eric, director of the Division of Microbiology in the Center for Food Safety and Applied Nutrition at the U.S. Food and Drug Administration. Our team is made up of two branches: one specializes in developing and validating methods for recovering foodborne pathogens from many different food matrices, and the other conducts numerous tests to subtype and characterize foodborne pathogens. The GenomeTrakr program is in the subtyping branch, as whole genome sequencing (WGS) is the ultimate genomic subtyping tool for characterizing a foodborne pathogen at the DNA level.

Hello, my name is Marc. I am a senior biomedical research services officer and a senior advisor in Eric's division. We are part of the group that conceived, evaluated and deployed the GenomeTrakr database and network.

Siragusa/Marshall: Drs. Allard and Brown, imagine yourself with a group of food safety professionals ranging from vice president for food safety to director, manager and technologists. Would you please give us the ‘elevator speech’ on GenomeTrakr?

Brown/Allard: GenomeTrakr is a first-of-its-kind distributed network for rapidly characterizing bacterial foodborne pathogens using whole genome sequencing (WGS). These genomic data can help the FDA with many applications, including trace-back to determine the root cause of an outbreak, as well as providing a single workflow for rapidly characterizing all of the pathogens for which the agency has responsibility. The same methods are also very helpful for antimicrobial resistance monitoring and characterization.

Siragusa/Marshall: From the FDA website, GenomeTrakr is described as "a distributed network of labs to utilize whole genome sequencing for pathogen identification." We of course have time-proven methods of microbial identification and subtyping, so why do we need GenomeTrakr for identification and subtyping of microorganisms?

Brown/Allard: If all you want to know is species identification, then you are correct: existing methods can do this. For some applications, however, you need full characterization through subtyping with WGS (i.e., below the level of species, down to the actual strain). WGS of pathogens provides all of the genetic information about an organism, including any mobile elements such as phages and plasmids that may be associated with these foodborne pathogens. The GenomeTrakr network and database compile a large amount of new genetic (DNA sequence) data to more fully characterize foodborne pathogens.

GenomeTrakr and WGS are a means to track bacteria based on knowing the sequence of all DNA that comprises that specific bacterium’s genome. It can be called the “ultimate identifier” in that it will show relationships at a very deep level of accuracy.

Siragusa/Marshall: Is it accurate to say that GenomeTrakr can be considered the new iteration of PulseNet and pulsed-field gel electrophoresis (PFGE)? Will PulseNet and PFGE disappear, or will PulseNet and GenomeTrakr merge into a single entity?

Brown/Allard: PulseNet is a network of public health labs run by the CDC, with USDA and FDA as active participants. The network is alive and well and will continue subtyping pathogens for public health. The subtyping tool PulseNet has used for more than 20 years is PFGE. CDC, USDA and FDA PFGE data collection is expected to be replaced by WGS data and methods, and that transformation has already begun. GenomeTrakr is a network of public health labs run by the FDA to support FDA public health and regulatory activities using WGS methods. Started in 2012, this network is relatively new and currently focuses on using WGS for trace-back to support outbreak investigations and FDA regulatory actions. CDC PulseNet has used WGS data on Listeria and collects draft genomes (i.e., unfinished assemblies that can be produced more quickly than a finished genome) of other foodborne pathogens as well, and USDA FSIS has used WGS for the pathogens found on the foods it regulates. All of the data from GenomeTrakr and PulseNet are shared at the NCBI Pathogen Detection website (see Figure 1).

Figure 1: Sequences, GenomeTrakr

Siragusa/Marshall: Does an organism have to be classified to the species level before submitting to GenomeTrakr?

Brown/Allard: Yes, species-level identification is part of the minimal metadata (all of the descriptors related to a sample such as geographic origin, lot number, sources, ingredients etc.) required to deposit data in the GenomeTrakr database. This allows initial QA/QC metrics to determine if the new genome is labeled properly.
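A minimal sketch of that kind of check, assuming a hypothetical set of required fields and a species prediction obtained separately from the genome itself (for example, from a k-mer or marker-gene classifier, not modeled here): flag missing metadata, and flag a declared species that disagrees with what the genome suggests. The field names and the function are illustrative, not GenomeTrakr's actual submission rules.

```python
REQUIRED_FIELDS = ("species", "collection_date", "geo_location", "isolation_source")  # hypothetical


def qc_metadata(metadata: dict[str, str], predicted_species: str) -> list[str]:
    """Return a list of QA/QC problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not metadata.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    declared = metadata.get("species", "").strip()
    if declared and declared.lower() != predicted_species.strip().lower():
        problems.append(
            f"species mismatch: declared {declared!r}, genome suggests {predicted_species!r}"
        )
    return problems


record = {
    "species": "Listeria monocytogenes",
    "collection_date": "2016-03-01",
    "geo_location": "USA: Ohio",
    "isolation_source": "environmental swab",
}
print(qc_metadata(record, "Listeria monocytogenes"))
print(qc_metadata(record, "Salmonella enterica"))
```

Returning a list of problems rather than a single pass/fail flag lets a submitter fix every issue in one round.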

Siragusa/Marshall: After an isolate is identified to the species level, would you describe to the reader what the basic process is going from an isolated and speciated bacterial colony on an agar plate to a usable whole genome sequence deposited in the GenomeTrakr database?

Brown/Allard: The FDA has a branch of scientists who specialize in ways to isolate foodborne pathogens from food. The detailed methods ultimately end up in the Bacteriological Analytical Manual (BAM) of approved and validated methods. Once a pathogen is in pure culture, DNA is extracted from the bacterial cells. The DNA is then made into a sequencing library, which modifies the DNA so that it can attach properly and run in the sequencing reactions of the specific sequencing vendor's platform. The sequence data are downloaded from the sequencing instrument and then uploaded to the National Center for Biotechnology Information (NCBI) Pathogen Detection website. The database is publicly open, allowing anyone with foodborne pathogens to upload their data and compare their sequences to what is available in the database.

Siragusa/Marshall: Suppose a specific sequence type of a foodborne bacterial pathogen is found and identified in a processing plant, but the plant has never had a positive assay result for that pathogen in its history of product production and ultimate consumption. If an outbreak occurred somewhere in the world and that same sequence type were identified as the causative agent, would the company be in any way liable? Could one even make an association between two isolates with the same sequence type isolated at great distances from one another?

Brown/Allard: The genetic evidence from WGS supports the hypothesis that the two isolates shared a recent common ancestor. If, for example, the isolate from the processing plant and the outbreak sample were genetically identical across the entire genome, the prediction is that the two samples are connected in some way that is not yet understood. Genetic matches guide the FDA and help point investigations toward possible connections. This might include additional inspection of the processing plant, as well as linking the match to the typical epidemiological exposure data. Sometimes, because of the indirect ways pathogens circulate through the farm-to-fork continuum and the complexity of trade, no connection is made. More commonly, these investigative leads from genetic matches help the FDA establish direct links between the two bacterial isolates through a shared ingredient or a shared processing, distribution or packaging process. The genetic information and clustering help the FDA discover new ways that pathogens move from farm to fork. We are unaware of any example where identical genomes arose independently and were unrelated; such a case would run counter to molecular evolutionary theory. Genetic identity equals genetic relatedness: the closer two isolates are genetically, the more recently they shared a common ancestor. Liability is a topic beyond the scope of our group, but genomic data do not by themselves prove a direct link, which is why additional investigation must follow any close match.
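The link between SNP distance and time since a common ancestor can be made concrete with a simple strict molecular-clock model. This is an idealized assumption for illustration, not something stated in the interview, and real bacterial substitution rates vary by organism and lineage. If each lineage accumulates substitutions at rate μ per site per year over L compared sites, two isolates whose most recent common ancestor lived t years ago are expected to differ by d SNPs, where:

```latex
d \approx 2\,\mu\,L\,t
\qquad\Longrightarrow\qquad
t \approx \frac{d}{2\,\mu\,L}
```

The factor of 2 arises because both lineages accumulate substitutions independently after they split. Under this model a smaller SNP distance d implies a more recent common ancestor, which is the relationship the interviewees describe.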

Siragusa/Marshall: We know that SNPs (Single Nucleotide Polymorphisms or single base pair differences in the same location in a genome) are commonly used to distinguish clonality of bacteria with highly similar genomes. Are there criteria used by GenomeTrakr bioinformaticists that are set to help define what is similar, different or the same?
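Counting SNPs between two isolates is, at its simplest, a position-by-position comparison of sequences that have already been aligned to a common reference. The toy function below assumes such pre-aligned, equal-length sequences and skips positions where either base is ambiguous (such as N) or a gap. Validated pipelines typically involve much more (read mapping, quality and coverage filters), so treat this only as an illustration of the definition.

```python
def count_snps(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in unambiguous and b in unambiguous
    )


print(count_snps("ACGTACGT", "ACGTTCGA"))  # two differing positions
print(count_snps("ACGNACGT", "ACGTACGT"))  # N is ignored, so no SNPs
```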

Brown/Allard: As the database grows with more examples of diverse serotypes and kinds of foodborne pathogens, the FDA WGS group is observing common patterns that can be used as guidance to define what is the same or different. For example, closely related Salmonella and E. coli isolates usually differ by five or fewer SNPs, and closely related Listeria isolates by 20 or fewer SNPs, using the current FDA validated bioinformatics pipeline. These values are not set in stone; they should be considered guidance based on what FDA and GenomeTrakr have already observed in earlier case studies. Greater numbers of SNP differences (e.g., 21–50) have often been observed between strains isolated in some outbreaks. Any close match might support or direct an outbreak investigation if there is evidence that a particular outbreak looks most like an earlier case from a specific geographic location. WGS data help investigators focus their efforts toward an international versus domestic exposure or a possible country of origin. Even more divergent WGS linkages, with differences greater than 50–100 SNPs, often connect to different foods or geographic locations; following such linkages would lead investigators away from the source of an outbreak, which is why the data provide exclusivity as well as inclusivity.
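The guidance values quoted above can be written down as a small lookup, which makes their status as rules of thumb explicit. The sketch below encodes only the numbers given in this interview (5 SNPs for Salmonella and E. coli, 20 for Listeria, and a divergence cutoff taken, as an assumption, from the low end of the 50–100 range). It is not an FDA decision rule; any real interpretation also depends on the pipeline used and on epidemiological evidence.

```python
# Guidance values quoted in the interview; not fixed regulatory cutoffs.
CLOSE_MATCH_MAX_SNPS = {
    "Salmonella": 5,
    "Escherichia coli": 5,
    "Listeria": 20,
}
DIVERGENT_ABOVE_SNPS = 50  # assumption: low end of the quoted 50-100 range


def interpret_snp_distance(organism: str, snps: int) -> str:
    """Map a pairwise SNP distance to a rough, guidance-level category."""
    if organism not in CLOSE_MATCH_MAX_SNPS:
        raise ValueError(f"no guidance value for {organism!r}")
    if snps <= CLOSE_MATCH_MAX_SNPS[organism]:
        return "closely related"
    if snps > DIVERGENT_ABOVE_SNPS:
        return "likely different source"
    return "intermediate: weigh with epidemiological evidence"


for organism, snps in [("Salmonella", 3), ("Listeria", 18), ("Salmonella", 30), ("Listeria", 120)]:
    print(organism, snps, "->", interpret_snp_distance(organism, snps))
```

The intermediate band deliberately returns no verdict, mirroring the point that 21–50 SNP differences have been seen within some outbreaks.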

When two strains differ by more than 50–100 SNPs, linking them can wrongly connect different food or geographic sources, resulting in investigators pursuing an incorrect source.

Siragusa/Marshall: Can SNPs be identified between different agar-plate clones of the same strain (i.e., different colonies on the same plate)?

Brown/Allard: Because understanding the natural genetic variation present in foodborne pathogens is the basis for understanding relatedness, the FDA conducted validation experiments in which it grew and sequenced colonies from the same plate and colonies from frozen inocula that were thawed and plated, and ran the same DNA on different instruments with different sequencing technicians. The FDA's work sequencing Salmonella enterica serovar Montevideo, as well as ongoing proficiency testing among laboratories, shows that the same isolate most often has no differences, although some samples have 1–2 SNP differences. Isolates collected by FDA inspectors that are related to a common outbreak generally show more genetic differences, in a range of 0–5 SNPs; this appears to depend on the nature of the facility, the length of time the foodborne pathogen has been resident in the facility, and the selective pressure to which the pathogen was exposed.
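The replicate experiments described above suggest a simple consistency check: resequence the same isolate several times and confirm that the replicates differ by no more than a small number of SNPs. The sketch below assumes the replicate sequences are already aligned to equal length and uses a tolerance of 2 SNPs, taken from the 1–2 SNP differences mentioned here; the sample names and sequences are made up.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Differing positions where both aligned bases are unambiguous (A/C/G/T)."""
    if len(a) != len(b):
        raise ValueError("replicates must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def max_replicate_difference(replicates: dict[str, str]) -> int:
    """Largest pairwise SNP distance among replicate sequences of one isolate."""
    pairs = combinations(replicates.values(), 2)
    return max((snp_distance(a, b) for a, b in pairs), default=0)


def replicates_consistent(replicates: dict[str, str], tolerance: int = 2) -> bool:
    """True if every pair of replicates is within `tolerance` SNPs."""
    return max_replicate_difference(replicates) <= tolerance


same_plate = {
    "colony_1": "ACGTACGT",
    "colony_2": "ACGTACGT",
    "colony_3": "ACGTACGA",
}
print(max_replicate_difference(same_plate), replicates_consistent(same_plate))
```

A replicate set that exceeds the tolerance would point to contamination, a mixed culture or a technical problem worth investigating before the data are used for comparisons.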

Siragusa/Marshall: Regarding the use of WGS to track strains in a particular processing plant, is it possible that within that closed microenvironment strains will evolve sufficiently to become unique to that source?

Brown/Allard: Yes, we have discovered multiple examples of strains that have evolved in such a unique way that they appear to be specific to their source. Hospitals use the same practice to understand hospital-acquired infections and routes of transmission within a hospital's intensive care unit or surgical ward. Food industry laboratories as well as FDA investigators could use WGS data in a similar way to determine the root cause of contamination by combining WGS data with inspection and surveillance. The FDA Office of Compliance uses WGS as one piece of evidence to ask the question: Have we seen this pathogen before?

Siragusa/Marshall: The number of sequences in the GenomeTrakr database is approaching 120,000 (~4,000 per month are added). Are the sequences in the GenomeTrakr database all generated by GenomeTrakr Network labs?

Brown/Allard: The sequences labeled as GenomeTrakr isolates in the NCBI BioSample and BioProject databases are the WGS efforts supported by the U.S. FDA and USDA FSIS. GenomeTrakr is a label identifying the efforts of the FDA, USDA FSIS and collaborative partners to sequence food and environmental isolates. Additional laboratories, independent of and beyond formal membership in the GenomeTrakr network, upload WGS data to the NCBI Pathogen Detection website, of which GenomeTrakr is one part. CDC shares WGS data primarily on clinical PulseNet isolates, and USDA FSIS shares WGS data on foodborne pathogens from the foods it regulates. Numerous international public health laboratories also upload WGS data to NCBI. The NCBI Pathogen Detection website includes all publicly released WGS data for the species it analyzes, which might include additional research or public health data. The point of contact for who submitted the data is listed in the BioSample data sheet, an example of which can be seen here.

Siragusa/Marshall: Once sequences are deposited into the GenomeTrakr database, are they also part of GenBank?

Brown/Allard: The majority of the GenomeTrakr database is part of the NCBI SRA (Sequence Read Archive) database, which holds a less finished version of the data than GenBank. GenBank data are assembled and annotated, which takes more time and analysis to complete. Once automated software is optimized and validated, NCBI likely will place all of the GenomeTrakr data into GenBank. Currently, only the published WGS data from GenomeTrakr are available in GenBank. All of the GenomeTrakr data are available in SRA, both at the GenomeTrakr BioProjects and on the NCBI Pathogen Detection website.

Readers, look for Part II of this column, where we continue our exploration with Drs. Brown and Allard and ask some general questions about the logistics surrounding GenomeTrakr. As always, please contact either Greg Siragusa or Doug Marshall with comments, questions or ideas for future Food Genomics columns.

About the Interviewees

Marc W. Allard, Ph.D.

Marc Allard, Ph.D. is a senior biomedical research services officer specializing in both phylogenetic analysis and the biochemical laboratory methods that generate the genetic information in the GenomeTrakr database, which is part of the NCBI Pathogen Detection website. Allard joined the Division of Microbiology in FDA's Office of Regulatory Science in 2008, where he uses whole genome sequencing of foodborne pathogens to identify and characterize outbreaks of bacterial strains, particularly Salmonella, E. coli and Listeria. He obtained a B.A. from the University of Vermont, an M.S. from Texas A&M University and his Ph.D. in biology from Harvard University. Allard was the Louis Weintraub Associate Professor of Biology at George Washington University for 14 years, from 1994 to 2008. He is a Fellow of the American Academy of Microbiology.

Eric W. Brown, Ph.D.

Eric W. Brown, Ph.D. currently serves as director of the Division of Microbiology in the Office of Regulatory Science. He oversees a group of 50 researchers and support scientists engaged in a multi-parameter research program to develop and apply microbiological and molecular genetic strategies for detecting, identifying and differentiating bacterial foodborne pathogens such as Salmonella and Shiga toxin-producing E. coli. Brown received his Ph.D. in microbial genetics from The Genetics Program in the Department of Biological Sciences at The George Washington University. He has conducted research in microbial evolution and microbial ecology as a research fellow at the National Cancer Institute and the U.S. Department of Agriculture, and as a tenure-track Professor of Microbiology at Loyola University of Chicago. Brown came to the Food and Drug Administration in 1999 and has since carried out numerous experiments relating to the detection, identification and discrimination of foodborne pathogens.