Producers of food-based products face two challenges: maintaining the safety and quality of their products, and rapidly screening raw materials and ingredients. Failure to adequately address both can be costly, with estimated recall costs alone starting around $10 million, in addition to any litigation costs.1 Long-term costs can accumulate further as a result of damage to brand reputation. A vast array of methods has been employed to meet these challenges, and adoption continues to increase as technology becomes smaller, cheaper and more user-friendly. One such technique is Fourier transform infrared (FTIR) spectroscopy, an analytical technique that is widely used for quick (typically 20–60 seconds per measurement) and non-destructive testing of both man-made and natural materials in food products. The uniformity and physical state of the sample (solid vs. liquid) will dictate the specifics of the hardware used to perform such analyses, and the algorithm applied to the identification task will depend, in part, on the expected variability of the ingredient.
Infrared spectral measurements provide a “compositional snapshot”— capturing information related to the chemical bonds present in the material. Figure 1 shows an example of a mid-infrared spectrum of peppermint oil. Typically, the position of a peak along the x-axis (wavenumber) is indicative of the type of chemical bond, while the peak height is related either to the identity of the material, or to the concentration of the material in a mixture. In the case of peppermint oil, a complex set of spectral peaks is observed due to multiple individual naturally occurring molecular species in the oil.
Once the infrared spectrum of an ingredient is measured, it is then compared to a reference set of known good ingredients. It is important that the reference spectrum or spectra are measured with ingredients or materials that are known to be good (or pure)—otherwise the measurements will only represent lot-to-lot variation. The comparative analysis can assist lab personnel in gaining valuable information—such as whether the correct ingredient was received, whether the ingredient was adulterated or replaced for dishonest gain, or whether the product is of acceptable quality for use. The use of comparative algorithms for ingredient identification also decreases subjectivity by reducing the need for visual inspection and interpretation of the measured spectrum.
Correlation is perhaps the most widely used algorithm for material identification with infrared spectroscopy and has been used with infrared spectra for identification purposes since at least the 1970s.2 In this approach, the correlation coefficient is calculated between the spectrum of the test sample and each spectrum in the known good set. Calculated values will typically range from 0, which represents no match (wrong or unexpected material), to 1, which represents a perfect match. These values are typically sorted from highest to lowest, and the material is accepted or rejected based on whether the calculated correlation lies above or below an identified threshold. Due to the one-to-one nature of this comparison, it is best suited to identification of materials that have little or no expected variability. For example, Figure 2 shows an overlay of a mid-infrared spectrum of an ingredient compared to a spectrum of sucrose. The correlation calculated between the two spectra is 0.998, so the incoming ingredient is determined to be sucrose. Figure 3 shows an overlay of the same mid-infrared spectrum of sucrose with a spectrum of citric acid. Notable differences are observed between the two spectra, and the correlation changes significantly, with a coefficient of 0.040 calculated between the two spectra. The citric acid sample would not pass as sucrose with the measurement and algorithm settings used in this example.
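The correlation check described above can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the spectra are synthetic Gaussian bands, the names `material_a` and `material_b` are placeholders, and the 0.98 pass threshold is an assumed value chosen only for the example. Note that the plain Pearson coefficient used here can be slightly negative for unrelated spectra.

```python
import numpy as np

def correlation(a, b):
    """Pearson correlation between two spectra sampled on the same wavenumber grid."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def identify(test, library, threshold=0.98):
    """Rank reference spectra by correlation and apply a pass/fail threshold.

    The threshold is an assumed, illustrative value; a real method would set it
    from validation data.
    """
    hits = sorted(((correlation(test, ref), name) for name, ref in library.items()),
                  reverse=True)
    best_score, best_name = hits[0]
    return best_name, best_score, best_score >= threshold

# Synthetic stand-ins for measured spectra (not real data).
x = np.linspace(0.0, 1.0, 200)

def band(center, width=0.03):
    return np.exp(-((x - center) / width) ** 2)

library = {
    "material_a": band(0.3) + 0.5 * band(0.7),
    "material_b": band(0.5),
}

# A test spectrum that is material_a plus a small baseline ripple.
test = library["material_a"] + 0.01 * np.sin(40 * x)

name, score, passed = identify(test, library)
print(name, round(score, 4), passed)
```

Because each comparison is one-to-one, a single threshold has to absorb all acceptable variation in the ingredient, which is why this approach suits materials with little expected variability.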
When testing samples with modest or high natural variability, acceptable materials can produce a wider range of infrared spectral features, which result in a correspondingly broad range of calculated correlation values. The spread in correlation values could be of concern as it may lead to modification of algorithm parameters or procedures to “work around” this variation. Resulting compromises can increase the potential for false positives, meaning the incorrect ingredient or adulterated material might be judged as passing. Multivariate algorithms provide a robust means for evaluating ingredient identity for samples with high natural variability.
An algorithm known as soft independent modeling of class analogy, or SIMCA, is often applied for identification of ingredients where modest-to-high natural variation is expected.3 A “training set” of known good materials is assembled to cover a wide range of the ingredient’s expected variability. The algorithm uses the spectral data in the training set in much the same way that the human mind interprets photographs. When looking at a photograph, an observer’s eyes “clue in” on objects of interest and also use the quantity of these objects to understand the context of the photo. For example, a photo with many buildings, cars and people would be judged as an urban setting, whereas a photo devoid of all three would be classified as a rural setting. Likewise, the SIMCA algorithm breaks down the spectral data in the training set into types of spectral changes, referred to as “principal components” or “factors”, and the relative magnitudes of those changes, referred to as “scores”. Boundaries are established based on the range of scores in the training set, as shown graphically in Figure 4. The green sphere in Figure 4 represents the acceptable range of sample variation, and each dark green dot is one known-good sample spectrum. Each of the three axes represents a principal component, or type of spectral change, and the distance along each axis represents the corresponding score, or relative magnitude of the principal component. The number of principal components required for a robust method will depend on the degree of variation in the ingredient, with more highly variable ingredients requiring more principal components to describe the types of chemical changes occurring in the material.
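The decomposition into principal components and scores can be illustrated with a short Python sketch. It assumes synthetic training spectra in which a single band (labeled `moisture` here purely for illustration) varies from sample to sample, and it uses a plain singular value decomposition; commercial SIMCA implementations may differ in preprocessing and in how boundaries are drawn.

```python
import numpy as np

def fit_pca(training, n_components):
    """Mean-center the training spectra and extract principal components and scores."""
    X = np.asarray(training, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]            # types of spectral change
    scores = (X - mean) @ components.T        # magnitude of each change, per sample
    return mean, components, scores

# Synthetic training set (not real data): a fixed band plus a band of varying height.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 150)
base = np.exp(-((x - 0.4) / 0.05) ** 2)
moisture = np.exp(-((x - 0.8) / 0.05) ** 2)   # hypothetical varying feature
levels = rng.uniform(0.1, 0.5, size=20)
training = base + levels[:, None] * moisture + rng.normal(0.0, 0.002, (20, x.size))

mean, components, scores = fit_pca(training, n_components=2)

# A simple per-component boundary: the range of scores seen in the training set.
score_min, score_max = scores.min(axis=0), scores.max(axis=0)
print(score_min, score_max)
```

In this toy example the first principal component recovers the varying band, and its scores track the band height in each training sample.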
The SIMCA algorithm provides added benefits when applied in this setting. Well-developed multivariate methods utilize genuine spectral signatures in discrimination of ingredients, with less sensitivity to random measurement noise. Additionally, two diagnostic values can be calculated from this algorithm, which can indicate why an ingredient failed the test: model distance and residual distance. A high model distance tells the analyst that the tested ingredient has spectral features (and therefore chemical composition) similar to those of the samples in the training set but is simply “out-of-spec”. As an example, a milk powder sample might fail with high model distance if the ingredient had taken on excessive moisture during shipping or handling. A high residual distance indicates the presence of spectral features that are dissimilar to those in the training set, which might signal the appearance of foreign material, such as an adulterant.
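The two diagnostics can be sketched as follows. The definitions here are simplified assumptions made for illustration: model distance is taken as the distance of the sample's scores from the training-set center, scaled by the training score spread, and residual distance is the norm of whatever the principal components fail to reconstruct. Instrument software may normalize these values differently. All spectra are synthetic.

```python
import numpy as np

def fit(training, n_components):
    """Return the mean spectrum, principal components, and spread of training scores."""
    X = np.asarray(training, dtype=float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    P = Vt[:n_components]
    score_std = ((X - mean) @ P.T).std(axis=0, ddof=1)
    return mean, P, score_std

def distances(spectrum, mean, P, score_std):
    """Simplified model distance (inside the model) and residual distance (outside it)."""
    centered = np.asarray(spectrum, dtype=float) - mean
    t = centered @ P.T                                  # scores of the test sample
    model_distance = float(np.sqrt(np.sum((t / score_std) ** 2)))
    residual_distance = float(np.linalg.norm(centered - t @ P))
    return model_distance, residual_distance

# Synthetic data (not real spectra).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 150)
base = np.exp(-((x - 0.4) / 0.05) ** 2)
moisture = np.exp(-((x - 0.8) / 0.05) ** 2)    # feature that varies in good material
foreign = np.exp(-((x - 0.6) / 0.05) ** 2)     # feature absent from the training set
levels = rng.uniform(0.1, 0.5, size=20)
training = base + levels[:, None] * moisture + rng.normal(0.0, 0.002, (20, x.size))

mean, P, score_std = fit(training, n_components=1)

md_good, rd_good = distances(base + 0.3 * moisture, mean, P, score_std)
md_wet, rd_wet = distances(base + 1.5 * moisture, mean, P, score_std)
md_adult, rd_adult = distances(base + 0.3 * moisture + 0.3 * foreign, mean, P, score_std)
print(md_good, rd_good, md_wet, rd_wet, md_adult, rd_adult)
```

In this toy case the sample with an exaggerated `moisture` band shows a large model distance, while the sample containing the `foreign` band shows a large residual distance, mirroring the out-of-spec and adulterant cases described above.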
Capabilities of the SIMCA algorithm can be further extended with an approach like that of the spectrum adulterant screen. This algorithm will calculate an expected spectrum for the ingredient based on the known-good sample spectra in the user’s SIMCA method and calculate the difference between the tested ingredient spectrum and the expected spectrum for that ingredient. Next, the expected spectrum is calculated with spectra of potential or likely adulterants added to the data, and the process is repeated iteratively for each adulterant spectrum. This process is illustrated in Figure 5.
When testing good, unadulterated material, the inclusion of the adulterant spectrum in the calculated spectrum will decrease the similarity between the material spectrum and calculated spectrum. This causes the residual to increase, which is an indicator that the adulterant is not present at significant levels in the material. However, if the residual decreases when the spectrum of an adulterant is added to the calculated spectrum, this may indicate the presence of the adulterant in the tested sample. The contribution of each principal component and the adulterant spectrum to the calculated spectrum is scaled via a least-squares fit, which also provides an approximate concentration of a detected adulterant.
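The least-squares step can be illustrated with a simplified residual comparison. This sketch is an assumption-laden stand-in, not the screening algorithm itself: it fits the mean-centered test spectrum first with the principal components alone and then with one candidate adulterant spectrum appended. With an unconstrained least-squares fit the residual can never increase when a term is added, so in this simplified version a clean sample shows an essentially unchanged residual and a near-zero coefficient, while a contaminated sample shows a clear drop in residual and a coefficient that approximates the adulterant level. The spectra and the names `adulterant_x` and `adulterant_y` are synthetic placeholders.

```python
import numpy as np

def screen(test, mean, components, adulterants):
    """Compare fit residuals with and without each candidate adulterant spectrum."""
    centered = np.asarray(test, dtype=float) - mean
    basis0 = components.T
    coef0, *_ = np.linalg.lstsq(basis0, centered, rcond=None)
    base_resid = float(np.linalg.norm(centered - basis0 @ coef0))
    results = {}
    for name, spectrum in adulterants.items():
        basis = np.column_stack([basis0, spectrum])
        coef, *_ = np.linalg.lstsq(basis, centered, rcond=None)
        resid = float(np.linalg.norm(centered - basis @ coef))
        # coef[-1] scales the adulterant reference spectrum (approximate level).
        results[name] = (base_resid, resid, float(coef[-1]))
    return results

# Synthetic data (not real spectra).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 150)

def band(center, width=0.05):
    return np.exp(-((x - center) / width) ** 2)

base, moisture = band(0.4), band(0.8)
levels = rng.uniform(0.1, 0.5, size=20)
training = base + levels[:, None] * moisture + rng.normal(0.0, 0.002, (20, x.size))
mean = training.mean(axis=0)
_, _, Vt = np.linalg.svd(training - mean, full_matrices=False)
components = Vt[:1]

adulterants = {"adulterant_x": band(0.6), "adulterant_y": band(0.2)}
test = base + 0.3 * moisture + 0.05 * adulterants["adulterant_x"]

results = screen(test, mean, components, adulterants)
for name, (before, after, level) in results.items():
    print(name, round(before, 4), round(after, 4), round(level, 4))
```

Here the fit improves markedly only when `adulterant_x` is included, and its fitted coefficient is close to the 0.05 level used to build the test spectrum.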
Food fraud and adulteration detection is increasingly performed with FTIR instrumentation due to decreasing cost and footprint of the technology, as well as software and algorithmic advances, which provide robust means for discriminating materials, with ever-increasing user-friendliness. This technique, in combination with statistical models, provides rapid and non-destructive compositional screening with little to no sample preparation. Moreover, the technique offers non-targeted analysis, with the ability to detect many foreign or incorrect materials based on their spectral signature. As such, FTIR is a fantastic first line of defense when protecting your company’s brand reputation.