Modification of the Stutter Position Label-Filtering Macro in the PE Biosystems Genotyper Version 2.5 Software Package: Resolution of Stutter-Filter Back Talk, by Kinsey and Hormann (Forensic Science Communications, July 2000)
July 2000 - Volume 2 - Number 3
Philip T. Kinsey
Forensic Services Division
Oregon State Police
One artifact of the polymerase chain reaction (PCR) amplification of short tandem repeat (STR) loci is stutter. Stutter has been proposed to occur from a slipping of the polymerase during replication of a region of repeated sequence (Schlotterer and Tautz 1992; Walsh et al. 1996) that results in a small percentage of the amplicons measuring one repeat unit smaller than the true size of the DNA fragment amplified. To identify peaks at stutter positions resulting from the stutter phenomenon, validation studies of single-source samples were conducted to empirically determine the level of stutter produced at each STR locus. A feature of the Genotyper® software package (PE Biosystems, Foster City, California) allows customization of allele-labeling functions to incorporate such empirically derived data. Stutter-filter thresholds (or cut-off values) derived from the validation studies were programmed into Genotyper® to remove (or filter) allele labels from peaks meeting the criteria for stutter. During analysis, peaks in stutter positions that exceeded programmed stutter-filter thresholds but were not labeled were occasionally detected. This phenomenon was also detected with the PE Biosystems preset stutter-filter thresholds and was therefore not due to the incorporation of the empirically derived data. Investigation of the software macros used for filtering allele labels revealed unexpected effects termed “Stutter-Filter Back Talk.” A discussion of these effects and a simple remedy follow.
The Oregon State Police Forensic Laboratory DNA Analysis Unit uses the PE Biosystems AmpFlSTR Profiler Plus™ and COfiler™ Amplification Kits with ABI PRISM® 310 Genetic Analyzer instruments in analysis of the Combined DNA Index System (CODIS) core STR loci listed in Table 1. The STR loci characterized with these amplification kits are identified by size and by the fluorescent tag associated with specific amplification primers. The amplicons are separated on the basis of size by capillary electrophoresis on ABI PRISM® 310 instruments. As the DNA molecules traverse a quartz window section of the capillary, laser light excites the fluorescent tags. The resulting emitted light is captured in real time by a charge-coupled device (CCD) camera and is depicted as a set of overlapping electropherograms. The heights of the peaks displayed in the electropherogram indicate the relative intensity of the emitted fluorescent light (measured as relative fluorescent units [RFUs]), which can be correlated to the ratio of DNA fragments present. The analyst sets parameters within the ABI PRISM® 310 Collection and GeneScan® software packages to determine what electropherogram signals constitute DNA signals. Once the data is analyzed within the set parameters, it is imported into the Genotyper® software. Among other tasks, Genotyper® uses a macro (a sequence of commands or steps that perform a particular analysis procedure) to assign allele designations from peak base pair values (Genotyper® User’s Manual).
Briefly, the “Kazam” macro provided with the Genotyper® program works by labeling all peaks in a category (or locus) and then filtering (or removing) the labels from peaks, such as those in stutter positions, that meet predefined criteria. The criteria by which peaks at stutter positions are evaluated and programmed into the Genotyper® macro by the analyst are based on proximity to the true alleles (0-5 nucleotides smaller) and on internal validation studies of observed levels of stutter from single-source samples. The macro analyzes the loci in the following order: D3S1358, vWA, FGA, amelogenin, D8S1179, D21S11, D18S51, D5S818, D13S317, and D7S820 for the Profiler Plus™ loci and D3S1358, D16S539, amelogenin, TH01, TPOX, CSF1PO, and D7S820 for the COfiler™ loci.
The presence of stutter peaks is an artifact of the PCR amplification of STR loci (Walsh et al. 1996). The degree of stutter at a particular locus is thought to be due to a variety of factors that include, but are not limited to, the nucleotide sequence of the repeat region and the length of the amplicon (Schlotterer and Tautz 1992; Walsh et al. 1996; Levinson and Gutman 1987). The ability to identify peaks in stutter positions as stutter has important implications in the forensic characterization of unknown samples—particularly those composed of DNA mixtures in which the percentage of the mixture attributed to the minor contributor is near the level of stutter expected from the major contributor. Levels of stutter derived from internal validation studies were converted into stutter-filter percentages for incorporation into the allele-labeling macros of Genotyper®. As the DNA analysis unit transitioned from the analysis of STR loci from single-source convicted offender samples to casework samples that frequently consisted of DNA mixtures, it was observed that peaks at stutter positions exceeding stutter threshold levels were not being labeled as expected.
Reagents and instrumentation were obtained from PE Biosystems, Foster City, California, unless otherwise indicated. DNA was isolated from a bloodstained piece of carpet using the Chelex® reagent (BioRad, Richmond, California) in conjunction with the extraction protocol for bloodstains (AmpFlSTR Profiler Plus™ PCR Amplification Kit User’s Manual). The DNA was concentrated by spin dialysis in a Centricon® 100 device (Amicon, Beverly, Massachusetts) and quantitated using the Aces® 2.0+ Human Quantitation System (Life Technologies GIBCOBRL, Rockville, Maryland). Two nanograms of DNA from the Centricon® retentate were amplified in a 50-microliter reaction using the AmpFlSTR Profiler Plus™ Amplification Kit in a DNA Thermal Cycler 480 according to manufacturer’s recommended procedures. The analysis instrument used in this study was an ABI PRISM® 310 Genetic Analyzer coupled to a Macintosh computer.
Validation studies designed to ascertain levels of amplification stutter at each of the CODIS loci were performed on 73 single-source samples using the AmpFlSTR Profiler Plus™ and COfiler™ Amplification Kits (Bailey-Darland, 2000). Levels of amplification stutter were determined by dividing the peak-height values of peaks in stutter positions by those values of the corresponding true alleles (one repeat unit larger). Stutter percentages used as labeling cut-off values were determined by using the larger of either the maximum stutter percentage observed for a particular locus or the average stutter percentage plus three standard deviations.
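The cut-off rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stutter_percent` and `stutter_cutoff` are hypothetical, the sample standard deviation is assumed because the text does not say which estimator was used, and any input values are invented rather than taken from the validation data.

```python
from statistics import mean, stdev


def stutter_percent(stutter_height, allele_height):
    """Height of the stutter-position peak as a percentage of the true allele."""
    return 100.0 * stutter_height / allele_height


def stutter_cutoff(percentages):
    """Larger of the maximum observed value and the mean plus three SDs.

    Assumption: `stdev` is the sample standard deviation (n - 1 denominator).
    """
    return max(max(percentages), mean(percentages) + 3 * stdev(percentages))
```

For example, `stutter_cutoff([5.0, 6.0, 7.0])` gives 9.0, because the mean plus three standard deviations (6 + 3 × 1) exceeds the largest observed value of 7.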
Figure 1A depicts the partial electropherogram of a DNA mixture. The maximum allowable levels of stutter (stutter thresholds) for these STR loci are, from internal validation studies, 8 percent for the D8S1179 and D21S11 loci and 16 percent for the larger amplicons of the D18S51 locus (Table 1). At first glance, the unlabeled peak at the D8S1179-12 position appeared to be near the level of stutter expected for the D8S1179-13 peak. Closer examination of the peak heights revealed the D8S1179-12 peak (325 RFUs) to be 14.9 percent of the D8S1179-13 peak (2174 RFUs), which exceeds the 8 percent stutter threshold of the locus. This condition should have resulted in an allele designation being assigned to the D8S1179-12 peak (see Figure 2). The sample was analyzed using an unedited Genotyper® macro (Kazam) with stutter-filter percentages preset by PE Biosystems, and the same result (Figure 1B) was obtained. No data in this sample were off-scale; off-scale data could otherwise have produced artificially high stutter-position peaks.
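The comparison in this paragraph can be written out explicitly, using only the peak heights and the threshold quoted above:

```latex
\[
\frac{h_{\text{D8S1179-12}}}{h_{\text{D8S1179-13}}}
  = \frac{325\ \text{RFU}}{2174\ \text{RFU}}
  \approx 0.149 = 14.9\% \;>\; 8\%
\]
```

Put another way, 8 percent of 2174 RFUs is about 174 RFUs, so a stutter-position peak of 325 RFUs is well above the height at which the D8S1179 filter alone would remove its label.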
After confirmation of the stutter-filter calculations and the programming of the particular macro steps (Genotyper® User’s Manual, 1998), the macro steps used in assigning allele designations were applied individually while the labeling of the peak in question was monitored. In doing so for the above example, it was observed that the D8S1179-12 allele was originally labeled per the D8S1179 locus specifications. However, as the test continued and the steps specific to the D18S51 locus were run, the D8S1179-12 allele designation was removed. Because the stutter threshold for the D18S51 locus was set at 16 percent, it was hypothesized that the D18S51 stutter-filter criteria were being applied to the D8S1179 locus. The experiments that follow were performed to test this hypothesis.
To test whether the D8S1179-12 peak could be labeled by altering the stutter threshold level of the D18S51 locus, the D18S51 stutter threshold level was reduced from 16 percent to 14 percent (below the 14.9 percent D8S1179-12/-13 peak-height ratio). This macro was then applied to the sample mentioned above. The peak at D8S1179-12 was labeled (Figure 2). It therefore appeared that the stutter filter of the D18S51 locus, analyzed subsequently, was applied to the previously analyzed D8S1179 locus.
To determine whether this phenomenon was specific to the two loci above, the stutter filter of the D21S11 locus, analyzed after D8S1179 and before D18S51, was reprogrammed from 8 percent to 15 percent. The D18S51 locus stutter filter was left at the permissive 14 percent level. This macro was then applied to the sample, with the result that the D8S1179-12 label was once again removed (Figure 3). Thus, either the D21S11 or the D18S51 locus stutter filter appeared to function at the D8S1179 locus. Moreover, once the D8S1179-12 allele label was removed by the restrictive filter (D21S11 set at 15 percent), it was not replaced by a subsequent permissive filter (D18S51 set at 14 percent).
Next, a different locus was tested for its susceptibility to stutter-filter percentages of subsequently analyzed loci. The alleles being tested were D21S11-29 and -30, with RFU values of 246 and 1420, respectively, resulting in a D21S11-29 peak height 17.3 percent that of D21S11-30. The D21S11-29 peak was labeled in all previous macro tests (Figures 1-3). The D8S1179 and D21S11 loci stutter filters were set back to 8 percent, and the D18S51 locus stutter filter was set at a presumably restrictive 18 percent. This macro was then applied to the sample and resulted in the D8S1179-12 and D21S11-29 labels being removed (Figure 4). Thus, both the D8S1179 and D21S11 loci are susceptible to a restrictive stutter filter of a subsequently analyzed locus.
In the next experiment, the D8S1179-12 peak label was considered in the context of a stutter filter from a locus even further removed. The loci amplified with the Profiler Plus™ Amplification Kit are represented in three color sets: D3S1358, vWA, and FGA in blue; amelogenin, D8S1179, D21S11, and D18S51 in green; and D5S818, D13S317, and D7S820 in yellow. Here, the D8S1179 and D21S11 loci stutter filters were set at 8 percent, and the D18S51 locus stutter filter was set at the permissive level of 14 percent. In addition, the stutter filter of the D5S818 locus, analyzed subsequently to the D8S1179, D21S11, and D18S51 loci, was set to a restrictive 15 percent. The test was run, and the D8S1179-12 peak was labeled (Figure 5). Therefore, restrictive stutter filters of subsequent color sets do not appear to be applied to previously analyzed loci.
Thus far, all experiments have been concerned with restrictive stutter filters being applied to previously analyzed loci. To determine whether restrictive stutter filters were applied to subsequently analyzed loci, the following conditions were tested. The D21S11-29 peak label was considered with the D21S11 and D18S51 loci stutter filters set at the permissive 8-percent level and with the D8S1179 locus stutter filter set at a restrictive 18 percent. This macro was then applied to the sample and resulted in the D21S11-29 peak remaining labeled (Figure 6). Hence, restrictive stutter filters do not appear to function at loci analyzed subsequently. Herein lies a solution to the problem.
The section of the macro involved in labeling peaks of the D18S51 locus was moved to a position in which its analysis would precede those of the D8S1179 and D21S11 loci (Daniel Petersen, personal communication, October 19, 1999). With the appropriate macro selected and the “Step window” open, the lines from “Select category: D18S51” to “Unmark selected categories,” inclusive, were cut and pasted upstream of the “Select category: D8S1179” line using the standard “Edit” functions. All stutter filters were set back to the original settings of 8 percent for the D8S1179 and D21S11 loci and 16 percent for the D18S51 locus. This modification of the macro was applied to the sample and resulted in the D8S1179-12 peak being labeled as originally intended (Figure 7).
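The cut-and-paste edit can be pictured as moving one contiguous block of steps within an ordered list. The Python sketch below is an illustration only: the macro is modeled as a list of strings, the `move_block` and `locus_block` helpers are hypothetical, and the middle line of each block is a placeholder for the locus-specific steps, whose exact wording is not reproduced in this article. Only the “Select category” and “Unmark selected categories” lines are taken from the text.

```python
def move_block(steps, start, end, before):
    """Cut the steps from `start` through `end` and paste them ahead of `before`."""
    first = steps.index(start)
    last = steps.index(end, first)  # first closing step at or after the start
    block = steps[first:last + 1]
    remaining = steps[:first] + steps[last + 1:]
    target = remaining.index(before)
    return remaining[:target] + block + remaining[target:]


def locus_block(locus):
    # Placeholder block; the middle entry stands in for the real macro steps.
    return [
        f"Select category: {locus}",
        f"(labeling and filtering steps for {locus})",
        "Unmark selected categories",
    ]


original = locus_block("D8S1179") + locus_block("D21S11") + locus_block("D18S51")
modified = move_block(
    original,
    start="Select category: D18S51",
    end="Unmark selected categories",
    before="Select category: D8S1179",
)
category_order = [s for s in modified if s.startswith("Select category:")]
```

After the move, `category_order` lists D18S51 first, followed by D8S1179 and D21S11, and no steps are added or lost.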
With the custom macro in use, electropherograms were encountered in which peaks were not labeled as expected from the descriptions in the Genotyper® User’s Manual (1998). This phenomenon was also detected with the PE Biosystems preset stutter-filter percentages and so is not a result of incorporating the empirically derived stutter-filter percentages. That it took several months to discover the issue stems from the relatively small number of peaks at stutter positions that exceed the stutter percentage values at their own loci and fall below the stutter percentage values of subsequently analyzed loci. For example, the range in which stutter-peak labels would be incorrectly filtered for the vWA locus lies between the 10-percent level of the vWA locus and the 12-percent level of the FGA locus. The problem was originally detected with stutter filters set at 8 percent for the D8S1179 locus and 16 percent for the D18S51 locus, a much larger window of opportunity. The phenomenon is described here, and a modification of the process that accomplishes the expected stutter-peak label filtering is demonstrated.
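The size of this window follows directly from the analysis order and the thresholds. The sketch below assumes, as the experiments suggest, that a label can be removed by the filter of any locus analyzed later in the same color set, and it lists the affected range for each locus. The function name `backtalk_windows` is hypothetical. Only thresholds quoted in the text are used, so D3S1358 is left out of the blue set.

```python
def backtalk_windows(order, thresholds):
    """Map each locus to (own threshold, highest later threshold), in percent.

    A locus appears in the result only when a locus analyzed after it in the
    same color set has a higher stutter-filter threshold.
    """
    windows = {}
    for position, locus in enumerate(order):
        later = [thresholds[name] for name in order[position + 1:]]
        if later and max(later) > thresholds[locus]:
            windows[locus] = (thresholds[locus], max(later))
    return windows


# Stutter-filter thresholds (percent) quoted in the text.
GREEN = {"D8S1179": 8, "D21S11": 8, "D18S51": 16}
BLUE = {"vWA": 10, "FGA": 12}

green_original = backtalk_windows(["D8S1179", "D21S11", "D18S51"], GREEN)
green_modified = backtalk_windows(["D18S51", "D8S1179", "D21S11"], GREEN)
blue_original = backtalk_windows(["vWA", "FGA"], BLUE)
blue_modified = backtalk_windows(["FGA", "vWA"], BLUE)
```

Under this model the original order leaves D8S1179 and D21S11 exposed between 8 and 16 percent and vWA between 10 and 12 percent, while the reordered lists produce no window at all.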
On the basis of the stutter validation studies and the results reported here, similar problems were expected to arise eventually with the vWA locus amplified with the Profiler Plus™ Amplification Kit and with the TH01 and TPOX loci amplified with the COfiler™ Amplification Kit. The above hypothesis was tested at these loci and confirmed (data not shown), prompting similar changes in the order in which the allele labels of those loci are filtered (Table 2). The Profiler Plus™ loci are currently analyzed in the order D3S1358, FGA, vWA, amelogenin, D18S51, D8S1179, D21S11, D5S818, D13S317, and D7S820, and the COfiler™ loci in the order D3S1358, D16S539, amelogenin, CSF1PO, TH01, TPOX, and D7S820.
Current thinking places the macro line “Select <color lanes>,” which follows as the second line after each “Select category: <locus name>” line, as the suspect in stutter-filter back talk. It appears that as each locus within a color set is analyzed in the macro, labeled peaks at stutter positions are subjected in series to the stutter filters of subsequently analyzed loci. To use the examples from above (Figures 1 through 7), once the peaks of the D8S1179 locus have been labeled and then filtered according to the parameters of the D8S1179 locus, they are filtered again, along with the D21S11 peaks, according to the parameters of the D21S11 locus. Because the programmed stutter percentages for these two loci are the same, labeling is not affected and the D8S1179-12 label remains. However, when the D18S51 locus is analyzed, the presumption is that the “Select green lanes” line subjects all of the peaks (D8S1179, D21S11, and D18S51) to the D18S51 filtering parameters, resulting in the loss of the D8S1179-12 label. In the modified macro, by contrast, stutter-position peak labels are filtered with identical or progressively lower percentage filters, so that peaks, once labeled, remain labeled.
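The presumed behavior can be mimicked with a small simulation. The Python sketch below is a model of the hypothesis, not of Genotyper® itself: the `Peak` class and `run_macro` function are hypothetical, the fragment sizes and the two D18S51 peaks are invented for illustration, and only the D8S1179 and D21S11 peak heights and the filter percentages come from the text. Each locus step labels its own peaks and then applies its stutter filter to every labeled peak in the lane.

```python
from dataclasses import dataclass

STUTTER_WINDOW_BP = 5.0  # stutter position: up to 5 bp smaller than an allele


@dataclass(frozen=True)
class Peak:
    locus: str
    allele: str
    size_bp: float  # hypothetical sizes, chosen only to space the peaks
    height: int     # RFUs


def run_macro(peaks, order, thresholds):
    """Return the (locus, allele) labels left after every locus step has run."""
    labeled = []
    for locus in order:
        # Step 1: label every peak of the locus being analyzed.
        labeled = labeled + [p for p in peaks if p.locus == locus]
        # Step 2 (presumed back talk): this locus's filter is applied to all
        # labeled peaks in the lane, including those of earlier loci.
        limit = thresholds[locus]
        current = list(labeled)
        labeled = [
            p for p in current
            if not any(
                0 < q.size_bp - p.size_bp <= STUTTER_WINDOW_BP
                and p.height < limit * q.height
                for q in current
            )
        ]
    return {(p.locus, p.allele) for p in labeled}


PEAKS = [
    Peak("D8S1179", "12", 139.0, 325),
    Peak("D8S1179", "13", 143.0, 2174),
    Peak("D21S11", "29", 203.0, 246),
    Peak("D21S11", "30", 207.0, 1420),
    Peak("D18S51", "15", 290.0, 1800),  # invented peak
    Peak("D18S51", "17", 298.0, 1700),  # invented peak
]
FILTERS = {"D8S1179": 0.08, "D21S11": 0.08, "D18S51": 0.16}

original = run_macro(PEAKS, ["D8S1179", "D21S11", "D18S51"], FILTERS)
modified = run_macro(PEAKS, ["D18S51", "D8S1179", "D21S11"], FILTERS)
```

In this model the original order loses the D8S1179-12 label at the D18S51 step while D21S11-29 survives, which matches Figure 1. With D18S51 moved first, every label is retained, as in Figure 7. Changing individual entries in `FILTERS` reproduces the outcomes of the other experiments in the same way.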
Other modifications to the current Genotyper® macros were incorporated. The TH01 and TPOX filters were converted from general locus (based on a percentage of the greatest peak height within a category) to stutter-position specific filters. Stutter-filter back talk was also detected at the TH01 and TPOX loci (with general locus filters) and was attributed to the CSF1PO stutter-position specific filter setting (data not shown). An additional caveat of the general locus filter is that minor peaks at nonstutter positions within a locus are also subject to label filtering. That the amelogenin amplicons are separated by more than the stutter-position window distance for the STR loci (0-5 bp) precludes them from being influenced by stutter-filter back talk. Instances were encountered in which Y alleles from minor contributors were not labeled even though their peak heights exceeded the minimum peak-height threshold of 50 RFUs. These cases were shown to result from the preset, general-locus filter setting of 3 percent. An attempt to resolve this peak-labeling issue by reducing the filter setting to 1 percent of the highest peak in the category resulted in shoulders in the minus-one nucleotide position being frequently labeled. Finally, a combination of two filter types was employed successfully. A general-locus filter setting of 1 percent allows Y alleles from very minor contributors to be labeled, and a stutter-position specific filter of 3 percent (a range of 0 to 5 bp) prevents low-level shoulders in the minus-one nucleotide position from being labeled.
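The interplay of the two amelogenin filters can be illustrated with a toy example. The sketch below makes several assumptions: the function `label_peaks` is hypothetical, the peak sizes and heights are invented, and the stutter-position rule is modeled as removing a label when a labeled peak lies up to 5 bp above it and the peak is below the given percentage of that neighbor's height. Only the 50-RFU minimum and the 1 and 3 percent settings come from the text.

```python
MIN_HEIGHT_RFU = 50  # minimum peak-height threshold quoted in the text


def label_peaks(peaks, general_pct, stutter_pct=None, window_bp=5.0):
    """Return the names of peaks that keep their labels.

    peaks: list of (name, size_bp, height_rfu) tuples for one locus.
    general_pct: general-locus filter, percent of the tallest peak.
    stutter_pct: optional stutter-position filter, percent of the neighbor.
    """
    tallest = max(height for _, _, height in peaks)
    labeled = [
        p for p in peaks
        if p[2] >= MIN_HEIGHT_RFU and p[2] >= general_pct / 100 * tallest
    ]
    if stutter_pct is not None:
        current = list(labeled)
        labeled = [
            p for p in current
            if not any(
                0 < q[1] - p[1] <= window_bp and p[2] < stutter_pct / 100 * q[2]
                for q in current
            )
        ]
    return [name for name, _, _ in labeled]


# Invented amelogenin peaks: a minus-one shoulder, a major X, a minor Y.
AMEL = [("X shoulder", 106.0, 80), ("X", 107.0, 4000), ("Y", 113.0, 60)]

preset = label_peaks(AMEL, general_pct=3)
lowered = label_peaks(AMEL, general_pct=1)
combined = label_peaks(AMEL, general_pct=1, stutter_pct=3)
```

With these invented values the 3 percent general filter drops the minor Y allele, the 1 percent setting restores it but also labels the shoulder, and the combined filters keep X and Y while removing the shoulder.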
Narrowing the stutter-filter window is being pursued to prevent adjacent off-ladder allele peaks, found within one repeat unit of another allele, from being filtered or used as the comparison peaks for the filtering of peak labels. Applying the stutter filter twice to those loci analyzed last within a color set is also being pursued. This would provide the desired filtering of labels from peaks at stutter positions of alleles separated by less than a full repeat unit (e.g., D18S51-33, 33.2). Regardless of modifications made to the Genotyper® macros, the software should not be relied upon as the only source of information in making allele calls. Finally and most importantly, although modifications such as those described here will aid in reducing the number of peaks unintentionally labeled or filtered by the software, thorough validation of the software combined with vigilant and diligent critical analysis must be carried out to ensure accurate and complete genotyping.
AmpFlSTR Profiler Plus™ PCR Amplification Kit User’s Manual. PE Biosystems, Foster City, California, 1997.
Bailey-Darland, C. M. Validation of Polymerase Chain Reaction Analysis of Short Tandem Repeat Loci for Casework within the Oregon State Police Forensic Laboratory. Master’s Thesis in preparation, Portland State University, Portland, Oregon, 2000.
Genotyper® User’s Manual, Chapter 4—Automated Genotyping Procedures and Chapter 6—Defining Categories and Labeling. PE Biosystems, Foster City, California, 1998.
Levinson, G. and Gutman, G. Slipped-strand mispairing: A major mechanism for DNA sequence evolution, Molecular Biology and Evolution (1987) 4(3):203-221.
Schlotterer, C. and Tautz, D. Slippage synthesis of simple sequence DNA, Nucleic Acids Research (1992) 20(2):211-215.
Walsh, P. S., Fildes, N. J., and Reynolds, R. Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA, Nucleic Acids Research (1996) 24(14):2807-2812.