Reply to Weir (2001), by Budowle, Chakraborty, Carmody, and Monson (Forensic Science Communications, January 2001)
January 2001 - Volume 3 - Number 1
Comments and Replies
Federal Bureau of Investigation
Human Genetics Center
University of Texas School of Biomedical Sciences
Department of Biology
Keith L. Monson
Forensic Science Research Unit
Federal Bureau of Investigation
The comment by Weir (2001) criticizes the approach for assessing source attribution by Budowle et al. (2000) by suggesting that it is of little consequence that a profile is rare in the population and that, because of relatedness in the population, the assumption of independence implicit in using the binomial (1 – p_x)^N is inappropriate. Weir's criticisms fail to acknowledge a pragmatic application of a robust statistical model, the predictions of which comport with observations even though its assumptions are not strictly satisfied, and he also fails to recognize the low degree of relatedness in the populations typically encountered.
Weir (2001) states that our approach is inappropriate because the assumption of independence is violated in the population. In statistics, models often are used to help interpret data. These models are simplified representations of a phenomenon and do not exactly represent the real world. Weir's model for independence is a population with no common evolutionary history, complete random mating, population homogeneity, no linkage, no selection, no mutation, and no migration—in other words, an idealized Hardy–Weinberg population. Such a population does not exist. Yet, the Hardy–Weinberg model prevails and often is used to describe genetic markers. Even though complete independence is never realized, the genetic markers used in forensics generally meet Hardy–Weinberg expectations (NRC-II 1996). This is because the assumption of independence is a reasonable approximation for the genetic markers used in forensic human identity testing.
Both the NRC-II Report (1996) and the DNA Advisory Board (DAB) recommendations on statistics (DAB 2000) recognize that rarely is there only one statistical approach to interpret and explain evidence. The DAB recommendations state, "The choice of approach is affected by the philosophy and experience of the user, the legal system, the practicality of the approach, the question(s) posed, available data, and/or assumptions." Moreover, the DAB recognizes that simplistic and less rigorous approaches can be employed, as long as false inferences are not conveyed (DAB 2000). We wholeheartedly agree and therefore justify our approach (which actually was first proposed by the NRC-II Report) as it is easy to understand and compute and is a demonstrably conservative approximation. Although it is true that no population meets the above-stated Hardy–Weinberg criteria, extant data support that population substructure has minimal effect on computations of profile frequencies. The NRC-II Report (1996) recommends using a θ of 0.01 to correct for substructure; however, the true value of θ is much lower than 0.01. Budowle et al. (in press) found that θ (or F_ST) estimates over all 13 core CODIS STR loci are 0.0006 for African Americans, –0.0005 for U.S. Caucasians, 0.0021 for Hispanics, and 0.0039 for Asians. The population data on nine of the thirteen CODIS STR loci described by Chakraborty et al. (1999) were shown to have G_ST values at 0.000 for African Americans, 0.001 for Caucasians, and 0.001 for Asians (unpublished data).
Although Weir argues that source attribution should be based on computations of the conditional probability, the effect is of little consequence for forensically relevant populations. For example, Chakraborty et al. (1999) showed that even when considering the upper 95 percent confidence limit of the most common 13 STR locus profile in African Americans, the rarity of the profile changes from 1 in 100 × 10^9 to 1 in 72 × 10^9 if the conditional probability is used. Therefore, the assumption of independence has little practical consequence on such estimates.
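To make the comparison concrete, the sketch below contrasts a Hardy–Weinberg genotype frequency with one widely used form of the conditional match probability, the substructure-corrected form given in the NRC-II Report (1996) with the parameter θ (F_ST). That this is the exact calculation behind the figures of Chakraborty et al. (1999) is an assumption, and the function names and the allele frequencies (0.1 and 0.2) are purely hypothetical values chosen for illustration.

```python
def hw_genotype_frequency(p_i, p_j=None):
    """Hardy-Weinberg genotype frequency at one locus.

    Homozygote (p_j omitted): p_i^2. Heterozygote: 2 * p_i * p_j.
    """
    if p_j is None:
        return p_i * p_i
    return 2 * p_i * p_j


def conditional_genotype_probability(theta, p_i, p_j=None):
    """Conditional match probability at one locus with substructure theta.

    Assumed form (NRC-II 1996 style correction):
      homozygote:   [2t + (1-t)p_i][3t + (1-t)p_i] / [(1+t)(1+2t)]
      heterozygote: 2[t + (1-t)p_i][t + (1-t)p_j] / [(1+t)(1+2t)]
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        num = (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i)
    else:
        num = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return num / denom


# Hypothetical heterozygous genotype with allele frequencies 0.1 and 0.2.
hw = hw_genotype_frequency(0.1, 0.2)
cond = conditional_genotype_probability(0.01, 0.1, 0.2)
print(f"Hardy-Weinberg: {hw:.5f}  conditional (theta=0.01): {cond:.5f}")
```

With θ set to 0 the conditional form reduces to the Hardy–Weinberg value, and a small positive θ raises the per-locus probability only modestly, which is the sense in which the two calculations differ little when substructure is low.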
However, we recognize that when using simplified models a degree of conservatism should be built into the analysis. First, the FBI uses a population size (N) of 260 million. Rarely, if ever, would a population of 260 million be meaningful in a forensic context. In principle, assessing source attribution should be considered within the context of the case. However, defining the true size of the potential population can be difficult. The large value of N adds a substantial buffer, by orders of magnitude, to the threshold estimate. Second, the rarity of the profile is estimated according to the recommendations in the NRC-II Report (1996). The frequency p_x includes a correction for deviations from Hardy–Weinberg expectations, using θ, and is further corrected for sampling variation by multiplying the frequency by a factor of ten. As already stated, the value of θ used is 0.01, even though realistic estimates of θ are much smaller. Third, the threshold confidence level for opining source attribution is 0.99, resulting in a minimum threshold match probability of 1 in 2.6 × 10^10 for a population of 260 million (Budowle et al. 2000). Rarely is this specific threshold value observed. The average match probability for a 13-locus STR profile, with adjustments for the effect of population substructure, ranges from less than 1 in 10^12 (in Apaches) to 1 in 10^15 (in major population groups; Chakraborty et al. 1999). Therefore, with the genetic typing tools used today, the degree of confidence typically is several orders of magnitude higher than 0.99.
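The threshold quoted above can be checked numerically. The sketch below assumes the source-attribution criterion is that the binomial term (1 – p_x)^N must be at least the chosen confidence level, and solves for the largest p_x that satisfies it. The function name `threshold_match_probability` and its parameters are illustrative and are not taken from Budowle et al. (2000).

```python
import math


def threshold_match_probability(population_size, confidence):
    """Largest profile frequency p_x with (1 - p_x)**N >= confidence.

    Solving (1 - p)^N = confidence gives p = 1 - confidence**(1/N).
    expm1/log are used so the tiny result is not lost to rounding.
    """
    return -math.expm1(math.log(confidence) / population_size)


# Values stated in the text: N = 260 million, confidence = 0.99.
p = threshold_match_probability(260e6, 0.99)
print(f"threshold p_x = {p:.3e} (about 1 in {1 / p:.2e})")

# A smaller, hypothetical population of 1 million for comparison.
p_small = threshold_match_probability(1e6, 0.99)
print(f"threshold p_x = {p_small:.3e} (about 1 in {1 / p_small:.2e})")
```

For N = 260 million this gives a threshold of roughly 1 in 2.6 × 10^10, in line with the figure quoted above. The second call, with an arbitrarily chosen smaller N, shows that the threshold becomes less stringent as the relevant population shrinks.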
For practical purposes, the need for a conditional probability logically applies only when the true contributor of the profile belongs to the same subpopulation as the suspect (i.e., shares a common evolutionary history). Rarely does such a situation occur. The probability of observing an extremely rare profile again would most likely be greater in a group of individuals with a common evolutionary history. Should such an occasion arise, Weir would presumably advocate a higher value for θ and employ a conditional probability. Our approach for assessment of source attribution would be more conservative, even under this scenario. As θ increases, logically the size of the subpopulation must decrease. To apply a realistic conditional probability, the effective size of the relevant population (N) would have to be substantially less than the currently used value of 260 million. Thus, the threshold frequency for opining source attribution would not be as high as currently used if a conditional probability were employed.
We conclude that there is support (including the NRC-II Report) for use of our simple model, both from a practical point of view and from extant population data. Our approach is a reasonable approximation because of the low level of substructure in forensically relevant populations. There is nothing inappropriate about being conservative (NRC-II 1996). The threshold is conspicuously conservative and thus would not create any undue bias. In the end, we see no difference between our approach and that of Weir (1995) after his testimony in the O. J. Simpson trial where he opined, "Presentation of a number such as 1 in 57 billion suggests that it is inconceivable that the rear-gate profile ... would be found in a random individual (after all, there are only 5 billion people on the planet). Thus, the frequency ... in a population will be so low that the need for presenting probability numbers in cases where one identifiable profile is present appears to me to be superfluous."
Budowle, B., Chakraborty, R., Carmody, G., and Monson, K. L. Source attribution of a forensic DNA profile, Forensic Science Communications (July 2000). Available at: http://www.fbi.gov/programs/lab/fsc/backissu/july2000/source.htm
Budowle, B., Shea, B., Niezgoda, S., Chakraborty, R. CODIS STR Loci Data from 41 Sample Populations, Journal of Forensic Sciences (in press).
Chakraborty, R., Stivers, D. N., Su, B., Zhong, Y., and Budowle, B. The utility of STR loci beyond human identification: Implications for the development of new DNA typing systems, Electrophoresis (1999) 20:1682–1696.
DNA Advisory Board (DAB). Statistical and population genetics issues affecting the evaluation of the frequency of occurrence of DNA profiles calculated from pertinent population database(s) (approved February 23, 2000), Forensic Science Communications (July 2000). Available at: http://www.fbi.gov/programs/lab/fsc/backissu/july2000/dnastat.htm
National Research Council (NRC-II). The Evaluation of Forensic DNA Evidence. National Academy Press, Washington, DC, 1996.
Weir, B. S. DNA match and profile probabilities: Comment on Budowle et al. (2000) and Fung and Hu (2000), Forensic Science Communications (January 2001). Available at: http://www.fbi.gov/programs/lab/fsc/current/weir.htm
Weir, B. S. DNA statistics in the Simpson matter, Nature Genetics (1995) 11:365–368.