DNA Match and Profile Probabilities: Comment on Budowle et al. (2000) and Fung and Hu (2000), by Weir (Forensic Science Communications, January 2001)
January 2001 - Volume 3 - Number 1
Comments and Replies
Bruce S. Weir
Program in Statistical Genetics, Department of Statistics
North Carolina State University
Raleigh, North Carolina
Two recent papers in this journal contain errors in statistical methods for interpreting DNA profiles. In each case the errors could have been avoided by making a distinction between one-person profile probabilities and two-person match probabilities.
There is now widespread acceptance that failure to exclude a person from the DNA profile of an evidentiary sample can provide strong evidence in support of the proposition that the person was a contributor to the sample. Provided that a large number of loci are used, the evidence is strong whether the profile had one or more than one contributor. Two recent papers (Budowle et al. 2000, Fung and Hu 2000) in this journal, however, suggest that there is less widespread acceptance of appropriate ways of quantifying the strength of the evidence. A problem with both of these papers is that they do not distinguish between profile probabilities and match probabilities.
It is very helpful to use the term "profile probability" for the chance of a single individual having a particular profile, in distinction to "match probability" for the chance of a person having the profile when it is known that another person has the profile. The match probability, therefore, explicitly requires statements about two profiles. Profile probabilities are of some interest but are unlikely to be relevant in forensic calculations. If an evidentiary profile is known to be that of the perpetrator of a crime and a person found to have that profile becomes the defendant in a trial, then the numerical values given in court are derived under the proposition that the defendant is not the perpetrator. It is of little consequence that the profile is rare in the population—what is relevant is the rarity of the profile, given that one person (e.g., the perpetrator) has the profile. In other words, what is the probability that the defendant would have the profile given that the perpetrator has the profile and these are different people? This number is the match probability, and it is seen to be a conditional probability.
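The conditional nature of the match probability can be written out explicitly. In the sketch below, the symbols $G_D$ and $G_P$ (the genotypes of the defendant and of the perpetrator) and $X$ (the evidentiary profile) are notation introduced here for illustration and are not taken from the papers under discussion.

```latex
\[
\underbrace{\Pr(G_D = X \mid G_P = X)}_{\text{match probability}}
  \;=\;
  \frac{\Pr(G_D = X,\; G_P = X)}{\Pr(G_P = X)},
\qquad
\underbrace{\Pr(G_D = X)}_{\text{profile probability}} .
\]
```

Only when the two genotypes are independent does the joint probability in the numerator factor into $\Pr(G_D = X)\Pr(G_P = X)$, so that the match probability reduces to the profile probability.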
The distinction between profile and match probabilities is rarely made by practicing forensic scientists, and this is most likely because the two quantities have the same value in the simple case when "product rule" calculations are valid. If there is no relatedness in a large population, due to either immediate family membership or common evolutionary history, and there is completely random mating and population homogeneity, and an absence of linkage, selection, mutation, and migration, then all the alleles in a DNA profile are independent. The profile probability and the match probability are both just the product of the allele probabilities, together with factors of two for each heterozygous locus. The papers of Budowle et al. (2000) and Fung and Hu (2000) treated cases where the product rule does not apply, and then the distinction becomes critical.
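A minimal sketch of the product-rule calculation just described may help fix ideas. The locus names, allele labels, and allele probabilities below are hypothetical values chosen for illustration, and the function name is not from any forensic software package.

```python
from math import prod


def product_rule_probability(profile, allele_probs):
    """Product-rule probability of a multilocus profile.

    profile      : list of (locus, allele_1, allele_2) tuples
    allele_probs : dict mapping locus -> {allele: probability}

    Valid only when all alleles in the profile are independent.
    """
    per_locus = []
    for locus, a, b in profile:
        p_a = allele_probs[locus][a]
        p_b = allele_probs[locus][b]
        if a == b:
            per_locus.append(p_a * p_a)      # homozygote: p^2
        else:
            per_locus.append(2 * p_a * p_b)  # heterozygote: factor of two
    return prod(per_locus)


# Hypothetical two-locus example (illustrative numbers only).
example_probs = {
    "locus1": {"A": 0.1, "B": 0.2},
    "locus2": {"C": 0.3},
}
example_profile = [("locus1", "A", "B"), ("locus2", "C", "C")]
print(product_rule_probability(example_profile, example_probs))
```

Under the independence assumptions listed above, this single number serves as both the profile probability and the match probability.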
Budowle et al. (2000) developed a criterion "to assess the question of uniqueness" of a DNA profile. They used the term "random match probability," with symbol $p_X$ for profile X, in a way that suggests they mean the profile probability. They point out that the probability that none of $N$ unrelated individuals has the profile is $(1 - p_X)^N$. However, they stated that $p_X$ is calculated according to the 1996 National Research Council Report [NRC-II] formulae 4.1b and 4.4a for general population scenarios and formula 4.10 under the assumption that the contributor and the accused could only come from one subgroup.
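For concreteness, the binomial expression quoted above can be evaluated as follows. This is only a sketch of the arithmetic, and it is meaningful only if the $N$ profiles are independent with a common probability $p_X$. The function name and the example inputs are assumptions made here, not values from Budowle et al. (2000).

```python
import math


def prob_no_one_has_profile(p_x, n):
    """Evaluate (1 - p_x)**n, assuming n independent individuals.

    log1p is used so that very small p_x combined with very large n
    does not lose precision.
    """
    if not 0.0 <= p_x < 1.0:
        raise ValueError("p_x must satisfy 0 <= p_x < 1")
    return math.exp(n * math.log1p(-p_x))


# Illustrative inputs only: a hypothetical profile probability and population size.
print(prob_no_one_has_profile(1e-12, 260_000_000))
```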
The problem with using formula 4.4a, a profile probability, is that it is designed for a structured population. The same evolutionary forces that cause alleles within single genotypes to be dependent, to an extent indicated by the parameter $\theta$, also cause alleles between genotypes to be dependent. Profile probabilities are not independent in structured populations, and independence is required for the use of the binomial result $(1 - p_X)^N$. The problem with using formula 4.1b, the product rule result for heterozygotes, is that some heterozygotes may have higher probabilities in structured populations, and this will not be known in a particular case. The problem with using formula 4.10 is that this gives conditional probabilities, and these cannot be treated as though they are profile probabilities. In particular, formula 4.10 explicitly states that profiles are not independent in a population so that the equation $(1 - p_X)^N$ cannot be appropriate.
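The difference between the two kinds of formula can be seen numerically at a single locus. The sketch below assumes the commonly quoted forms of the NRC-II expressions (4.4a for a homozygote profile probability, 4.10a and 4.10b for the conditional probabilities). The formulas are written out here as an assumption, since they are not reproduced in this comment, and the allele probability and population-structure parameter `theta` are illustrative values.

```python
def profile_prob_homozygote(p, theta):
    """Assumed form of NRC-II 4.4a: p^2 + p(1 - p)theta."""
    return p * p + p * (1.0 - p) * theta


def match_prob_homozygote(p, theta):
    """Assumed form of NRC-II 4.10a: probability of genotype AA
    given that another person in the same subgroup is AA."""
    num = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return num / ((1 + theta) * (1 + 2 * theta))


def match_prob_heterozygote(p_i, p_j, theta):
    """Assumed form of NRC-II 4.10b: the heterozygote analogue."""
    num = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return num / ((1 + theta) * (1 + 2 * theta))


# Illustrative values only.
p, theta = 0.1, 0.03
print("profile probability:", profile_prob_homozygote(p, theta))
print("match probability:  ", match_prob_homozygote(p, theta))
```

With these illustrative inputs the conditional value is roughly twice the profile value. With `theta = 0` both reduce to the product-rule result, which is why the distinction goes unnoticed in the simple case.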
Fung and Hu (2000) wished to perform calculations for stains having multiple contributors with the use of the profile probability in NRC-II formula 4.4a "because of its simplicity." They multiplied these probabilities over all the contributors to the stain, in clear violation of the assumption of allelic dependence made in that formula. If alleles are dependent within individuals, for reasons "such as relatedness and population subdivision," they are also dependent between individuals. The profile probabilities should not be multiplied together. All the dependencies were taken into account in the treatment of Curran et al. (1999). When there are no dependencies, it is appropriate to use product-rule profile probabilities, as shown by Weir et al. (1997).
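To illustrate why multiplying profile probabilities across contributors goes wrong, the sketch below computes the joint probability of two people's genotypes at one locus under one common model of dependence, the sequential (Dirichlet) allele-sampling formula for a single subpopulation. That model is assumed here for illustration and is not claimed to reproduce the exact derivations of any paper cited in this comment. The function name and the numerical values are hypothetical.

```python
def joint_allele_probability(alleles, allele_probs, theta):
    """Probability of an ordered sequence of alleles drawn from one
    subpopulation, assuming the sampling formula

        P(next is A | k of the n alleles seen so far are A)
            = (k*theta + (1 - theta)*p_A) / (1 + (n - 1)*theta),

    with 0 <= theta < 1.
    """
    counts = {}
    prob = 1.0
    for n, allele in enumerate(alleles):
        k = counts.get(allele, 0)
        prob *= (k * theta + (1 - theta) * allele_probs[allele]) / (1 + (n - 1) * theta)
        counts[allele] = k + 1
    return prob


# Two contributors, both homozygous AA (illustrative values only).
probs, theta = {"A": 0.1}, 0.03
one_person = joint_allele_probability(["A", "A"], probs, theta)
two_people = joint_allele_probability(["A", "A", "A", "A"], probs, theta)
print("product of profile probabilities:", one_person ** 2)
print("joint probability:               ", two_people)
```

In this example the joint probability exceeds the product of the two profile probabilities, because the second person's alleles are conditioned on the first person's. The two agree only when `theta` is zero. Heterozygous genotypes would additionally need the usual factor of two per genotype, which this ordered-sequence helper leaves to the caller.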
Fung and Hu make a further error when considering relatives. They propose multiplying together the conditional probabilities for relatives (e.g., the chance a person has a profile given that his brother has that profile) described by NRC-II. They treat conditional probabilities in the same way as they treated profile probabilities, so they ignore the fundamental difference between the two as well as the dependencies among conditional probabilities.
A forensic scientist may well decide not to present any numerical testimony in a DNA case and simply state that the failure to exclude a defendant from an evidentiary profile based on many loci provides very powerful evidence. However, if that statement rests on false numerical calculations, then the cause of forensic science is not advanced. It may be that the false calculations are conservative, but that is not a scientific basis for using them. There is also the danger that they are not conservative. In an age when profile probability calculations are performed by computer, there is no justification for violating population genetic theory simply to invoke simple equations.
References
Budowle, B., Chakraborty, R., Carmody, G., and Monson, K. L. Source attribution of a forensic DNA profile, Forensic Science Communications (July 2000). Available at: www.fbi.gov/programs/lab/fsc/backissu/july2000/source.htm
Curran, J. M., Triggs, C. M., Buckleton, J. S., and Weir, B. S. Interpreting DNA mixtures in structured populations, Journal of Forensic Sciences (1999) 44:987–995.
Fung, W. K. and Hu, Y-Q. Interpreting DNA mixtures based on the NRC-II Recommendation 4.1, Forensic Science Communications (October 2000). Available at: www.fbi.gov/programs/lab/fsc/backissu/oct2000/fung.htm
National Research Council (NRC-II). The Evaluation of Forensic DNA Evidence. National Academy Press, Washington, DC, 1996.
Weir, B. S., Triggs, C. M., Starling, L., Stowell, L. I., Walsh, K. A. J., and Buckleton, J. Interpreting DNA mixtures, Journal of Forensic Sciences (1997) 42:213–222.