DNA Match and Profile Probabilities: Comment on Budowle et al. (2000) and Fung and Hu (2000)
Bruce S. Weir
Professor
Program in Statistical Genetics, Department of Statistics
North Carolina State University
Raleigh, North Carolina
Abstract
Two recent papers in this
journal contain errors in statistical methods for interpreting
DNA profiles. In each case the errors could have been avoided
by making a distinction between one-person profile probabilities
and two-person match probabilities.
Introduction
There is now widespread acceptance
that failure to exclude a person from the DNA profile of an evidentiary
sample can provide strong evidence in support of the proposition
that the person was a contributor to the sample. Provided that a
large number of loci are used, the evidence is strong whether
the profile had one or more than one contributor. Two recent
papers (Budowle et al. 2000, Fung and Hu 2000) in this journal,
however, suggest that there is less widespread acceptance of
appropriate ways of quantifying the strength of the evidence.
A problem with both these papers is that they do not distinguish
between profile probabilities and match probabilities.
It is very helpful to use
the term "profile probability" for the chance of a
single individual having a particular profile, in distinction
to "match probability" for the chance of a person having
the profile when it is known that another person has the profile.
The match probability, therefore, explicitly requires statements
about two profiles. Profile probabilities are of some interest
but are unlikely to be relevant in forensic calculations. If
an evidentiary profile is known to be that of the perpetrator
of a crime and a person found to have that profile becomes the
defendant in a trial, then the numerical values given in court
are derived under the proposition that the defendant is not the
perpetrator. It is of little consequence that the profile is
rare in the population; what is relevant is the rarity of
the profile, given that one person (e.g., the perpetrator) has
the profile. In other words, what is the probability that the
defendant would have the profile given that the perpetrator has
the profile and these are different people? This number is the
match probability, and it is seen to be a conditional probability.
The distinction between profile
and match probabilities is rarely made by practicing forensic
scientists, and this is most likely because the two quantities
have the same value in the simple case when "product rule"
calculations are valid. If there is no relatedness in a large
population, due to either immediate family membership or common
evolutionary history, and there is completely random mating and
population homogeneity, and an absence of linkage, selection,
mutation, and migration, then all the alleles in a DNA profile
are independent. The profile probability and the match probability
are both just the product of the allele probabilities, together
with factors of two for each heterozygous locus. The papers of
Budowle et al. (2000) and Fung and Hu (2000) treated cases where
the product rule does not apply, and then the distinction becomes
critical.
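The product-rule calculation described above can be sketched in a few lines of Python. This is only an illustration under the stated independence assumptions; the function names, locus labels, and allele frequencies are hypothetical and are not taken from either paper.

```python
from math import prod


def locus_probability(genotype, freqs):
    """Product-rule genotype probability at a single locus.

    genotype: pair of allele labels, e.g. ("a", "b")
    freqs:    dict mapping allele label -> allele probability
    """
    a, b = genotype
    if a == b:
        return freqs[a] ** 2            # homozygote: p_a^2
    return 2 * freqs[a] * freqs[b]      # heterozygote: factor of two


def product_rule(profile, freqs_by_locus):
    """Multiply the single-locus probabilities over all loci."""
    return prod(
        locus_probability(genotype, freqs_by_locus[locus])
        for locus, genotype in profile.items()
    )


# Hypothetical two-locus profile with made-up allele frequencies.
freqs_by_locus = {
    "L1": {"a": 0.1, "b": 0.2},
    "L2": {"c": 0.05},
}
profile = {"L1": ("a", "b"), "L2": ("c", "c")}

# Under complete independence this single number serves as both the
# profile probability and the match probability.
p_profile = product_rule(profile, freqs_by_locus)
```

When any of the independence conditions fails, this function still returns a number, but that number is no longer the match probability.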
Source Attribution: Budowle et al. (2000)
Budowle et al. (2000) developed
a criterion "to assess the question of uniqueness"
of a DNA profile. They used the term "random match probability,"
with symbol p_X for profile X, in a
way that suggests they mean the profile probability. They point
out that the probability that none of N unrelated individuals
has the profile is (1 - p_X)^N.
However, they stated that p_X is calculated
according to the 1996 National Research Council Report [NRC-II]
formulae 4.1b and 4.4a for general population scenarios and formula
4.10 under the assumption that the contributor and the accused
could only come from one subgroup.
The problem with using formula
4.4a, a profile probability, is that it is designed for a structured
population. The same evolutionary forces that cause alleles within
single genotypes to be dependent, to an extent indicated by the
parameter θ, also cause alleles between genotypes
to be dependent. Profile probabilities are not independent in
structured populations, and independence is required for the
use of the binomial result (1 - p_X)^N.
The problem with using formula 4.1b, the product rule result
for heterozygotes, is that some heterozygotes may have higher
probabilities in structured populations, and this will not be
known in a particular case. The problem with using formula 4.10
is that this gives conditional probabilities, and these cannot
be treated as though they are profile probabilities. In particular,
formula 4.10 explicitly states that profiles are not independent
in a population so that the equation (1 - p_X)^N
cannot be appropriate.
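A small numerical sketch may help show why the two kinds of quantity cannot be interchanged. The single-locus expressions below are the θ-corrected forms usually associated with NRC II formulae 4.4a and 4.10; they are written from memory as an assumption of this sketch and should be checked against the report. The function names and numerical values are hypothetical.

```python
def profile_prob_hom(p, theta):
    """Homozygote profile probability in a structured population
    (assumed form of NRC II formula 4.4a): p^2 + p(1 - p)theta."""
    return p * p + p * (1 - p) * theta


def match_prob_hom(p, theta):
    """Homozygote match probability, conditional on one person already
    having the genotype (assumed form of NRC II formula 4.10a)."""
    num = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return num / ((1 + theta) * (1 + 2 * theta))


def match_prob_het(p_i, p_j, theta):
    """Heterozygote match probability (assumed form of formula 4.10b)."""
    num = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return num / ((1 + theta) * (1 + 2 * theta))


def no_carrier_if_independent(p_x, n):
    """(1 - p_X)^N: probability that none of n people has the profile.
    Valid only if the n profiles are independent."""
    return (1 - p_x) ** n


# Hypothetical allele probability and theta value.
p, theta = 0.1, 0.03
profile_value = profile_prob_hom(p, theta)   # one-person quantity
match_value = match_prob_hom(p, theta)       # two-person, conditional quantity
```

With θ = 0 the profile and match expressions coincide with the product rule. With θ > 0 they differ, and the conditional form itself encodes dependence between profiles, which is the dependence that the binomial expression assumes away.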
Mixtures: Fung and Hu (2000)
Fung and Hu (2000) wished
to perform calculations for stains having multiple contributors
with the use of the profile probability in NRC II formula 4.4a
"because of its simplicity." They multiplied these
probabilities over all the contributors to the stain, in clear
violation of the assumption of allelic dependence made in that
formula. If alleles are dependent within individuals, for reasons
"such as relatedness and population subdivision," they
are also dependent between individuals. The profile probabilities
should not be multiplied together. All the dependencies were
taken into account in the treatment of Curran et al. (1999).
When there are no dependencies, it is appropriate to use product-rule
profile probabilities, as shown by Weir et al. (1997).
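The effect of multiplying profile probabilities across contributors can be illustrated with a deliberately simple case: two contributors who are both homozygous for the same allele. The sketch below assumes the usual sampling formula for a structured population, in which the chance that the next allele drawn is of a given type, after n alleles have been seen of which m are of that type, is [mθ + (1 - θ)p] / [1 + (n - 1)θ]. That model is an assumption of this sketch, as are the function names and numbers; it is not a reproduction of the calculations in any of the cited papers.

```python
def next_allele_prob(p, theta, n_seen, n_same):
    """Chance that the next allele is of a given type, given that n_seen
    alleles were already sampled and n_same of them are of that type.
    Assumed sampling formula; requires theta < 1."""
    return (n_same * theta + (1 - theta) * p) / (1 + (n_seen - 1) * theta)


def joint_prob_all_same_allele(p, theta, n_alleles):
    """Joint probability that n_alleles sampled alleles are all of one type,
    built up one allele at a time so that dependencies are retained."""
    prob = 1.0
    for k in range(n_alleles):
        prob *= next_allele_prob(p, theta, n_seen=k, n_same=k)
    return prob


# Hypothetical allele probability and theta value.
p, theta = 0.1, 0.03

one_person = joint_prob_all_same_allele(p, theta, 2)  # homozygote profile probability
two_people = joint_prob_all_same_allele(p, theta, 4)  # both contributors, dependencies kept
naive_product = one_person ** 2                       # profile probabilities multiplied
```

For θ > 0 the joint probability for the two contributors exceeds the square of the one-person profile probability, so the naive product understates how often the pair of genotypes occurs together. With θ = 0 the two calculations agree, which is the product-rule case.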
Fung and Hu make a further
error when considering relatives. They propose multiplying together
the conditional probabilities for relatives (e.g., the chance
a person has a profile given that his brother has that profile)
described by NRC II. They treat conditional probabilities in
the same way as they treated profile probabilities, so they ignore
the fundamental difference between the two as well as the dependencies
among conditional probabilities.
Conclusion
A forensic scientist may
well decide not to present any numerical testimony in a DNA case
and simply state that the failure to exclude a defendant from
an evidentiary profile based on many loci provides very powerful
evidence. However, if that statement rests on false numerical
calculations then the cause of forensic science is not advanced.
It may be that the false calculations are conservative, but that
is not a scientific basis for using them. There is also the danger that they
are not conservative. In an age when profile probability calculations
are performed by computer, there is no justification for violating
population genetic theory simply to invoke simple equations.
References
Budowle, B., Chakraborty,
R., Carmody, G., and Monson, K. L. Source attribution of a forensic
DNA profile, Forensic Science Communications (July 2000).
Available at: www.fbi.gov/programs/lab/fsc/backissu/july2000/source.htm
Curran, J. M., Triggs, C.
M., Buckleton, J. S., and Weir, B. S. Interpreting DNA mixtures
in structured populations, Journal of Forensic Sciences
(1999) 44:987-995.
Fung, W. K. and Hu, Y-Q.
Interpreting DNA mixtures based on the NRC-II Recommendation
4.1, Forensic Science Communications (October 2000). Available
at: www.fbi.gov/programs/lab/fsc/backissu/oct2000/fung.htm
National Research Council
(NRC-II). The Evaluation of Forensic DNA Evidence. National
Academy Press, Washington, DC, 1996.
Weir, B. S., Triggs, C. M.,
Starling, L., Stowell, L. I., Walsh, K. A. J., and Buckleton,
J. Interpreting DNA mixtures, Journal of Forensic Sciences
(1997) 42:213-222.