Interpreting DNA Mixtures Based on the NRC-II Recommendation 4.1 (Forensic Science Communications, October 2000)
October 2000 - Volume 2 - Number 4
Wing K. Fung
Department of Statistics and Actuarial Science
University of Hong Kong
Hong Kong
Yue-Qing Hu
Department of Mathematics
Southeast University
Nanjing, China
Abstract
The interpretation of mixed DNA stains previously reported has largely focused on the use of likelihood ratios. Many forensic laboratories use the product rule or a modification of it that appears as Recommendation 4.1 in the Second National Research Council Report (NRC-II). The product rule requires an assumption of within- (Hardy–Weinberg or HW) and between- (linkage equilibrium or LE) locus independence, which cannot be exactly true. Departures from HW proportions can occur for various reasons, such as relatedness and population subdivision, and are expected in all real human populations. Recommendation 4.1 of the NRC-II is the most commonly used approach for dealing with these departures for single-source problems. However, this recommendation has not been applied to the mixed-stain problem, and it is worthwhile to extend the recommendation to such a problem. In this article, a general formula for calculating likelihood ratios for DNA mixtures based on Recommendation 4.1 of the NRC-II is presented. Two real cases are analyzed: One is a rape case in Hong Kong, and the other is the case of People v. Simpson. Our formula also provides a proof of the important 2p formula (Budowle et al. 1991) for mixed stains obtained by Weir et al. (1997). Possible applications of the general formula to other situations such as Recommendation 4.4 are also discussed.
Introduction
DNA profiling is a very powerful and highly discriminating method for forensic human identification. It is customary to calculate the probability of a random match of the DNA profile of the crime scene evidence with that of the suspect. Under the assumption of the Hardy–Weinberg (HW) equilibrium, the match probability at a particular locus may be equated with the profile probability as
\[
\begin{cases}
p_a^2 & \text{for homozygous alleles } aa, \\
2p_a p_b & \text{for heterozygous alleles } ab,
\end{cases}
\tag{1}
\]
where p_{a} and p_{b} are the frequencies of alleles a and b respectively.
For a large variety of crimes, such as rape, the samples often contain material from more than one person. The evaluation of DNA mixtures for simple cases was discussed by Evett et al. (1991) under the HW equilibrium. However, the mixed-samples problem is complex (NRC-II 1996, p. 130). Weir et al. (1997) provided a general formula, and Brenner (1997) provided a proof of it. For recent discussions of mixtures, see Brenner et al. (1996), Buckleton et al. (1997), and Clayton et al. (1998).
The work of Weir et al. (1997) and Evett et al. (1991) assumed the HW equilibrium. The validity of the HW assumption in various DNA databases has been investigated by many authors such as Weir (1992), Devlin and Risch (1993), and Fung (1996). Clearly, HW equilibrium is never exactly correct (NRC-II 1996, p. 98). To tackle this, four recommendations (Recommendations 4.1–4.4) have been suggested by the NRC-II.
Recommendation 4.1 is the most popular procedure for single-source samples, and many laboratories have adopted it because of its simplicity. In this article, a general formula for calculating match probabilities for mixed samples is derived under the population genetics model that lies behind Recommendation 4.1. How Recommendation 4.2 can be used to tackle the mixed-stain problem is discussed in Harbison and Buckleton (1998), Curran et al. (1999), and Fung and Hu (2000).
NRC-II Recommendation 4.1
Recommendation 4.1 denotes a very simple procedure, by which the match probabilities for single-source sample problems are evaluated as
\[
\begin{aligned}
\text{Homozygote:} \quad & p_a \vee p_a = p_a^2 + p_a(1 - p_a)\theta, \\
\text{Heterozygote:} \quad & 2(p_a \vee p_b) = 2p_a p_b, \quad a \neq b,
\end{aligned}
\tag{2}
\]
where the quantity θ is used to adjust for some types of departure from HW. The operator ∨ is used to simplify expressions in the later discussion, and it carries different meanings for homozygote aa and heterozygote ab, where a ≠ b. Simplification also occurs because there are no between-person correlations under the model implied in Recommendation 4.1, or at least these are not taken into account. Equation 2 is used for discrete genetic systems such as the PCR-based DNA markers in which exact genotypes can be determined. The HW rule is recovered when θ = 0.
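Equation 2 can be sketched in code as a single helper. The function names, the dictionary of allele frequencies, and the numerical values below are illustrative assumptions and do not come from the article.

```python
def vee(a, b, freqs, theta=0.0):
    """Return p_a v p_b from Equation 2 for alleles labeled a and b.

    freqs maps allele labels to frequencies; theta is the adjustment
    for departure from HW proportions (theta = 0 recovers the HW rule).
    """
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def match_probability(genotype, freqs, theta=0.0):
    """Single-source match probability: p_a v p_a, or 2(p_a v p_b) if a != b."""
    a, b = genotype
    factor = 1.0 if a == b else 2.0
    return factor * vee(a, b, freqs, theta)


freqs = {"a": 0.1, "b": 0.2}  # assumed frequencies, for illustration only
print(match_probability(("a", "a"), freqs, theta=0.03))  # homozygote
print(match_probability(("a", "b"), freqs, theta=0.03))  # heterozygote
```

Only the homozygote value changes with theta; the heterozygote value stays at 2p_{a}p_{b}.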
In the following, Recommendation 4.1 is extended to the mixed-stain problems.
Results
Let E denote the evidentiary DNA profile, V the victim profile, and S the profile or profiles of one or more suspects. In forensic science, it is common to assess the weight of evidence by means of the likelihood ratio (Aitken 1995) of two alternative explanations (hypotheses), H_{p} and H_{d} , for evidence E constructed as the ratio of probabilities given as
\[
LR = \frac{\Pr\langle E \mid H_p \rangle}{\Pr\langle E \mid H_d \rangle}.
\tag{3}
\]
As in Weir et al. (1997), these probabilities can be expressed in the form of P_{x}⟨U|E⟩, where the set U denotes the alleles in E not carried by the known contributors, x denotes the number of unknown contributors to the profile, and these people cannot carry any alleles not found in E.
Usually in evidence from mixed stains, the population to which the victim belongs is known, and the victim is one of the contributors to the evidence. The number of contributors is often known from the circumstances of the crime or can be inferred by reviewing the data at all loci in a profile (FBI DNA Advisory Board [DAB] 2000). In some instances, the actual number of contributors to a mixture may be unknown. We can then set a higher x and give the benefit of the doubt to the suspect. The number of unknown contributors x is greater than one in a wide variety of crimes, for example, multiple rape cases.
As an illustration for evaluating P_{x}⟨U|E⟩, we take the evidence profile E = {abcd}, the victim profile V = {ab}, and the suspect profile S = {cd}. The two alternative explanations are
H_{p} : Contributors are the victim and the suspect and
H_{d} : Contributors are the victim and an unknown.
Under H_{p}, there are no unknowns (x = 0), and U = ∅, so Pr⟨E|H_{p}⟩ = P_{x}⟨U|E⟩ = 1.
Under H_{d}, the number of unknowns is x = 1 and the set of alleles in E not carried by the known contributor is U = {cd}, so Pr⟨E|H_{d}⟩ = P_{x}⟨U|E⟩ = P_{1}⟨cd|abcd⟩, which is equal to 2(p_{c} ∨ p_{d}).
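As a worked instance of this illustration, the two probabilities can be combined through Equation 3. The numerical allele frequencies used in the last step are hypothetical and serve only to show the arithmetic.

```latex
\[
LR = \frac{\Pr\langle E \mid H_p \rangle}{\Pr\langle E \mid H_d \rangle}
   = \frac{1}{2(p_c \vee p_d)}
   = \frac{1}{2 p_c p_d},
\qquad
\text{e.g. } p_c = 0.1,\ p_d = 0.2 \ \Rightarrow\ LR = \frac{1}{0.04} = 25 .
\]
```

Because c and d are distinct alleles, the adjustment in Equation 2 does not enter this likelihood ratio.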
If the number of unknowns in the explanation H_{d} is two instead of one, the case becomes much more complicated. We have to evaluate the probability P_{2}⟨cd|abcd⟩ that x = 2 unknown persons together carry every allele in U and no allele outside E. Under this circumstance, we need to find all combinations x_{1}y_{1}x_{2}y_{2} that satisfy U = {cd} ⊂ {x_{1}y_{1}x_{2}y_{2}} ⊂ {abcd} = E, where x_{1}y_{1} denotes the genotype of the first unknown and x_{2}y_{2} denotes that of the second one. This applies to the situation when major or minor contributors to the mixtures are indistinguishable (SWGDAM 2000). See the concluding section of this article for further discussion on signal intensities. For the example of Case 1 in Table 1, when the two unknown persons together carry the alleles a, a, c, and d, there are 12 distinct combinations of (x_{1}y_{1}, x_{2}y_{2}): (aa, cd), (cd, aa), (aa, dc), (dc, aa), (ac, ad), (ad, ac), (ca, ad), (ad, ca), (ac, da), (da, ac), (ca, da), and (da, ca), with a total probability of 4(p_{a} ∨ p_{a})(p_{c} ∨ p_{d}) + 8(p_{a} ∨ p_{c})(p_{a} ∨ p_{d}). The other possible genotypes and their corresponding probabilities are listed in Table 1. P_{2}⟨cd|abcd⟩ is obtained by summing all the probabilities in the last column of Table 1, which is very lengthy. Some form of simplification for P_{2}⟨cd|abcd⟩ in particular, and for P_{x}⟨U|E⟩ in general, is essential. For brevity we introduce the following notation:
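The enumeration described above can be sketched by brute force. This is a minimal illustration, assuming hypothetical allele frequencies and helper names that are not part of the article; it lists every ordered combination (x_{1}y_{1}, x_{2}y_{2}) and sums the products of the ∨ terms.

```python
from itertools import product


def vee(a, b, freqs, theta=0.0):
    """p_a v p_b from Equation 2 (Recommendation 4.1)."""
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def combination_probability(combo, freqs, theta):
    """Product of p_x v p_y over the ordered genotypes (x, y) in combo."""
    prob = 1.0
    for a, b in combo:
        prob *= vee(a, b, freqs, theta)
    return prob


def enumerate_combinations(x, U, E):
    """Ordered genotype tuples of x unknowns whose alleles cover U and stay in E."""
    ordered_genotypes = list(product(sorted(E), repeat=2))
    return [
        combo
        for combo in product(ordered_genotypes, repeat=x)
        if set(U) <= {allele for genotype in combo for allele in genotype}
    ]


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical frequencies
theta = 0.03
combos = enumerate_combinations(2, "cd", "abcd")

# Combinations whose four alleles are a, a, c, d (the 12 listed in the text).
case_aacd = [c for c in combos if sorted(c[0] + c[1]) == ["a", "a", "c", "d"]]
print(len(case_aacd))  # 12

# P_2<cd|abcd> as the brute-force sum over every admissible combination.
p2 = sum(combination_probability(c, freqs, theta) for c in combos)
print(p2)
```

The number of combinations grows quickly with x, which is the practical reason a closed-form expression is needed.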
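Definition: For any given n ≥ 1 and real a_{1}, a_{2}, …, a_{n}, define
\[
\|a_1, a_2, \ldots, a_n\| = \sum_{i=1}^{n} \sum_{j=1}^{n} (a_i \vee a_j)
= \sum_{i=1}^{n} (a_i \vee a_i) + 2 \sum_{1 \le i < j \le n} (a_i \vee a_j),
\]
where operator ∨ is defined earlier in Equation 2. Let
\[
Q_0 = \|p_l,\ l \in E\|, \qquad
Q_{1i} = \|p_l,\ l \in E \setminus \{i\}\|, \qquad
Q_{2ij} = \|p_l,\ l \in E \setminus \{i, j\}\|,
\]
and so on, where i, j, … ∈ E, then we have the following theorem for calculating the match probabilities for mixed-stains problems based on Recommendation 4.1.
Theorem
\[
P_x\langle U \mid E \rangle
= Q_0^{x} - \sum_{i \in U} Q_{1i}^{x} + \sum_{\substack{i, j \in U \\ i < j}} Q_{2ij}^{x} - \cdots
+ (-1)^{|U|}\, \|p_l,\ l \in E \setminus U\|^{x}.
\tag{4}
\]
Note that the cardinality |U| cannot be greater than 2x, otherwise P_{x}⟨U|E⟩ = 0.
The proof of the theorem is given in the Appendix. For the example given earlier, the theorem gives that P_{2}⟨cd|abcd⟩ is equal to
\[
\|p_a, p_b, p_c, p_d\|^{2} - \|p_a, p_b, p_c\|^{2} - \|p_a, p_b, p_d\|^{2} + \|p_a, p_b\|^{2},
\]
which gives a much simpler expression than the sum of all the probabilities in the last column of Table 1. Under the HW rule, the match probability is much simpler, and it is expressed as
(p_{a} + p_{b} + p_{c} + p_{d} )^{4} – (p_{a} + p_{b} + p_{c} )^{4} – (p_{a} + p_{b} + p_{d} )^{4} + (p_{a} + p_{b} )^{4}.
As a special case, consider θ = 0 in Equation 2, and
\[
T_0 = \sum_{l \in E} p_l, \qquad
T_{1i} = \sum_{l \in E \setminus \{i\}} p_l, \qquad
T_{2ij} = \sum_{l \in E \setminus \{i, j\}} p_l,
\]
and so forth. We have
\[
Q_0 = T_0^{2}, \qquad Q_{1i} = T_{1i}^{2}, \qquad Q_{2ij} = T_{2ij}^{2}, \qquad \ldots
\]
Thus
\[
P_x\langle U \mid E \rangle
= T_0^{2x} - \sum_{i \in U} T_{1i}^{2x} + \sum_{\substack{i, j \in U \\ i < j}} T_{2ij}^{2x} - \cdots,
\]
which is the fundamental Equation 3 of Weir et al. (1997) under HW.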
The computation of the right side of Equation 4 is easy to program. An executable program is available from the authors to perform the computation of P_{x}⟨U|E⟩ when |U| ≤ 10. This is also used in the following examples.
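A minimal sketch of such a computation is given here. It is not the authors' program: the function names and the allele frequencies are assumptions made for illustration, and the formula is implemented as an inclusion-exclusion sum over the subsets of U, with each term being the sum of p_{i} ∨ p_{j} over all ordered pairs of the remaining alleles, raised to the power x.

```python
from itertools import combinations


def vee(a, b, freqs, theta=0.0):
    """p_a v p_b from Equation 2 (Recommendation 4.1)."""
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def norm(alleles, freqs, theta=0.0):
    """Sum of p_i v p_j over all ordered pairs (i, j) of the given alleles."""
    return sum(vee(i, j, freqs, theta) for i in alleles for j in alleles)


def mixture_probability(x, U, E, freqs, theta=0.0):
    """P_x<U|E>: x unknowns carry every allele in U and none outside E."""
    U, E = set(U), set(E)
    if not U <= E:
        raise ValueError("U must be a subset of E")
    total = 0.0
    for k in range(len(U) + 1):
        # Remove each k-subset of U from E and alternate the sign.
        for removed in combinations(sorted(U), k):
            total += (-1) ** k * norm(E - set(removed), freqs, theta) ** x
    return total


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical frequencies
p_hw = mixture_probability(2, "cd", "abcd", freqs)  # HW rule, theta = 0
p_rec41 = mixture_probability(2, "cd", "abcd", freqs, theta=0.03)
print(p_hw, p_rec41)
```

With theta = 0 the result agrees with the HW expression for P_{2}⟨cd|abcd⟩ given above, and with x = 1 it reduces to 2(p_{c} ∨ p_{d}).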
Applications
A Rape Case
The effects of using the recommendation at different θ values are studied by considering an example of a rape case from Hong Kong. The PCR–STR system (Profiler) was employed, and the results of the first three loci were selected, because it happened by chance that the combinations of victim and suspect genotypes were both heterozygous, both homozygous, and one heterozygous and one homozygous respectively, thereby giving a range of examples. The details are listed in Table 2. We illustrate the effects of the recommendation by taking θ = 0.01 or 0.03 as suggested by the NRC-II report. Two sets of hypotheses are considered:
The first set of hypotheses (S1) gives two explanations to the evidence:
H_{p}: Contributors were the victim and the suspect,
H_{d} : Contributors were the victim and an unknown.
Table 3 gives the likelihood ratios at various θ values. We notice that θ has no effect on the likelihood ratio at D3S1358, in which the victim and the suspect are both heterozygous, as expected from the form of the function (Equation 2). The effect of θ can be studied at the other two loci in which either the victim, the suspect, or both are homozygous: The effect is not large. If we consider the combined effect for the three loci, the likelihood ratio drops by about 10 percent when θ = 0.03 is taken.
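The qualitative behavior under the first set of hypotheses can be sketched as follows. The allele labels and frequencies are hypothetical and are not the casework values of Table 2; the helper names are likewise assumptions. Under H_{p} there are no unknowns, so the likelihood ratio is the reciprocal of P_{1}⟨U|E⟩.

```python
from itertools import combinations


def vee(a, b, freqs, theta=0.0):
    """p_a v p_b from Equation 2 (Recommendation 4.1)."""
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def norm(alleles, freqs, theta=0.0):
    """Sum of p_i v p_j over all ordered pairs (i, j) of the given alleles."""
    return sum(vee(i, j, freqs, theta) for i in alleles for j in alleles)


def mixture_probability(x, U, E, freqs, theta=0.0):
    """P_x<U|E> by inclusion-exclusion over the subsets of U."""
    U, E = set(U), set(E)
    total = 0.0
    for k in range(len(U) + 1):
        for removed in combinations(sorted(U), k):
            total += (-1) ** k * norm(E - set(removed), freqs, theta) ** x
    return total


freqs = {"a": 0.10, "b": 0.15, "c": 0.20, "d": 0.25}  # hypothetical values


def lr_victim_plus_unknown(U, E, theta):
    """LR for H_p (victim and suspect) against H_d (victim and one unknown)."""
    return 1.0 / mixture_probability(1, U, E, freqs, theta)


for theta in (0.0, 0.01, 0.03):
    # Victim ab, suspect cd: both heterozygous, so U = {c, d}.
    both_het = lr_victim_plus_unknown("cd", "abcd", theta)
    # Victim ab, suspect cc: suspect homozygous, so U = {c}.
    suspect_hom = lr_victim_plus_unknown("c", "abc", theta)
    print(theta, both_het, suspect_hom)
```

In this sketch the first likelihood ratio is the same for every θ, while the second decreases as θ increases.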
If the evidence was collected from somewhere other than the victim’s body, another set of explanations (S2) should be used, which are
H_{p}: Contributors were the victim and the suspect, and
H_{d} : Contributors were both unknown.
The likelihood ratios at various θ values are also given in Table 3. Again, the value of θ has only a small effect at individual loci, but the overall likelihood ratio drops by about 20 percent when θ = 0.03 is taken.
O. J. Simpson Case
In the well-known case of People v. Simpson (Los Angeles County Case BA 097211), a three-band profile abc at D2S44 was obtained for DNA recovered from the center console of an automobile owned by the defendant. The profiles of the defendant, Mr. Simpson (OS), and a victim, Mr. Goldman (RG), were found to be ab and ac. We take the allele frequencies p_{a} = 0.0316, p_{b } = 0.0842 and p_{c} = 0.0926 as given in Weir et al. (1997).
In this instance, the court ordered that the number of contributors (n) to the evidence DNA mixture be set to two, three, or four. The two possible explanations are
H_{p}: The known contributors have all three alleles abc, or
H_{d} : All contributors are unknown.
Suppose that Recommendation 4.1 of NRC-II is applied, regarding the single-banded alleles as true homozygotes, and the effect of θ in Equation 2 is investigated. The likelihood ratios under the two explanations with various numbers of unknowns, m and n under H_{p} and H_{d} respectively, are given in Table 4. In general, the value of θ has a small-to-moderate effect on the calculation of likelihood ratios. When the more reasonable scenario n = 2, m = 0 is considered, the likelihood ratio drops by 12 percent (from 1623 to 1431) when θ = 0.03 is taken.
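The scenario n = 2, m = 0 can be reproduced with a short sketch. The allele frequencies are those quoted above from Weir et al. (1997); the function names are illustrative assumptions, and the general formula is implemented as an inclusion-exclusion sum over the subsets of U.

```python
from itertools import combinations


def vee(a, b, freqs, theta=0.0):
    """p_a v p_b from Equation 2 (Recommendation 4.1)."""
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def norm(alleles, freqs, theta=0.0):
    """Sum of p_i v p_j over all ordered pairs (i, j) of the given alleles."""
    return sum(vee(i, j, freqs, theta) for i in alleles for j in alleles)


def mixture_probability(x, U, E, freqs, theta=0.0):
    """P_x<U|E> by inclusion-exclusion over the subsets of U."""
    U, E = set(U), set(E)
    total = 0.0
    for k in range(len(U) + 1):
        for removed in combinations(sorted(U), k):
            total += (-1) ** k * norm(E - set(removed), freqs, theta) ** x
    return total


# D2S44 allele frequencies quoted in the text.
freqs = {"a": 0.0316, "b": 0.0842, "c": 0.0926}


def likelihood_ratio(m, n, theta):
    """H_p: known contributors plus m unknowns; H_d: n unknowns only."""
    numerator = mixture_probability(m, "", "abc", freqs, theta)
    denominator = mixture_probability(n, "abc", "abc", freqs, theta)
    return numerator / denominator


print(round(likelihood_ratio(0, 2, 0.0)))   # 1623
print(round(likelihood_ratio(0, 2, 0.03)))  # 1431
```

Both values match the figures reported in the text for this scenario.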
In such a VNTR example with a possible contribution of a single-banded profile for explaining the mixed sample, the 2p rule is often used (Budowle et al. 1991). Weir et al. (1997) obtained a formula for this purpose, but no proof was given. Actually, if we define the frequency for p_{a} ∨ p_{a} in Equation 2 as 2p_{a}, the general Equation 4 of the theorem is the same as that in Weir et al. (1997), thus giving an indirect proof of Weir’s formula. The likelihood ratios under the 2p rule are also presented in Table 4, and it can be seen here that the rule is often very conservative, as one might expect. Indeed, the likelihood ratio drops dramatically under the rule except for the cases n = 2, m = 1 and n = 2, m = 2, in which the likelihood ratio rises from 3.06 to 31. The likelihood ratios are all below 100 except one. In the more reasonable scenario n = 2, m = 0, the likelihood ratio is less than one tenth of that under the HW rule.
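The substitution described above can be sketched by making the homozygote term a parameter. The function names are assumptions made for illustration; the allele frequencies are the D2S44 values quoted earlier in this section.

```python
from itertools import combinations


def norm(alleles, freqs, homozygote):
    """Sum over ordered allele pairs, with a pluggable term for p_a v p_a."""
    total = 0.0
    for i in alleles:
        for j in alleles:
            total += homozygote(freqs[i]) if i == j else freqs[i] * freqs[j]
    return total


def mixture_probability(x, U, E, freqs, homozygote):
    """P_x<U|E> by inclusion-exclusion over the subsets of U."""
    U, E = set(U), set(E)
    total = 0.0
    for k in range(len(U) + 1):
        for removed in combinations(sorted(U), k):
            total += (-1) ** k * norm(E - set(removed), freqs, homozygote) ** x
    return total


def hw_rule(p):
    """HW homozygote frequency, p squared."""
    return p * p


def two_p_rule(p):
    """2p rule: a single-banded profile is assigned frequency 2p."""
    return 2.0 * p


freqs = {"a": 0.0316, "b": 0.0842, "c": 0.0926}  # quoted in the text


def likelihood_ratio(m, n, homozygote):
    """H_p: known contributors plus m unknowns; H_d: n unknowns only."""
    numerator = mixture_probability(m, "", "abc", freqs, homozygote)
    denominator = mixture_probability(n, "abc", "abc", freqs, homozygote)
    return numerator / denominator


print(likelihood_ratio(0, 2, hw_rule), likelihood_ratio(0, 2, two_p_rule))
print(likelihood_ratio(2, 2, hw_rule), likelihood_ratio(2, 2, two_p_rule))
```

For n = 2, m = 0 the 2p value falls below one tenth of the HW value, and for n = 2, m = 2 the ratio rises from about 3.06 to about 31, in line with the figures quoted above.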
Another conservative approach is to define the frequency for p_{a} ∨ p_{a} in Equation 2 as
\[
p_a \vee p_a = p_a^2 + 2 p_a p_n,
\]
where p_{n} is the null allele frequency. The resulting formulation can be investigated for a minimum LR in the realistic range for p_{n}.
Conclusion and Discussion
For a large variety of crimes such as rape, it is common to find samples containing mixed DNA stains. Although the NRC-II Recommendation 4.1 is commonly used in practice to adjust for departures from HW proportions, the extension of the recommended formula to the mixed-stains problem is lacking. This article derives a general formula for evaluating the likelihood ratio to handle the situation, and the usefulness of the formula is illustrated with examples.
Besides the likelihood ratio method proposed here, the probability of exclusion (PE) can also be used for mixture analysis. The PE provides an estimate of the portion of the population that has a genotype comprised of at least one allele not observed in the mixed profile (DAB 2000). However, the PE does not use all of the available genetic data.
The formula in Equation 4 not only applies to Recommendation 4.1, but also to the 2p rule when the p_{a} ∨ p_{a} term in Equation 2 is changed to 2p_{a}, or when p_{a} ∨ p_{a} is changed to p_{a}^{2} + 2p_{a}p_{n}, where p_{n} is the null allele frequency.
This gives an indirect proof of the 2p formula of Weir et al. (1997). Furthermore, if one prefers to use different formulas for p_{a} ∨ p_{a} and 2(p_{a} ∨ p_{b}) in Equation 2 such as Equations 4.8 or 4.9 for relatives in Recommendation 4.4 of the NRC-II report, or other formulas, one can easily do so by substituting these different formulas in Equation 2. If the formulas for relatives are used in Equation 2, our result in Equation 4 can only be used when the same kinship coefficient is used across the victim, the suspect, and the perpetrator. A general formulation for dealing with mixtures involving blood relatives and unrelated persons (i.e., combining Recommendations 4.1 and 4.4) is under investigation.
This article considers mixture problems with indistinguishable contributors. In some cases, when one of the contributors is known, the genetic profile of the unknown contributor may be inferred (SWGDAM 2000). The peak area or height can also be used to improve interpretation by distinguishing the major and minor contributors; see Clayton et al. (1998) and SWGDAM (2000). It is true that the possible number of contributing genotypes would then be reduced. However, our method considers all possible contributing genotypes, and so it is conservative and gives the benefit of the doubt to the defendant.
Appendix: Proof of the Theorem
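We first introduce a probability model, which is designed to elaborate Equation 4. Consider x boxes, each containing 2m balls labeled 1, 1, 2, 2, . . . , m, m. Let p_{1}, p_{2}, . . ., p_{m} be m positive numbers satisfying
\[
\sum_{i=1}^{m} p_i = 1.
\]
The probability that two balls labeled (i, j) are drawn at random from a box is p_{i} ∨ p_{j}. Now draw 2 balls from each box independently, and get 2x balls in total. Let G be the set of labels of the drawn balls. For any given subset E of set {1, 2, . . . , m} and any given subset U of set E, we want to compute the probability P(U ⊂ G ⊂ E). Let A_{i} be the event that the ball labeled i is not drawn from any box, 1 ≤ i ≤ m. We have the following equations:
\[
\begin{aligned}
P\Bigl(\bigcap_{l \notin E} A_l\Bigr) &= \|p_l,\ l \in E\|^{x}, \\
P\Bigl(A_i \cap \bigcap_{l \notin E} A_l\Bigr) &= \|p_l,\ l \in E \setminus \{i\}\|^{x}, \\
P\Bigl(A_i \cap A_j \cap \bigcap_{l \notin E} A_l\Bigr) &= \|p_l,\ l \in E \setminus \{i, j\}\|^{x},
\end{aligned}
\]
and so on, where i, j, . . . ∈ E and ‖·‖ is the notation of the Definition, that is, the sum of p_{k} ∨ p_{l} over all ordered pairs (k, l) of the listed alleles. Using these equations and by the principle of inclusion and exclusion (Hall 1967), we have
\[
\begin{aligned}
P(U \subset G \subset E)
&= P\Bigl(\bigcap_{i \in U} \bar{A}_i \cap \bigcap_{l \notin E} A_l\Bigr) \\
&= P\Bigl(\bigcap_{l \notin E} A_l\Bigr)
 - \sum_{i \in U} P\Bigl(A_i \cap \bigcap_{l \notin E} A_l\Bigr)
 + \sum_{\substack{i, j \in U \\ i < j}} P\Bigl(A_i \cap A_j \cap \bigcap_{l \notin E} A_l\Bigr) - \cdots \\
&= \|p_l,\ l \in E\|^{x}
 - \sum_{i \in U} \|p_l,\ l \in E \setminus \{i\}\|^{x}
 + \sum_{\substack{i, j \in U \\ i < j}} \|p_l,\ l \in E \setminus \{i, j\}\|^{x} - \cdots,
\end{aligned}
\]
which is the right side of Equation 4.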
We now turn to the computation of the probability P_{x}⟨U|E⟩ of the DNA mixture model based on the ball/box model introduced previously. Let G be the alleles of the x unknowns; then G has 2x alleles (not necessarily distinct). If we use the probability model provided previously and refer to the meaning of P_{x}⟨U|E⟩, we have P_{x}⟨U|E⟩ = P(U ⊂ G ⊂ E). Thus, the theorem is proved.
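The inclusion-exclusion argument can be checked numerically against direct enumeration of the ball/box model. The sketch below is an illustration under assumed, hypothetical allele frequencies and function names; it is a sanity check, not a substitute for the proof.

```python
from itertools import combinations, product


def vee(a, b, freqs, theta=0.0):
    """p_a v p_b from Equation 2 (Recommendation 4.1)."""
    if a == b:
        p = freqs[a]
        return p * p + p * (1.0 - p) * theta
    return freqs[a] * freqs[b]


def by_theorem(x, U, E, freqs, theta):
    """Right side of Equation 4, as an inclusion-exclusion sum over subsets of U."""
    U, E = set(U), set(E)
    total = 0.0
    for k in range(len(U) + 1):
        for removed in combinations(sorted(U), k):
            kept = E - set(removed)
            pair_sum = sum(vee(i, j, freqs, theta) for i in kept for j in kept)
            total += (-1) ** k * pair_sum ** x
    return total


def by_enumeration(x, U, E, freqs, theta):
    """Sum over all ordered draws of x genotypes from E whose labels cover U."""
    U = set(U)
    ordered_genotypes = list(product(sorted(E), repeat=2))
    total = 0.0
    for combo in product(ordered_genotypes, repeat=x):
        labels = {allele for genotype in combo for allele in genotype}
        if U <= labels:
            prob = 1.0
            for a, b in combo:
                prob *= vee(a, b, freqs, theta)
            total += prob
    return total


freqs = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}  # hypothetical frequencies
cases = [(1, "cd", "abcd"), (2, "cd", "abcd"), (2, "abc", "abc"), (3, "ad", "abcd")]
for x, U, E in cases:
    lhs = by_theorem(x, U, E, freqs, 0.03)
    rhs = by_enumeration(x, U, E, freqs, 0.03)
    print(x, U, E, abs(lhs - rhs) < 1e-12)
```

The enumeration visits (|E|^2)^x ordered draws, so it is practical only for small x, whereas the formula needs at most 2^{|U|} terms.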
Acknowledgment. The first author is partly supported by the Hong Kong RGC Competitive Earmarked Research Grant HKU 7136/97H.
References
Aitken, C. G. G. Statistics and the Evaluation of Evidence for Forensic Scientists. John Wiley, New York, 1995.
Brenner, C. H. Proof of a mixed stain formula of Weir, Journal of Forensic Sciences (1997) 42:221–222.
Brenner, C. H., Fimmers, R., and Baur, M. P. Likelihood ratios for mixed stains when the number of donors cannot be agreed, International Journal of Legal Medicine (1996) 109:218–219.
Buckleton, J. S., Evett, I. W., and Weir, B. S. Setting bounds for the likelihood ratio when multiple hypotheses are postulated, Science & Justice (1997) 37:23–26.
Budowle, B., Giusti, A. M., Waye, J. S., Baechtel, F. S., Fourney, R. M., Adams, D. E., Presley, L. A., Deadman, H. A., and Monson, K. L. Fixed-bin analysis for statistical evaluation of continuous distributions of allelic data from VNTR loci for use in forensic comparisons, American Journal of Human Genetics (1991) 48:841–855.
Clayton, T. M., Whitaker, J. P., Sparkes, R., and Gill, P. Analysis and interpretation of mixed forensic stains using DNA STR profiling, Forensic Science International (1998) 91:55–70.
Curran, J. M., Triggs, C. M., Buckleton, J. S., and Weir, B. S. Interpreting DNA mixtures in structured populations, Journal of Forensic Sciences (1999) 44:987–995.
Devlin, B. and Risch, N. Physical properties of VNTR data, and their impact on a test of allelic independence, American Journal of Human Genetics (1993) 53:324–329.
DNA Advisory Board. Statistical and population genetics issues affecting the evaluation of the frequency of occurrence of DNA profiles calculated from pertinent population database(s) (approved February 23, 2000), Forensic Science Communications (July 2000). Available at: http://www.fbi.gov/programs/lab/fsc/backissu/july2000/dnastat.htm
Evett, I. W., Buffery, C., Willott, G., and Stoney, D. A guide to interpreting single locus profiles of DNA mixtures in forensic cases, Journal of the Forensic Science Society (1991) 31:41–47.
Fung, W. K. Tests on independence of VNTR alleles in the Chinese population in Hong Kong. In: Modelling and Prediction. Eds. J. C. Lee, W. O. Johnson, and A. Zellner. Springer-Verlag, New York, 1996, pp. 294–304.
Fung, W. K. and Hu, Y. Q. Interpreting forensic DNA mixtures: Allowing for uncertainty in population substructure and dependence, Journal of the Royal Statistical Society A (2000) 163:241–254.
Hall, M. Combinatorial Theory. John Wiley, New York, 1967.
Harbison, S. A. and Buckleton, J. S. Applications and extensions of subpopulation theory: A caseworkers guide, Science & Justice (1998) 38:249–254.
National Research Council (NRC-II). The Evaluation of Forensic DNA Evidence. National Academy Press, Washington DC, 1996.
Scientific Working Group on DNA Analysis Methods (SWGDAM). Short tandem repeat (STR) interpretation guidelines, Forensic Science Communications (July 2000). Available at: http://www.fbi.gov/programs/lab/fsc/backissu/july2000/strig.htm
Weir, B. S. Independence of VNTR alleles defined as fixed bins, Genetics (1992) 130:873–887.
Weir, B. S., Triggs, C. M., Starling, L., Stowell, L. I., Walsh, K. A. J., and Buckleton, J. Interpreting DNA mixtures, Journal of Forensic Sciences (1997) 42:213–222.