Statistics and Trace Evidence: The Tyranny of Numbers by Houck (Forensic Science Communications, October 1999)

October 1999 - Volume 1 - Number 3

Statistics and Trace Evidence: The Tyranny of Numbers

Max M. Houck

Supervisory Physical Scientist
Trace Evidence Unit
Federal Bureau of Investigation
Washington, DC

Abstract

The public perception of science allies it closely with mathematics, and the application of statistics to forensic DNA analysis has reinforced this perception. Numbers, however, are not required for the scientific process. All science, including forensic science, is a method of understanding the world around us, and quantitation is only one tool to assist that methodology. Yet, the public and the courts expect forensic scientists, including trace evidence examiners, to use mathematics and statistics regularly, based largely on the DNA model. Recent articles and court rulings have even suggested that without statistics, trace evidence may not be acceptably scientific.

This expectation is fraught with pitfalls that could adversely affect the accuracy of evidentiary reports presented in court. The foundational data upon which trace evidence statistics might be based differ radically from those used in DNA statistical calculations. If statistics are to be applied to trace evidence, they must be applied in a way appropriate to the discipline, unbiased in interpretation, and accessible to the trier of fact.

DNA Analysis = Forensic Science

You might not expect an article on trace evidence to begin by discussing DNA, but the advent of forensic DNA analysis has produced significant changes in the perception, both public and professional, of forensic science. This is particularly true of trace evidence, where numerous attempts at statistical evaluation or data gathering have been published (Home and Dudley 1980; Biermann and Grieve 1996a, 1996b; Biermann and Grieve 1998; Curran et al. 1998). No one model has been widely adopted, particularly in the United States, and yet legal experts, attorneys, and the courts are increasingly interested in using statistical methods to increase the reliability of trace evidence. The comments most often heard by trace analysts, “Why can’t you calculate a number like DNA?” and “Trace evidence is only ‘could have’ evidence,” adequately frame the dilemma we face. We cannot provide the same statistical frequencies as our DNA colleagues, but we have observed that a positive association of paint, hair, or fibers is a significant event that is not likely to be duplicated at random. If trace evidence analysts know this, why can’t they use statistics to help convey this information to the jury?

How Do We Know All Ravens Are Black?

Science has the ancient philosophical traditions of Greece, Rome, and the Middle East as its basis. The Greek philosophers, beginning with the mathematician Thales (ca. 600 B.C.) and including Aristotle (384–322 B.C.), were primarily rationalists, proposing the solutions to scientific questions by focused reasoning, what we now call deduction. Thales in particular was adamant in accepting only results that had been established by mathematical reasoning. Because all mathematical proofs are by their nature deductive (Kline 1967), this, along with other social factors in ancient Greek civilization, led to an inescapable reliance on deductive logic. The success of these ancient scholars in their explanations led to “an overrating of a purely rational approach” (Mayr 1982), reaching its pinnacle with the French philosopher and mathematician René Descartes (1596–1650). Like Thales, Descartes held the mathematical proof to be the ideal of scientific reasoning. This perception persists even today, particularly in the popular concept of science. It has persisted not only in the physical sciences, where mathematical proofs are often possible, but also in the biological sciences (Mayr 1982). The “tyranny of numbers,” the entrenched belief that science is best expressed through mathematics, overshadows the potential explanatory power many disciplines have, simply because a mathematical value is expected but may not be possible. The Scottish philosopher and historian David Hume (1711–1776) noted, for example, that in many, if not the majority of cases, it is impossible for biologists to provide proofs of pure mathematical certainty because of the complex nature of living systems (Hempel 1966).

Deduction is but one method of reasoning. Induction, generalizing about the world from observations, is another. Common sense and much of what we know about the world come from our inductive approach to life and learning. As one author has described it, “You see a raven. It’s black. You see other ravens, and they’re black too. Never do you see a raven that isn’t black. It is inductive reasoning to conclude that all ravens are black” (Poundstone 1988, p. 14). Induction is reasoning from what we might call circumstantial evidence and is common in our daily discussions, reasoning, and litigation. Although induction expands our knowledge of the world through accumulated experience, it cannot provide absolutely unquestionable conclusions (Kline 1967). Plato, in particular, rejected induction for just this reason: Because it relied on the senses for its content, and our senses are subject to bias, induction was inherently inferior to pure deductive logic. An example from the field of mathematics might help to clarify this distinction.

One of many unsolved problems in mathematics is the conjecture that every even number greater than 2 is the sum of two prime numbers. This is called Goldbach’s Conjecture, after the eighteenth-century mathematician Christian Goldbach. If we were to test this conjecture, we would find that it held for every even number we tried, no matter how large (4 = 2 + 2; 6 = 3 + 3; 8 = 3 + 5; 10 = 3 + 7). By inductive reasoning, one could conclude that every such even number is in reality the sum of two prime numbers. This would not satisfy a mathematician, however, who would demand a deductive (that is, mathematical) proof, even if it takes hundreds of years to produce (as was true of Fermat’s Last Theorem). A scientist, by contrast, would not think twice about using this inductively well-supported proposition.
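The inductive test described above can be sketched in a few lines of Python. This is only an illustration: the function names and the search limit are choices made here, not part of the original argument. Each success is one more black raven. A run that finds no counterexample supports the proposition but proves nothing about the even numbers that were not checked.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for the small numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def goldbach_pair(n: int):
    """Return primes (p, q) with p <= q and p + q == n, choosing the smallest p.

    Returns None if no such pair exists.
    """
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None


def first_counterexample(limit: int):
    """Return the first even number in [4, limit] with no prime pair, or None."""
    for n in range(4, limit + 1, 2):
        if goldbach_pair(n) is None:
            return n
    return None


# The four cases quoted in the text.
print([goldbach_pair(n) for n in (4, 6, 8, 10)])

# An arbitrary limit chosen for illustration; None means "no counterexample found".
print(first_counterexample(10_000))
```

Raising the limit adds observations, not certainty, which is the distinction between induction and deductive proof that the example is meant to draw.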

Which is more appropriate to forensic trace evidence comparison, deduction or induction? Although useful to the investigation, deduction alone is insufficient (or even impossible), just as facts alone are not explanatory. Testifying that “I found carpet fibers on the victim’s clothing that matched carpet fibers from the defendant’s car” is, to many experts, insufficient. The defendant’s car may have been the best-selling make and model that year, placing hundreds or thousands of them on the road. The victim may have owned the same car or a similar model with the same carpeting. Or the car may have been made in 1972, and the carpeting in it has not been made in 27 years. Facts taken out of context offer little help to the trier of fact.

Induction by itself is also not useful. You may have made the same observations for years, but you have not tested even one of them. How do you know that what you have been observing is true? Your observations may be influenced by your method, your environment, and your personal or unconscious bias. Your data must be available for others to repeat and test independently. Science is, among other things, public knowledge. By making our observations known to others, we provide the methods, materials, data, and conclusions that allow others to repeat what we have done and possibly get the same results. Publishing, in fact, has been suggested as the best way of removing the junk from science in the courts (Huber 1991).

Induction, therefore, provides the platform, the “real-world” units (“ravens”) and generalizations (“all of them are black”) that allow scientists to identify and construct deductive, testable arguments. In essence, deduction and induction constitute the scientific method applicable to forensic science. This is phrased more completely by Mayr (1982) when he speaks about biology:

Biological . . . theory supplies the substantive meaning, the units of observation, the means of asking and answering questions, and the means to bring the entire process under the control of the investigator so that it can be examined, challenged, and evaluated. . . . Microscopical observation requires units and the means of linking observations made in those units. Simplistically rendered, biological science recognizes the necessity of these decisions and their profound influence on the nature of conclusions, and brings them under control by explicit formulation and procedures. . . . Because of the constraints placed by theory on asking questions, making observations, and assembling linking data, empirical conclusions are linked into internally coherent bodies of reliable and useful evidence.

The tyranny of numbers is a consequence of an overreliance on deduction and mathematics, and these ultimately limit a discipline by requiring it to fit into a preordained model. Equating quantification with science to justify and validate its “science-ness” indicates that a “faulty notion of science, or no notion at all,” is at the heart of the tyranny (Mayr 1982). Equally, some information does not lend itself to a mathematical approach. Pierre Bayle (1647–1706) asserted, for instance, that historical certainty was different from, but not inferior to, mathematical certainty. The existence of dinosaurs, Paleolithic tools, and the Roman Empire is as certain as anything in mathematics but cannot be rendered into an algorithm. Comte de Buffon (1707–1788), the French naturalist, in particular, asserted emphatically that some subjects are far too complicated for a useful employment of mathematics.

The popular assumption is that applying mathematics, typically in the form of statistics, to trace evidence results will provide more certain answers. But before the decision is made as to which method to use, it must be asked: Is it even plausible?

Ubiquity Versus Uniqueness

Any statistical interpretation of a trace evidence finding would have to be based on a thorough knowledge of the population under study, necessarily through sampling. This is the primary problem with trace evidence: The size of the population. All trace evidence is, in one form or another, mass produced. Odd as it may sound, this even includes hairs (the average human head contains 100,000 hairs growing at a rate of 0.5 inch per month in a 24–36 month cycle), natural fibers such as cotton (approximately 40 billion pounds of cotton are produced each year), and soil. Other types of trace evidence are manufactured by humans, such as synthetic fibers, paint, and glass.

Despite this mass production, all things are unique in time, space, or both. When the valet brings my 1996 Volkswagen Jetta around, I can instantly recognize it as my car out of the thousands that were made. Accordingly, the set of objects that includes my particular Jetta has a membership of one. The ability to individualize, that is, to reduce the membership of a set to one unique object, rarely arises in trace evidence. Most often, it occurs when one item is damaged and is separated into two or more pieces. The rest of the time, trace analysts must deal with the ubiquity of similar items grouped into significantly larger sets, like green 1996 Jettas. All members of that set may be indistinguishable, or certain subsets may be discernible, depending on the patterned variance within the set. Another dilemma is that although the human genome is essentially stable during any particular statistical sampling period, measurable trace material populations may change as fast as, if not faster than, they can reasonably be sampled. By the time the trace statistic is established, its foundational assumptions may no longer be valid.

How these items can be sorted is important. Fundamentally different items can be easily distinguished (this cotton fiber is not that nylon fiber), whereas similar items may present more difficulty (this cotton fiber is not that cotton fiber). Given homogeneous, representative known samples—one of nylon and one of cotton fibers—were the aforementioned cotton and nylon fibers jumbled together, it would be easy to distinguish their sources. Were the two cotton fibers jumbled, all things being equal, it would not be easy.

But if all things are unique, then why worry about what measurements to take? This highlights an important tenet of trace analysis: The number and type of discriminating tests that can be performed are determined by the nature of the evidence. The nylon fiber could be analyzed in many more ways than the cotton fiber, and this adds to the analyst’s potential to discriminate between any two nylon fibers. This is one of the reasons certain classes of evidence may not be very useful: One white cotton fiber looks pretty much like any other white cotton fiber no matter how you analyze it. What we can say about a material is intimately bound up in what it is and how it was made. The philosophical basis for our interpretation of the material is evident in the methods we employ to analyze it. Lewis (1929, p. 195) makes it plain that “[I]n fact, experience can not even be recorded unless there is some theory, however crude, that leads to a hypothesis and a system by which to catalog observations.” You cannot observe or describe things without at least some idea, implicit or explicit, as to why you are doing what you are doing. This philosophical infrastructure has been stated more succinctly as “the concept is synonymous with the corresponding set of operations” (Bridgman 1928, p. 5).

The very fact that a fiber examiner notes the cross-sectional shape of a fiber means that implicit in that observation is the underlying premise, “not all manufactured fibers have the same cross section.” This measurement defines a class of objects in a meaningful way for a trace analyst (trilobal versus dogbone, for example). It also provides a parameter by which a population of objects can be searched. This does not mean, however, that such a population can be easily defined, and this becomes obvious when you turn the question around. If, instead of asking what cross section a single fiber has, you ask, “How many fibers made in 1998 have a trilobal cross section?”, you have framed an entirely different and much more complex problem.

The textile industry measures fiber production in pounds. More than 58,383,000 pounds of noncellulosic manufactured fibers were made in 1997 (Fiber Organon 1998). No information is available as to how much of that fiber product was trilobal. Besides, even if the information were available, it might not tell us what products those fibers went into, where those products were distributed, and how many were sold, among other useful information. If manufacturing data cannot help because of their inaccessibility, then we must turn to product information. And this only compounds the problem.

The Federal Trade Commission lists 17 generic classes of manufactured fibers (FTC, 16 CFR 303.7). Over 1,770 fiber manufacturers worldwide are listed for 1998 (World Directory of Manufactured Fiber Producers 1999). The number of cross-sectional shapes for manufactured fibers exceeds 500 different types, not counting copycats and patent infringements (Knoop 1999). More than 8,000 dyes and pigments are used in the coloring of textile fibers (Textile Chemist and Colorist, July 1998), and variations in the processing and production of colored textiles yield an almost infinite number of potential colors (Aspland 1981; Connelly 1997). These materials are assembled into finished textiles by approximately 113,000 manufacturers (RN and WPL Encyclopedia 1999). This web of production makes it difficult, if not occasionally impractical, to trace any one product and identify its components to their sources.
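A rough sense of the scale involved can be had by multiplying the category counts quoted above. The sketch below is only an illustration and rests on a loudly stated assumption that is certainly false in practice: that every generic class can be made in every cross-sectional shape and colored with every dye. The variable and function names are invented for this example.

```python
from math import prod

# Figures quoted in the text; the shape and dye counts are lower bounds.
counts = {
    "generic fiber classes": 17,
    "cross-sectional shapes": 500,
    "dyes and pigments": 8_000,
}


def naive_combinations(category_counts):
    """Multiply category sizes, ASSUMING every combination is possible.

    This is an illustrative count of possible kinds, not a frequency estimate.
    """
    return prod(category_counts.values())


total = naive_combinations(counts)
print(f"{total:,}")  # 68,000,000
```

Even this crude figure counts kinds of fiber, not how often each kind occurs. It says nothing about how many pounds of any one combination were produced or where those fibers ended up, which is the information a frequency statistic would require.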

The other aspect of this diversity is very close to each of us, as close as our own closets, homes, vehicles, and workplaces. The range of textiles we encounter as we move through our daily environments is staggering. It is increased and made more complex by the fact that we encounter and interact with others moving through their environments, providing opportunities for the transfer of materials (Locard 1930a, 1930b, 1930c). The number of different fiber types found on any one textile, such as clothing, therefore, is potentially very large, making it impossible to track each type to its source or sources. Although this discussion has focused on fibers, the intractability argument could be made for nearly any type of trace evidence. Yet, in certain investigations, trace evidence has been researched and sourced with significant probative effect (Deadman 1984a, 1984b; Deedrick 1998). Why do trace analysts not do this in every case?

“It isn’t that they can’t see the solution. It is that they can’t see the problem.” —G. K. Chesterton

How does one determine the frequency of a fiber type? You could look for a specific fiber on unrelated garments (Houck and Siegel 1999) to gain an idea of its commonness or rarity. It could also be possible to determine the frequency of fiber types within a textile population, either by cataloging the fibers (Home and Dudley 1980) or the textile population (Biermann and Grieve 1996a, 1996b, 1998) or by sampling selected environments to determine chance matches (Palmer and Chinherende 1996). Small ad hoc studies could also be performed to answer specific questions of frequency (Deedrick 1998).

The difficulty with all of these approaches is that they are not universal in their application, if they can be applied to casework at all. Questions of sampling, randomness, and relationships impinge on this process and complicate matters. If it is rare to find Fiber Type X on movie theater seats, what is its frequency in shopping malls or living rooms? Is Fiber Type X extremely rare in general or only in certain environments? Against what or whom do we judge rarity, that is, what is “at random” (Buckleton and Walsh 1991)? Before trace analysts can comfortably approach using statistics to present their results to a judge or jury, it is critical that these fundamental questions be answered and those answers be accepted by the trace evidence community. Note that the basic form of trace evidence data is different from that of DNA and must, therefore, be treated with different statistical approaches in mind: One size does not fit all.

The approaches that would seem to have the greatest applicability would be those that are local or global in their assessment of trace evidence environments. An example of local assessment might be a catalog of known clothing, à la Biermann and Grieve, only derived from a large number of crime scenes. An analyst could then search the catalog or database for the particular arrangement of materials from other crime scenes and then compare the frequency of these fiber types to the case at hand. This would allow for an apples-to-apples comparison between similar environments (crime scenes) rather than between dissimilar ones (crime scenes versus mail-order catalogs). Census data could be incorporated to determine similar socioeconomic regions to account for the average income, and therefore cost of goods, for a particular scene (Lubove 1999). The data are certainly out there, sitting in the property listings in crime laboratories across the nation, waiting to be mined.
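A local assessment of this kind might be sketched as follows. Everything in the example is hypothetical: the catalog entries, the environment labels, the fiber descriptions, and the function name are invented for illustration, and no real casework data are implied. The sketch only shows how the frequency of a fiber type depends on which environments it is judged against.

```python
# HYPOTHETICAL catalog: one record per scene, with an environment label and
# the set of fiber types recovered there. All values are invented.
catalog = [
    {"environment": "vehicle", "fibers": {"red nylon trilobal", "white cotton"}},
    {"environment": "vehicle", "fibers": {"white cotton", "blue polyester"}},
    {"environment": "residence", "fibers": {"white cotton", "green acrylic"}},
    {"environment": "residence", "fibers": {"red nylon trilobal", "white cotton"}},
    {"environment": "residence", "fibers": {"white cotton"}},
]


def scene_frequency(records, fiber_type, environment=None):
    """Fraction of scenes (optionally within one environment) containing fiber_type.

    Returns None when no scenes match the environment, so that an empty
    comparison group is not mistaken for a frequency of zero.
    """
    scenes = [
        r for r in records
        if environment is None or r["environment"] == environment
    ]
    if not scenes:
        return None
    hits = sum(1 for r in scenes if fiber_type in r["fibers"])
    return hits / len(scenes)


print(scene_frequency(catalog, "red nylon trilobal"))             # all scenes
print(scene_frequency(catalog, "red nylon trilobal", "vehicle"))  # vehicles only
print(scene_frequency(catalog, "white cotton"))                   # ubiquitous type
print(scene_frequency(catalog, "red nylon trilobal", "theater"))  # no such scenes
```

The same fiber type yields a different figure for each comparison group, and an environment absent from the catalog yields no figure at all. A real database would also have to address how scenes were sampled and how fiber types were classified before any such number could be reported.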

A global approach would, of necessity, be more difficult as it would have to provide a universal theory of trace evidence presence, transfer, and persistence. Standardization of interpretation for transfer and persistence rates would be required, as well as developing information on background levels of trace evidence. A unified trace theory is both the holy grail of trace evidence and, possibly, a pipe dream. Possible avenues of research exist, such as analogies to complex systems (Kareiva 1990) or cumulative sifting of existing information into a larger answer (Hunter and Schmidt 1990), and although work is proceeding on the viability of these avenues, accepted statistical applications are not expected in the courtroom soon.

Experience Counts

Lacking (for the moment) the statistical machinations of peers in the forensic science community who analyze DNA, trace analysts must detail the significance of their findings with the only tool currently available to them: jury education. Properly done, jury education can convey to laypersons the principles, practices, and underlying logic of the examinations we perform. If they understand how the particular manufacturing process in question works (e.g., paint, fibers, glass) and why we choose the protocols and methods we use, then our analyses and results will ultimately make sense within that context. Context is, in fact, the crucial component to a proper grasp of the significance of trace evidence. Without context, we are communicating mere facts with no foundation of meaning, much in the way Poincaré’s pile of stones is not a house. To this end, trace evidence can rarely tell us who definitively, but often it can answer something much more helpful: How. Knowing, for example, that the red carpet fibers found in the stolen white vehicle are consistent with the carpeting in the subject’s car, and that the white paint from the stolen vehicle is consistent with the paint found on the clothing of the hit-and-run victim, whose clothing produced a fabric imprint in the stolen vehicle’s hood, provides a clear series of events that can be checked against witness descriptions. Cross-transfers (also called two-way transfers) additionally cement the relationship between two or more people, places, or things. This accumulation of independent transfers of paint, glass, hairs, and fibers makes intuitive sense to a jury, just as it does to trace analysts. That is the inductive part; we, as bench-level scientists, provide the deductive part through our methods, testing, and comparisons.

Each case worked offers the possibility of new experiences and novel observations. These become stored in our memories through learning, trial and error, and experience, ultimately building what has been referred to as a visual dictionary (Houck 1995). The application of statistics in trace evidence is at a nascent stage of development, inadequate but full of potential. If statistics are to be successfully applied to trace evidence, they must be appropriately applied, unbiased in interpretation, and intelligible to the trier of fact. If each trace examiner followed the same protocol while examining the same evidence, they would ostensibly come to the same conclusion. But, for now, each examiner’s experience is the key to interpreting those conclusions within the context of the crime scenario. And it is up to each examiner to relate this understanding to the trier of fact so that they may in turn comprehend its meaning and relevance.

References

Aspland, J. R. What Are Dyes? What Is Dyeing? AATCC Dyeing Primer. American Association of Textile Chemists and Colorists, Research Triangle Park, North Carolina, 1981.

Biermann, T. W. and Grieve, M. C. A computerized data base of mail order garments: A contribution toward estimating the frequency of fibre types found in clothing. Part 1: The system and its operation, Forensic Science International (1996) 77:65-73.

Biermann, T. W. and Grieve, M. C. A computerized data base of mail order garments: A contribution toward estimating the frequency of fibre types found in clothing. Part 2: The content of the data bank and its statistical evaluation, Forensic Science International (1996) 77:75-91.

Biermann, T. W. and Grieve, M. C. A computerized data base of mail order garments: A contribution toward estimating the frequency of fibre types found in clothing. Part 3: The content of the data bank: Is it representative?, Science and Justice (1998) 95:117-131.

Bridgman, P. W. The Logic of Modern Physics. Macmillan, New York, 1928.

Buckleton, J. S. and Walsh, K. A. J. Who is “random man”?, Journal of the Forensic Science Society (1991) 31:463-468.

Connelly, R. L. Colorant formation for the textile industry. In: Color Technology in the Textile Industry. American Association of Textile Chemists and Colorists, Research Triangle Park, North Carolina, 1997, pp. 91-96.

Deadman, H. A. Fiber evidence and the Wayne Williams trial: Part I, FBI Law Enforcement Bulletin (1984a) 53(3):12-20.

Deadman, H. A. Fiber evidence and the Wayne Williams trial: Conclusion, FBI Law Enforcement Bulletin (1984b) 53(5):10-19.

Deedrick, D. W. Searching for the source: Car carpet fibres in the O.J. Simpson case, Contact (1998) 26:14-16.

Federal Trade Commission Rules and Regulations under the Textile Products Identification Act, Title 15, U.S. Code section 70, et seq. 16 CFR 303.7.

Fiber Organon. Fiber Economics Bureau, Washington, DC, 1998.

Hempel, C. G. The Philosophy of Natural Science. Prentice-Hall, Englewood Cliffs, New Jersey, 1966.

Home, J. M. and Dudley, R. J. A summary of data obtained from a collection of fibres from casework materials, Journal of the Forensic Science Society (1980) 20:253-261.

Houck, M. M. and Siegel, J. A Large Scale Fiber Transfer Study. Presented at the American Academy of Forensic Sciences, Orlando, Florida, 1999.

Houck, M. M. The Limits of Computing in Forensic Science. Presented at the American Academy of Forensic Sciences, Seattle, Washington, 1995.

Huber, P. Galileo’s Revenge. Harper Collins, New York, 1991.

Hunter, J. E. and Schmidt, F. L. Methods of Meta-Analysis. Sage, Newbury Park, California, 1990.

Kareiva, P. Population dynamics in spatially complex environments: Theory and data, Philosophical Transactions of the Royal Society of London (1990) 30:175-190.

Kline, M. Mathematics for the Nonmathematician. Dover, Mineola, New York, 1967.

Knoop, D. Allied Signal, personal communication, March 22, 1999.

Lewis, C. I. Mind and the World Order. Scribners, New York, 1929.

Locard, E. The analysis of dust traces. Part I, American Journal of Police Science (1930a) 1:276-298.

Locard, E. The analysis of dust traces. Part II, American Journal of Police Science (1930b) 1:401-418.

Locard, E. The analysis of dust traces. Part III, American Journal of Police Science (1930c) 1:496-514.

Lubove, S. Redlining software, Forbes (1999) 5:53-55.

Mayr, E. The Growth of Biological Thought. Harvard Belknap, Cambridge, Massachusetts, 1982.

Palmer, R. and Chinherende, V. A target fiber study using cinema and car seats as recipient items, Journal of Forensic Sciences (1996) 41:802-803.

Poundstone, W. Labyrinths of Reason. Doubleday, New York, 1988.

RN and WPL Encyclopedia. Salesman’s Guide Press, Richmond, Virginia, 1999.

Special report: A buying guide to products and services for the textile wet processing industry, Textile Chemist and Colorist, July 1998.

Editor’s Note: This article was revised January 2000.
