“Differential Diagnosis”—What Good Is It?
By Bill Masters
Wallace, Klor & Mann, P.C.
This article is the second in a two-part series about proof of “general causation.” Specifically, it concerns the evidential value of “differential diagnosis” to prove “general causation.”
Proving “causation” is a bedevilment. So it has been always in all disciplines. So it is now no less than in the discipline of law. There, plaintiffs, with that odious burden of proof, shudder when attempting to prove “general causation,” particularly in the unsurveyed fields of so-called “novel science.” When trailblazing this wilderness, they look to the courts for guidance, for expedients, to hew a path to recovery.
And lo, seek and they shall find help in the form of the expedient known as “differential diagnosis.” This expedient is used to prove not only what it was traditionally designed to prove–“specific causation”–but also what it was never designed to prove–“general causation.” Is this comfortably expedient path the correct path? The answer is, only rarely, if at all.
II. Differential Diagnosis Defined
“Differential diagnosis” is an iterative process (within the hypothetico-deductive model) with four steps designed to establish a particular patient’s clinical diagnosis—(1) listening to the patient describe symptoms and observing signs, (2) generating from those observations hypotheses about a process of disease or symptom formation, (3) gathering additional data to test these hypotheses, and (4) evaluating these hypotheses in light of these data. These four steps are repeated as many times as hypotheses are considered and then rejected, confirmed or set aside for further testing. D.L. Sackett et al. Clinical Epidemiology, p. 17 (1991); H.C. Sox et al. Medical Decision Making, pp. 9-26 (1988); J.P. Kassirer. Diagnostic Reasoning. Annals of Internal Medicine, 110:893 (1989); R.L. Engle, Jr. & B.J. Davis. Medical Diagnosis: Past, Present and Future. Archives of Internal Medicine, 112:512-543 (1963); F.J. McCartney. Diagnostic Logic. BMJ, 295:1325-1331 (1987); J.P. Kassirer & R.I. Kopleman. Learning Clinical Reasoning, pp. 109-114 (1991).
Differential diagnosis is a form of “proof by exclusion.” In this form of proof, the clinician lists all possible diagnoses given the clinical facts; proves that the list is exhaustive; and then eliminates as unlikely all possible diagnoses but one. That is why, if no positive grounds exist for a diagnosis, diagnosis by exclusion is hard to justify logically. E.A. Murphy. The Logic of Medicine, p. 201 (1997).
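The logic of proof by exclusion can be sketched in a few lines of code. The sketch below is purely illustrative; the diagnosis names and the findings that rule each one in or out are hypothetical stand-ins, not drawn from the clinical literature cited above.

```python
# Illustrative sketch of diagnosis by exclusion: start from a list of
# candidate diagnoses (the "differentials"), eliminate those the clinical
# data rule out, and accept a diagnosis only if exactly one survives.
# All diagnosis names and findings here are hypothetical.

def diagnose_by_exclusion(differentials, findings):
    """Return the sole surviving diagnosis, or None if the method fails.

    differentials: dict mapping each candidate diagnosis to the set of
        findings required for that diagnosis to remain on the list.
    findings: set of findings actually observed in the patient.
    """
    survivors = [dx for dx, required in differentials.items()
                 if required <= findings]  # dx survives only if every required finding is present
    # Proof by exclusion is valid only when the list was exhaustive and
    # exactly one candidate survives; otherwise no conclusion follows.
    return survivors[0] if len(survivors) == 1 else None

differentials = {
    "diagnosis A": {"fever", "rash"},
    "diagnosis B": {"fever", "joint pain"},
    "diagnosis C": {"rash", "headache"},
}
print(diagnose_by_exclusion(differentials, {"fever", "rash"}))                 # only A survives
print(diagnose_by_exclusion(differentials, {"fever", "rash", "joint pain"}))   # A and B survive -> None
```

Note how the second call fails: with two survivors, exclusion proves nothing, which is the logical point Murphy makes about diagnoses lacking positive grounds.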
In the process of a differential diagnosis, if the clinician cannot fit the patient’s clinical profile into an established disease category or syndrome, the clinician, as a matter of necessity or expediency, will speculate or guess what might be the disease process producing that profile. This scenario, where the physician speculates, represents the “case study.” The clinician could tell the patient, “I cannot determine what is causing your symptoms or signs; please seek help elsewhere.” But, eventually, the clinician in the specialty regarded as the gold standard for diagnosing the presumed problem will likely be compelled to speculate.
III. An Overview of the Analytical Space
The process of differential diagnosis has been judicially sanctioned in Oregon for use to establish both specific and general causation. Jennings v Baxter Healthcare, 331 Or 285, 14 P3d 596 (2000); Marcum v. Adventist Health System, 345 Or 237, 193 P3d 1 (2008). From the perspective of science and its scientific method, this development is troublesome. The process of differential diagnosis has long been a generally accepted way to establish specific causation. In this process, the possible differentials are considered to be generally accepted general causative relationships, which the clinician hypothesizes potentially apply to explain this particular patient’s clinical profile.
When differential diagnosis is proffered to establish general causation, authorizing its use must be assessed in the context of what evidence of general causation already exists. When “generally accepted scientific theories” of general causation already exist, use of differential diagnosis to establish a different hypothesis about general causation is either unnecessary or inappropriate. When generally accepted science does not exist, then the hypothesis of general causation is novel, that is, the premise from which the expert testimony is deduced is not sufficiently established to have gained general acceptance in the scientific community. When that is the case, there may or may not be “generally accepted evidence” of general causation.
“Generally accepted scientific evidence” =df as evidence that the scientific community as a whole accepts as having the power to prove, confirm or verify scientific hypotheses.
“Generally accepted scientific principles or theories” =df as hypotheses that the scientific community has accepted, more or less, as being valid.
Even though the premise is insufficiently established to have gained general acceptance in the scientific community, the premise may be supported to some degree by generally accepted evidence of general causation in the form of data from experimental (in vitro and in vivo) and epidemiologic studies. That evidence should trump any evidence proffered in the form of a differential diagnosis.
When no generally accepted evidence of general causation exists in the form of data from experimental or epidemiologic studies, then the question arises, do any surrogates for data from such studies exist consistent with the legal rules for admissibility of evidence? That is, is a case study, or a case series, an adequate surrogate? This is the area of focus in this article: the potential use of case studies to establish general causation.
If no surrogates exist, then the party with the burden of proof is unable to carry that burden, and his or her case is, perhaps sadly, dismissed. That is, the law does not provide that merely speculative “evidence,” simply because it is all that is available, thereby becomes admissible as probative out of necessity. See UCJI No. 5.01 (“…you must not engage in guesswork or speculation.”)
IV. The Analytical Space of “Novel Science”
A. “Face Validity”
“Face validity” is defined as that state in which “the proffered evidence looks or appears to be what it is claimed to be to untrained observers.” B. Nevo. Face Validity Revisited. J. of Educational Measurement, 22:287-293 (1985); S.P. Turner. The Concept of Face Validity. Quality & Quantity, 13:85-90 (1979). Face validity lies at the heart of the dilemma over the admissibility of proffered novel science. The problem arises when the jury is presented with evidence of a novel scientific principle which is supported by only face validity but which is, in fact, most probably invalid.
Face validity is a problem because the typical juror is scientifically illiterate. As a Gallup poll of 1236 adult Americans revealed, those who are apt to decide lawsuits believe in the following absurdities: Astrology 54%; ESP 45%; aliens have landed on earth 22%; dinosaurs and humans lived simultaneously 41%; communication with the dead 42%; and ghosts 35%. Gallup, G.H., Jr. & Newport, F. Belief in Paranormal Phenomena Among Adult Americans. Skeptical Inquirer, 15, no. 2: 137-147 (1991); Shermer, M. Why People Believe Weird Things, p. 26 (1997). Polls like this should leave little doubt that an attorney, with even modest rhetorical skills, can convince the typical juror that the moon is made of cream cheese or that Elvis can be seen walking the halls of Graceland.
B. The Strengthened Specified Relevancy Test
To guard the gullible jury against the allures of proffered evidence with only face validity, Oregon courts, in screening proffered scientific evidence, are expected to employ the “strengthened specified relevancy test” (“SSRT”). State v. Brown, 297 Or 404 (1984); State v O’Key, 321 Or 285 (1995). By the SSRT, the trial court must screen proffered evidence, however minimally, for validity beyond mere “face validity.” As the Oregon Supreme Court remarked in O’Key:
“Both decisions [Brown and Daubert] view the validity of a particular scientific theory or technique to be the key to admissibility. Both require trial courts to provide a screening function to determine whether the proffered scientific evidence is sufficiently valid to assist the trier of fact. Under both decisions, a trial court should exclude “bad science” in order to control the flow of confusing, misleading, erroneous, prejudicial, or useless information to the trier of fact.” State v O’Key, 321 Or 285 at 306 (1995).
Oregon adopted the “relevancy test” in State v. Brown, 297 Or 404 (1984). There, the Oregon Supreme Court held that for expert testimony to be admitted, the trial court must determine that the proffered evidence is “relevant” under OEC 401, “helpful” under OEC 702 (that is, it is within the expert’s field, the expert is qualified, and the foundation of the opinion intelligently relates the testimony to the facts) and that its “probative value” not be substantially outweighed by the threefold dangers of unfair prejudice, of confusion of the issues or of misleading the jury under OEC 403. (In Brown, the Oregon Supreme Court did not have the phrase “scientific knowledge” in OEC 702 carry any analytical water.)
“Relevant” =df as the minimal degree of probative value needed to make the existence of any fact of consequence to the determination of the action more probable or less probable. OEC 401; State v Hampton, 317 Or 251, 255 (1993).
“Probative Value” =df as the degree to which evidence makes the existence of any fact of consequence to the determination of the action more probable or less probable. State v O’Key, 321 Or 285, 299 n. 14 (1995).
This relevancy test is qualified as being “specified” because the Oregon Supreme Court required that the trial court, in applying the criteria of OEC 401, 702 and 403, consider a number of specific factors in assessing both (1) the “probative value” of the proffered evidence (the power of the proffered evidence to help the jury) and (2) its power to mislead the jury.
This “specified relevancy test” was “strengthened” in State v O’Key, 321 Or 285 (1995). There, following Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579 (1993), the Oregon Supreme Court added the requirement under OEC 702 that the proffered evidence first fall within that set of beliefs characterized as known facts or truths accepted as such on good grounds (“T-GG”), and then fall within that subset of such truths characterized as “scientific” knowledge (“T-SM”)–defined as knowledge derived by the “scientific method,” a time-honored and tested method based on generating hypotheses and testing predictions deduced from them to determine whether or not the hypotheses can be falsified. (What the court directly evaluates is not so much the coherence of the belief with other beliefs, but whether it has “good grounds” and whether it was generated by a reliable method.)
T-SM ⊂ T-GG
In summary, proffered scientific evidence must pass through a screen with a gauge set by the following four criteria before being presented to a jury: (1) “relevant” under OEC 401; (2) “helpful” under OEC 702; (3) probative value not substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury under OEC 403; and (4) “scientific knowledge” under OEC 702.
C. The SSRT and the Problem of Face Validity
This “strengthened specified relevancy test” is often interpreted rather cavalierly by trial judges to admit proffered evidence without much, if any, regard for its validity. Whether or not the evidence is valid, it is urged, is best determined by the jury.
This interpretation is suspect because it forces certain, unwanted corollary interpretations of OEC 401 and 403. For example, under OEC 401, proffered evidence that is invalid would still be “relevant” if on its face it tended to make the existence of any fact of consequence more probable or less probable. It would follow, given that relevant evidence has, by definition, some probative value, that invalid evidence may still have “probative value” under OEC 403 if it is probative on its face. That is, the evidence need have only what is characterized as “face validity.” As discussed, face validity is not technically validity–it merely refers to whether or not the proffered evidence looks or appears to be what it is claimed to be to untrained observers (i.e., jurors).
But this preference for the superficial can only be taken so far. For OEC 403 purports to require the court to assess not only the probative value of proffered evidence but also its power to mislead the jury. Proffered evidence which is offered as valid and has “face validity” (validity that is not even skin deep), but which is invalid, would definitely mislead the jury. So if the probative value of proffered evidence is to be weighed against its power to mislead the jury, the trial court must assess its validity beyond its “face validity.” See State v Sampson, 167 Or App 489, 6 P3d 543, 551 (2000) (“… Oregon law also focuses on the overall effect that a technique’s aura of scientific certainty will have on the jury”).
V. The Analytical Space of Differential Diagnosis
A. What Differential Diagnosis Cannot Do
Differential diagnosis is a process of placing a patient’s clinical profile into an established disease category or syndrome from a set of possibly applicable established disease categories or syndromes. It is a process of instantiation, not of creation. That is, the “hypotheses” (of possibly applicable disease categories) generated in the process of differential diagnosis are not hypotheses about whether a general causative relationship exists between an exposure and an effect. They are rather hypotheses about whether certain generally accepted general causative relationships (or disease categories or syndromes) apply to a particular individual in the clinic. The process of differential diagnosis simply does not and cannot transmute a novel hypothesis about a general causative relationship into a valid principle of general causation.
B. Establishing Disease or Syndrome Categories
Establishing a disease or syndrome category is never simple—certainly never as simple as a differential diagnosis. More often than not the purported harmful exposure allegedly produces a variety of “non-specific” signs and symptoms, and no patient presents with what could be considered a typical constellation of those signs and symptoms.
Symptom Clusters =df as coincident assemblages of symptoms or signs that do not constitute a syndrome or disease.
Syndromes =df as disorders without a single identified cause or pathogenesis. E.A. Murphy. The Logic of Medicine, pp. 132-134 (1997); E.A. Murphy. A Companion to Medical Statistics, pp. 157-161 (1985).
Diseases =df as disorders with a single identified cause or pathogenesis. See Taber’s Cyclopedic Medical Dictionary.
Typically, at the outset of this process, scores of symptoms may be implicated, from abasia to zoster. But implicating so many symptoms proves too much. The number of possible combinations of signs and symptoms, if n represents the total number of signs and symptoms, is 2^n. (To back out the empty subset, subtract 1, so that the relevant number of possible combinations of signs and symptoms is 2^n – 1.) Given millions, billions, or even trillions of possible combinations, no expert would have any problem taking any patient–whether or not exposed to the putative toxin or pathogen–and finding some combination of those symptoms which could then be used to diagnose a disease if the patient coincidentally happened to have been exposed to the putative toxin or pathogen. Obviously, if the only thing that distinguishes those with the alleged disorder from those without it is being exposed, no effect has been identified in a cause and effect relationship; all that has been identified is a potential cause, and the issue of causation has been begged.
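The combinatorial explosion is easy to verify. The symptom counts below are illustrative only:

```python
# The number of non-empty combinations of n signs and symptoms is 2**n - 1.
# Even a modest symptom list yields an astronomical number of candidate
# symptom clusters, which is why an unconstrained search can always find
# some combination to fit any exposed patient.
def nonempty_combinations(n):
    return 2**n - 1

print(nonempty_combinations(10))  # 1,023
print(nonempty_combinations(40))  # 1,099,511,627,775 -- over a trillion
```

At forty candidate symptoms the count already exceeds a trillion, which is the article's point: the space of possible "syndromes" is so vast that a post hoc match proves nothing.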
To identify a unique pathology, experts must establish, using multivariate statistical techniques, that this particular cluster of non-specific symptoms is not a random assortment or cluster of signs and symptoms. A group of signs and symptoms is said to be a “random cluster” or, better, a “random assortment,” when an individual has signs and symptoms from one or more disorders or conditions that have occurred together by virtue of chance or by virtue of what are called processes of “co-morbid disease.” For example, consider a middle-aged woman (1) who has depression resulting in symptoms of fatigue, muscle aches, forgetfulness and difficulty concentrating, and (2) who is taking antidepressants that have the side effect of dry mouth, and (3) who is also perimenopausal, resulting in a variety of well-known symptoms such as joint aches, insomnia, tingling sensations in her extremities, acne rosacea, alopecia, dry skin, elevated blood cholesterol, hot flashes, night sweats, headaches, and fatigue. If she reported this constellation of symptoms to her doctor, the doctor, if at all experienced, would not automatically consider them to be manifestations of a “new atypical” disease, but rather a random cluster of symptoms from depression, antidepressant medications, and low levels of estrogen.
After the purported diagnostic criteria have been developed by multivariate statistical techniques, they are usually refined with data from longitudinal studies and with methods of marshalling expert consensus, such as the “Delphi technique” or the “expert panel.” Kendall, R.E. Clinical Validity. Psychological Medicine, 19:45-55 (1989); L.C. Morey & J.K. Jones. Empirical Studies of the Construct Validity of Narcissistic Personality Disorders in E.F. Ronningstam (ed.) Disorders of Narcissism (1998); Jones, J. & Hunter, D. Consensus Methods for Medical and Health Services Research. BMJ, 311:376-380 (1995); Milholland, A.V. et al. Medical Assessment by A Delphi Group Opinion Technique. NEJM, 288:1272-1275 (1973); Fink, A. et al. Consensus Methods: Characteristics and Guidelines for Use. American J. Public Health, 74:979-983 (1984); Linstone, H.A. & Turoff, M. (eds.) The Delphi Method (1975); Fries, J.F. et al. Criteria for Rheumatic Disease. Arthritis & Rheumatism, 37:454-462 (1994); Altman, R.D. et al. An Approach to Developing Criteria for the Clinical Diagnosis & Classification of Osteoarthritis. The Journal of Rheumatology, 10(2):180-183 (1983). Typically, a committee of experts isolates a set of historical, physical and laboratory features of the potential syndrome as potential diagnostic criteria. Next, the “sensitivity” and “specificity” of these features are determined by the Delphi technique of “opinion sampling,” a process designed to use the consensus of experts in situations of uncertainty, by means of anonymity, feedback and iteration. Opinion sampling helps the committee clarify its thinking, usually resulting in a consensus about the potential classification criteria.
Then the committee conducts a prospective study, enrolling patients diagnosed with the target disorder and a control group with signs and symptoms that could be confused with those of the proposed disorder. The committee then proposes for further investigation a set of diagnostic variables larger than the set settled on by the experts through the Delphi method. This enlarged set of variables is used by clinicians (blinded to the status of the case or control group) to diagnose the proposed target disorder. A variable is included in subsequent analyses if it discriminates those with the target disorder from those in the comparison group. With this approach, the enlarged set of variables is progressively narrowed to the subset considered most effective in diagnosing the target disorder.
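The screening step just described—retaining a candidate variable only if it discriminates cases from controls—turns on the two measures mentioned above, sensitivity and specificity. A minimal sketch of that arithmetic follows; the patient data and the 0.7 retention threshold are invented for illustration and are not taken from any of the studies cited.

```python
# Illustrative screen of a candidate diagnostic variable against case/control
# labels. Sensitivity = fraction of cases with the finding; specificity =
# fraction of controls without it. Data and threshold are hypothetical.

def sensitivity_specificity(has_finding, is_case):
    cases = [f for f, c in zip(has_finding, is_case) if c]
    controls = [f for f, c in zip(has_finding, is_case) if not c]
    sens = sum(cases) / len(cases)              # true positives / all cases
    spec = sum(1 - f for f in controls) / len(controls)  # true negatives / all controls
    return sens, spec

# 1 = finding present / patient is a case; 0 otherwise (invented data)
is_case    = [1, 1, 1, 1, 0, 0, 0, 0]
variable_x = [1, 1, 1, 0, 0, 0, 1, 0]  # candidate diagnostic variable

sens, spec = sensitivity_specificity(variable_x, is_case)
print(sens, spec)  # 0.75 0.75
# At a hypothetical 0.7 threshold on both measures, variable_x would be
# retained for the next round of narrowing.
```

A variable that performed no better than chance (sensitivity and specificity near 0.5) would be dropped, which is how the enlarged set is progressively narrowed.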
Finally, the experts would publish their findings. By publication, other specialists or experts may critique the methods, analyses and conclusions of the study in order for the conclusions, after iterative refinement, to achieve general acceptance in the scientific community.
VI. Differential Diagnosis Plus—“The Case Study”
An analysis of cases in which differential diagnosis is said to have proved general causation reveals that a “differential diagnosis” is not proof of general causation at all. What is, in fact, occurring is that the expert is using his or her examination of the patient/plaintiff as the basis for a “case study.” The case study is then being offered as proof of general causation—that is, proof of a new syndrome or disease. This new syndrome or disease is then being used in the process of differential diagnosis to establish specific causation as an element in the set of possible diseases or syndromes hypothesized to be responsible for the patient’s clinical presentation. Obviously, this process of proof is circular.
Illustrative are these two examples:
A. Jennings v Baxter Healthcare, 331 Or 285, 14 P3d 596 (2000)
At issue in Jennings was “general causation”: whether or not silicone in silicone breast implants (SBIs) either stimulated the immune system to cause neurological injury or was directly toxic to nerve tissue. Plaintiff’s forensic expert (“PFE”), a local neurologist, testified that it was harmful to nerve tissue through some unknown mechanism, although he acknowledged that he was neither an immunologist nor a rheumatologist, the medical specialists in diseases mediated by the immune system.
Selection Bias—Non-Random Selection Process
PFE performed a basic clinical neurological examination on fifty women with SBIs, many of whom were referred to him by plaintiffs’ attorneys. (He acknowledged that it was a biased sample of people because 25% of them had been referred by plaintiffs’ attorneys. “It’s a very biased sample of people.”) Of the 75% not referred directly by attorneys to PFE, how many had been referred by plaintiffs’ attorneys to the immunologists or rheumatologists who had then referred them to PFE? PFE failed to reveal that information. This referral pattern is an example of what is known as “selection” bias, a methodological shortcoming that undermines the internal and external validity of the study.
Those patients sent to PFE were merely that partial subset of women with implants who had suspected neurological problems. Not sent to him were all those women who had implants with no neurological problems or those women with neurological problems not attributed to their SBIs. Did PFE identify the characteristics of the subset of women not referred to him? Nothing in the record indicates he did.
He saw five women in 1993 and 45 in 1994 for a total of 50 women. Of the 45 he saw in 1994, 43 he said had vestibular problems in that they relied on their vision to maintain their balance. Ninety-five percent of some subset of these women (50, 45, ?) had tingling in their fingers and toes (or patchy sensory loss). (It was unclear whether these women had tingling in their fingers and toes and patchy sensory loss or whether the tingling in their fingers and toes was considered to be patchy sensory loss.) Ninety-five percent of some subset of these women (50, 47, 45, ?) had the combination of problems (95% of 43 = 40). The record left unclear what number in the total sample had what kind of problem.
Observer Bias–Data Mining
PFE apparently had formulated no test hypothesis about whether or not silicone caused a particular neurological syndrome before he began examining these women.
On examination, in a majority of these women, he allegedly found patchy sensory loss in their extremities (95% had tingling in their fingers; most were unaware that they had “lost sensation” [more accurately “had tingling”] in fingers and toes!).
PFE neglected to reveal the prevalence of tingling in the fingers or toes in women such as those referred to him with SBIs. Tingling in the fingers or toes is an extremely “non-specific” finding that could be accounted for by a host of causes unrelated to SBIs (for example, fluctuating estrogen levels).
These women also were reported to have had symptoms of “inner ear dysfunction.” More accurately, PFE said these women were relying on their vision to maintain balance.
He sent them to the vestibular lab at a local hospital for testing (presumably platform posturography testing). The results of these tests were not reported in the record. PFE then concluded that they had some kind of inner ear dysfunction. This investigative pattern is an example of what is called “data mining” or “data dredging”—proceeding with a clinical investigation without a hypothesis beforehand about what might be causing the clinical profile, then finding some kind of problem, and concluding from that that the patient’s problem was caused by the subject of the litigation—it is an invitation for developing false positive results.
On the basis of these examinations, he formed a post-examination hypothesis that silicone (through some unknown mechanism) caused an oddly focal (not systemic) neurological injury. He then purportedly tested this hypothesis by referring back to the results of his examination of these same 40-50 women. This process of confirming the hypothesis by examining the same patients whose examinations helped establish the hypothesis in the first instance is an example of the fallacy of “begging the question” and does not subject the hypothesis to a chance of being invalidated—the deck has been stacked, so to speak, in favor of the hypothesis. Oddly, the literature at that time about the purported effects of silicone did not include this unique combination of symptoms. See Submission of Rule 706 National Science Panel Report In Re Silicone Gel Breast Implant Products Liability Litigation (N.D. Ala 1998) (No CV 92-P-10000-5) Federal Judicial Center. www.fjc.gov/BREIMLIT/Science/report.htm p. 8.
Rates of Error
PFE then testified that the “rate of error” in using a clinical examination to establish specific causation was in the range of 5 to 7.5%, without identifying the gold standard upon which this statistic was established. Usually, when a diagnosis is based on clinical findings—in this case, symptoms–there is no true gold standard; the standard of reference is merely the degree of inter-observer agreement in the diagnosis. No data were provided about what the inter-observer agreement would be in a novel or unique case such as this one, where PFE was the only person to have identified the “syndrome” he reported. Obviously, none conceivably could have been provided.
PFE did not provide the more pertinent rate of error—the rate of error in using a case study to establish general causation. Can it be said confidently that the rate of error is less than chance? The answer is, no; it cannot. The fact is, in the scientific community, case studies do not provide adequate evidence of general causative relationships. For references, see Masters, Case Studies as Proof of General Causation. Products Liability Newsletter, Volume XIV, Number 2, pp. 2-10 (Summer 2005).
PFE represented that this pattern of symptoms in this non-randomized sample of women was distinguishable from the population at large. When asked why he thought this was so, he responded that he had examined 10,000 people over his career for neurological injury, and none had this combination of signs and symptoms. This answer is inadequate and non-responsive because he was not comparing the 45 women with SBIs to a cross section of the general population but only to those selected people sent to him as a neurologist who were suspected of having a neurological disorder (or perhaps just fibromyalgia syndrome). What would have been PFE’s response to the fact that of the vast number of women with SBIs examined by other forensic experts and non-forensic experts, only he found this combination of symptoms or signs and had related them to the SBIs? Given that the gold standard for a clinical diagnosis is inter-rater or observer agreement, would not the fact that PFE was alone in his findings have indicated that his diagnosis was aberrant—not in agreement with other experts and, therefore, unreliable?
Coincident Association Only
PFE said that there was a very strong “correlation” between silicone and the symptoms he found. But he did not perform any correlational statistical studies to substantiate this claim. He could not have since he was working with a non-randomized sample of patients.
Presumably, he established this “correlation” on the basis of the following thought process: PFE said that the combination of these two conditions (patchy sensory loss or tingling in the extremities and using vision to maintain balance) was extremely rare.
He said he ruled out alternate causes of which he could conceive that might produce these two symptoms. Silicone, he said, was the only common cause per history. Therefore, he concluded that they were due to SBIs.
This is a somewhat hollow conclusion about causation for at least two reasons. First, he did not establish a “correlation” as that term is used in science. In science, a correlation is defined as a numeric measure of the strength of linear relationship between two random variables. Dictionary of Statistics, pps. 35-39 (Penguin 2004).
Correlation =df as a numeric measure of the strength of linear relationship between two random variables or, in general statistical usage, the departure of two variables from independence.
At most PFE “established” a coincident association or a non-statistical association—an association that very likely is due to chance alone. For sure, PFE could not establish that the coincidence was not due to chance alone given his fractured methodology. Even if he had established a statistical correlation, that correlation would not logically imply causation. To state a case for causation, PFE would need to satisfy a fair number of Hill’s criteria. See K.J. Rothman and S. Greenland, Modern Epidemiology, pp. 24-28 (2nd edition 1998). PFE did not satisfy those criteria.
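A correlation, in the sense the definition above requires, is a computed quantity: the Pearson coefficient r measuring the strength of linear relationship between two variables. The sketch below shows the computation on invented numbers; nothing in it bears on silicone or SBIs.

```python
# Pearson's r: a numeric measure of the strength of linear relationship
# between two variables. The data here are invented solely to illustrate
# the computation an expert claiming a "correlation" would have to perform.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

exposure = [1, 2, 3, 4, 5]
symptoms = [2, 4, 6, 8, 10]  # perfectly linear, so r = 1.0
print(round(pearson_r(exposure, symptoms), 6))  # 1.0
# Even r = 1.0 says nothing about causation: absent a random sample and
# controls, the association may reflect chance, confounding, or selection.
```

The point of the sketch is the contrast: a statistical correlation is a number derived from data, whereas PFE's "very strong correlation" was derived from no computation at all.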
Second, PFE, being neither an immunologist nor a rheumatologist, was unlikely to possess the skills and experience needed to rule out alternative causes. The logic of differential diagnosis—“proof by exclusion”–depends on the clinician being able to guarantee that he has not missed a possible diagnostic category. He cannot rule out what he has not included in the list of differentials. Two significant commonalities, apart from SBIs, seem to have escaped the investigator’s consideration: (1) all were women and (2) all were examined by him.
What could be more likely explanations of this coincident association? First, the probability of the patient having a neurological problem is high because the patient was referred to a neurologist. That is, the sample was infirm owing to selection bias. Second, the probability of a female patient having tingling in the fingers or toes is high because that symptom is very prevalent among women in the general population and particularly in women with a diagnosis of fibromyalgia. The probability that the patient will have an inner ear problem (or, more accurately, indications of using vision to maintain balance) is also high because that is this particular neurologist’s expertise—testifying for otologists about inner ear problems. Investigators are notorious for finding what they want to find even if it is not really there. (Hence the requirement for double blinding in the gold standard investigative study—the randomized, double-blinded, placebo-controlled study.) This fact raises the strong possibility of “measurement error” and “confirmatory or investigator bias.” Nothing appears to have been done to eliminate this kind of bias.
The findings in this group of women were not compared to appropriately selected control groups of women with and without SBIs. Use of controls is necessary to establish a “statistical association” between SBIs and the symptoms PFE identified as common to some of the women in his sample. And so, given the lack of control groups in PFE’s study, the study of these 40-50 women merely constituted a “case series” that relied heavily on the subjective interpretation of PFE. Moreover, the series had not been published, nor had the neurologist’s opinions been peer reviewed. This is a classic sign that the forensic expert does not want his conclusions or methods reviewed by his peers for fear that they will find those conclusions or methods wanting.
All in all, and very ironically, PFE satisfied none of the criteria of scientific method as that phrase is generally understood in the scientific community—even though the Oregon Supreme Court said he did!
B. Clausen v M/V New Carissa, 339 F3d 1049 (9th Cir 2003)
At issue in The New Carissa was specific causation: the cause of death of millions of oysters. The New Carissa ran aground on the Oregon Coast, releasing oil into the oyster beds in the estuary. Within weeks, 3 to 3.5 million oysters died, and tissue samples from some of them revealed oil.
Experts for plaintiff and for defendant established a list of six possible general causes of the oyster deaths: (1) infectious disease; (2) freezing trauma; (3) acute toxic effects of non-oil contaminants; (4) acute toxic effects of oil; (5) low levels of salinity; and (6) toxic effects of low levels of oil. This was their list of differentials in the process of their differential diagnosis.
Both experts ruled out the first four potential general causes of death; they split on the last two. An expert for the oyster growers, conducting his differential diagnosis, opined that the oysters died from general cause number 6: oil particulates lodged in their gills, causing lesions that led to bacterial infection and, ultimately, death. He ruled out general cause number 5 on the grounds that there was no strong odor or etching characteristic of mortality from anaerobic low salinity, and that the oyster farm had been exposed to higher rainfall totals, and hence lower salinity levels, in the past without experiencing significant mortalities.
An expert for the ship owners, conducting his own differential diagnosis, opined that the oysters were killed by general cause number 5: low salinity levels in the estuary caused by heavy rainfall, which increased the flow of fresh water into the estuary. This expert ruled out general cause number 6 on three grounds: the levels of oil to which the oysters were exposed were relatively low; there was no bioaccumulation of petroleum hydrocarbons in their tissues; and although the published literature supports the theory of the oyster growers' expert as to contact toxicity in animal systems generally, little or no literature exists as to shellfish in particular. Yet this same expert had himself written a paper concluding that petroleum hydrocarbons, particularly the more toxic aromatics and heterocyclics, accumulated by marine animals interact with cells and tissues to produce a variety of lesions.
Even so, the court ruled that the opinion of the oyster growers' expert was the result of a reliable scientific method, "differential diagnosis." The court noted that each hypothesis on the comprehensive list offered to explain the symptoms or mortality must itself be capable of causing those symptoms or that death. 339 F3d at 1057-58, citing Hall v Baxter Healthcare Corp., 947 F Supp 1387, 1413 (D Or 1996). That is, the general causation of the differentials must have been previously established by means other than differential diagnosis.
This case illustrates the proper use of differential diagnosis: the differentials had been established as general causes of mortality before the process of differential diagnosis began. It was not a case study.
Differential diagnosis is generally useful to establish specific causation only. When it is offered to establish general causation, it conceptually collapses into a case study. And in the scientific community, a case study is not "generally accepted scientific evidence" sufficiently probative to establish general causation, particularly when it has not been published or subjected to peer review.
If the Oregon Supreme Court truly intends to adhere to the criteria of O'Key and Brown, it logically cannot hold that a case study (or differential diagnosis) is generally accepted scientific evidence, endorsed by the scientific community, of general causation. It cannot have its cake and eat it too.