3. Generation and Pursuit-Worthiness in Medical Diagnosis
A typical diagnostic process begins when a patient arrives at a hospital or clinic and reports certain symptoms or ailments. Insofar as the situation allows it, the physician will start by interviewing the patient and performing a physical examination to gather information about the patient’s state, how long they have experienced the symptoms and their broader medical history. Based on these, the physician tries to generate one or more possible explanations for the salient aspects of the case. For example, if a patient has uncontrollable hypertension (high blood pressure), the physician may conjecture that the patient has renal artery stenosis (narrowing of kidney arteries), since this would explain the signs.
Our use of the term ‘generation’ here should be understood in a broad sense. In most cases, medical diagnosis does not involve formulating completely novel hypotheses. Rather, it will primarily be a case of recalling already known conditions and realizing that they could potentially account for the salient signs and symptoms.5 However, this is not a sharp distinction. When facing atypical or complex cases, physicians may have to combine their knowledge of possible diseases in novel ways to explain the condition of that specific patient.
While physicians will often be able to think of a large number of theoretically possible diagnoses, it is neither practically possible nor advisable to consider every single one. Physicians need to pick out a limited number of hypotheses to focus on. The set of diagnostic hypotheses actively considered at a given time is called the differential diagnosis.6 There are good reasons why physicians need to limit themselves to a relatively narrow differential diagnosis. First, limitations of working memory preclude working on too many hypotheses at once (Sox, Higgins and Owens 2013, 9). Second, actively pursuing too many hypotheses can lead to potentially harmful over-testing (Richardson et al 1999, 1214-15). Third, in emergency situations there is no time to test every conceivable hypothesis. With a patient’s health or life on the line, physicians need to be able to determine the likeliest cause of the patient’s ailments rapidly and efficiently. This requires wisely selecting a limited range of hypotheses to focus on.
These arguments are often applied to the choice of a differential diagnosis, but similar points apply already at the generative stage. Just as it is inadvisable to select too broad a differential diagnosis, physicians cannot, and should not, try to generate a list of every single possible explanation before selecting a differential diagnosis. As argued above, generating hypotheses and selecting them for pursuit are subject to the same normative considerations. Just as physicians need to make good choices about which hypotheses to include in their differential diagnoses and which of these to prioritize for testing, they must choose how to generate possible diagnoses, as well as when to stop.
On what grounds, then, should these decisions be made? The most popular approach in the medical literature to the problem of choosing whether to test a given hypothesis is the so-called threshold approach (Pauker and Kassirer 1980; Djulbegovic et al 2015). This approach is based on decision-theoretic models which compare, e.g., a choice between: (i) applying treatment on the assumption that the hypothesis H is true; (ii) applying a test for H, and then applying treatment only if the test is positive; (iii) ceasing to work on H, i.e. neither testing nor treating. Given quantitative estimates of (a) the reliability of the test, (b) the likelihood of the salient consequences of treating and testing and (c) the utility of these consequences, one can derive thresholds for how probable the hypothesis needs to be in order for it to be most rational to test, treat or abandon the hypothesis.
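To make the structure of such a model concrete, the following is a minimal sketch in Python of a three-option threshold calculation in the spirit of Pauker and Kassirer (1980). It is an illustration under simplifying assumptions, not a reproduction of any published model: there is a single test for H, treatment is given after testing only if the result is positive, and all utilities are measured relative to the option of neither testing nor treating. The names (Scenario, benefit_treat, harm_treat, harm_test) and the numerical values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical inputs; all utilities are relative to 'neither test nor treat'."""
    sensitivity: float    # P(test positive | H true)
    specificity: float    # P(test negative | H false)
    benefit_treat: float  # net gain from treating when H is true
    harm_treat: float     # net loss from treating when H is false
    harm_test: float      # loss incurred by the test itself


def expected_utilities(p: float, s: Scenario) -> dict:
    """Expected utility of each option, given probability p that H is true."""
    treat = p * s.benefit_treat - (1 - p) * s.harm_treat
    # Test first, then treat only on a positive result.
    test = (p * s.sensitivity * s.benefit_treat
            - (1 - p) * (1 - s.specificity) * s.harm_treat
            - s.harm_test)
    return {"treat": treat, "test": test, "abandon": 0.0}


def best_option(p: float, s: Scenario) -> str:
    """Option with the highest expected utility at probability p."""
    eu = expected_utilities(p, s)
    return max(eu, key=eu.get)


def thresholds(s: Scenario) -> tuple:
    """Probabilities at which 'test' ties with 'abandon' and with 'treat'."""
    false_pos = 1 - s.specificity
    false_neg = 1 - s.sensitivity
    # Solve EU(test) = EU(abandon) for p.
    testing = ((false_pos * s.harm_treat + s.harm_test)
               / (false_pos * s.harm_treat + s.sensitivity * s.benefit_treat))
    # Solve EU(test) = EU(treat) for p.
    treatment = ((s.specificity * s.harm_treat - s.harm_test)
                 / (s.specificity * s.harm_treat + false_neg * s.benefit_treat))
    return testing, treatment


# Illustrative values only; they are not estimates for any real test or disease.
example = Scenario(sensitivity=0.9, specificity=0.9,
                   benefit_treat=10.0, harm_treat=2.0, harm_test=0.1)
low, high = thresholds(example)
```

With these illustrative values, the sketch recommends abandoning H when its probability falls below the lower threshold, testing when it lies between the two thresholds and treating without testing when it exceeds the upper one. Note that every quantity in the calculation concerns the direct consequences of testing and treating for the hypothesis at hand.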
Threshold models highlight a number of factors that should be weighed against each other in clinical decision-making, including: How reliable are the available tests? How safe/harmful are the tests? How dangerous would the disease be, if missed? How effective is the available treatment? How safe/harmful is the treatment in itself? Briefly put, on this approach, physicians have to consider whether their confidence in H is high enough for the potential benefits of treating the disease (if H is true) to outweigh the potential harms of treating or testing unnecessarily (if H is false).
While these factors are indeed important, we want to highlight a further type of consideration, which can be called strategic considerations. These go beyond the direct consequences of tests and treatments for the health of the patient. As Peirce (1938-1952, §7.220) points out, the pursuit-worthiness of a hypothesis also depends on what we might learn from pursuing the hypothesis even if it turns out to be false. Testing a hypothesis can have important downstream effects for later stages of inquiry, in addition to merely confirming or disconfirming the tested hypothesis.7 For instance, an imaging study which fails to detect renal artery stenosis may also show that the adjacent adrenal gland is enlarged, thus instead suggesting pheochromocytoma (a tumor of the adrenal gland) as the cause of hypertension. At other times, it can be worth trying to rule out a potential diagnosis simply to make the diagnostic space more manageable, i.e. to pre-emptively prune back possibilities that might otherwise become relevant later on. If testing can be done reliably and without risk of harm, it can be worth trying to rule out even fairly unlikely hypotheses early on. Examples of this could include serologic testing for Lyme disease, fat aspiration for amyloidosis and ferritin levels for myocardial iron.
Strategic considerations involve reasoning about how pursuing a specific hypothesis can influence later stages of inquiry, including future generation of hypotheses. It is this dynamic and intertwining relationship between hypothesis generation and selection for pursuit which threshold models, in their current form, fail to capture. Before making this argument, however, we want to provide a concrete illustration of our framework, by way of analyzing a detailed clinical case.