[This is a graduate student paper from Dec., 1993. --mh]

Confirmation Theory: A Metaphysical Approach

I. Problem

The purpose of confirmation theory, ultimately, is to solve the problem of induction. This problem, or its solution, has two parts: first, to codify induction, that is, to state rules of inductive inference comparable to the rules of deductive logic; second, to justify inductive inference, or explain why this sort of reasoning is rational. The motivation of the first part of the problem seems straightforward enough. The reason it constitutes a philosophical 'problem' is the great difficulty that arises in carrying the project out (as we shall see below).

The source of the second part of the problem is not immediately obvious. Just as we do not waste our time attempting to justify deduction per se, it is not at first clear why we should feel required to 'justify' induction. But the requirement derives from an argument of David Hume's that appears to show that inductive conclusions are never justified. If there is such an argument (one having that appearance, that is), then in the light of it one would wonder how inductive knowledge is possible.

But at the outset of our excursion into confirmation theory, I think it advisable to set down the constraint that any theory of confirmation that asks to be taken seriously must allow the existence of inductive knowledge, where knowledge entails logical justification.
Though Hume is credited (or blamed) with the denial of the possibility of inductive knowledge, I do not think any rational person could be content with such a position, and I can explain why using a principle Hume himself deployed to great effect in a different context.(1) It is a truism that a weaker evidence cannot defeat a stronger; that, in case of any controversy, we should prefer the more plausible proposition to the one less so; and that when any argument attempts to refute some proposition, we should weigh the plausibility of its premises against that of the thing it seeks to refute, and accept only that which is more credible. And if someone presents to me an epistemological theory which happens to have the consequence that I don't know about anything but what I have directly experienced, and then asks me: Which would be more surprising, that this theory of confirmation is wrong, or that learning from experience is impossible?, I do not see how there could be any doubt how I should answer. Particularly in view of the course of philosophy through history, I do not see how it could be said of any philosophical argument that it had a greater credibility than every inductive argument in existence. Consequently, our epistemology must accommodate induction, not vice versa.

A. Hume's argument

Hume's 'refutation' of induction essentially goes as follows:(2)

1. There are only three possible kinds of knowledge: (a) 'relations of ideas,' which are things that are true by definition, (b) direct observations, and (c) knowledge based on inductive reasoning, where an inductive inference is a generalization from experience.
2. Any generalization from experience presupposes 'the Uniformity Principle' -- i.e., that the course of nature is uniform, or that the future will resemble the past.
3. So inductive knowledge can only be justified if this presupposition is justified.
4. The Uniformity Principle is not true by definition.
5. Nor is its truth directly perceived.
6. And since all inductive inference presupposes the Uniformity Principle, any inductive argument for it would be circular.
7. So the Uniformity Principle cannot be justified. (from 1,4,5,6)
8. Hence, no inductive conclusion is justified. (from 3,7)

B. Goodman's contribution

Nelson Goodman controverts Hume's characterization of induction as simply generalizing from past experience in accord with the 'Uniformity Principle' with his (Goodman's) "grue" example.(3) Consider the generalization, "All emeralds are grue," where "grue" means "observed before the year 2000 and green, or not observed before 2000 and blue." Past observations of grue emeralds are not taken to confirm that all emeralds are grue -- "grue" is not a projectible predicate. So this is an example of a case in which we do not expect that "the future will resemble the past."

A natural reaction to the example is to suppose that only predicates that contain no references to particular times (or places or people, I suppose) can be projectible. The easiest, plausible reply to such a proposal, though not the one Goodman chooses (he instead demurs that all hypotheses can be phrased so as to contain references to particular times), is to remove the time reference, defining "grue" simply as "observed and green, or unobserved and blue"; the predicate is still unprojectible. The second easiest reply is to point out there are many projectible hypotheses that refer to particular times and individuals, such as "All Barry's classes this semester will run late."

But I don't think either Goodman's correction of Hume or any attempts to meet Goodman's counter-example with modifications to the Uniformity Principle importantly alter the situation. "The Uniformity Principle" in Hume's skeptical argument just stands in for a principle of induction -- whatever principle would adequately describe induction -- and the argument goes through whatever you put in for the content of this principle (unless it's an analytic truth).
It's pretty clear that a more detailed specification of the respects in which nature is uniform will still state an unobserved, matter-of-fact proposition which is unjustifiable through inductive means on pain of circularity.

C. Generalization of the problem

I think we can say still more on behalf of the problem of induction. Hume does not need to assume that there is a 'principle of induction' in the sense of a single thing that is either (a) common to every induction or (b) presupposed in an induction in addition to the explicit premises. I do not think that his 'Uniformity Principle' has these characteristics, and it is open to doubt whether any principle does.

For an empiricist, the problem of induction in its most general form arises merely from the characteristic inductive inference has of going beyond the data. This characteristic is essential to induction; if an inference did not extend our knowledge beyond what is contained in the data (premises), then it would be classified as deductive. Another way of saying this is that the intended conclusion of an inductive inference is not the only one compatible with the data. The significance of this feature of the inductive argument, why it is problematic for empiricists, I think, is that it generates the possibility of empirically equivalent theories competing with the desired conclusion. That is: let the inductive argument be "e; therefore, h." Because of the nature of induction, h is at least a de facto(4) theoretical claim. Also because of the nature of induction, there are other ways e could be true besides by being a result of h. Let h' be a hypothesis describing a sort of world where h is false and e is true. (At its most trivial, h' can be just (e & ¬h).) Then the proponent of the inductive argument evidently prefers h over h'. The empiricist/skeptic wants to know why. Why is h' a worse theory than h?
Notice that
(1) the reason cannot, presumably, derive from analysis of concepts or deductive logic, because h' is logically consistent; and
(2) the reason cannot, so it seems, be based on observations, because both h and h' accommodate the observations equally well.(5)

The problem is somewhat more forcefully stated if we imagine general empirical equivalence between two theories. Two theories are 'generally empirically equivalent' when for every observational statement, e, both theories predict e with equal strength. By an "observational statement" I mean one that could (in principle) be conclusively verified or refuted by direct observation. Now the problem arises for the empiricist philosopher: What (if any) rational grounds can there be for preferring one of a set of (consistent) empirically equivalent theories over others? There appears to be no empiricist answer, other than retreat into skepticism. Although there might be a non-empiricist answer, it is not yet obvious what it would be.

We can view Goodman's contribution most clearly in this light. "All emeralds are green" and "All emeralds are grue" are empirically equivalent up to the year 2000. Goodman and Hume are therefore entitled to wonder what grounds we pre-2000 people can have for regarding one of the hypotheses as superior to the other. Surely not an empirical reason? And from consideration of Goodman's example, the reader can no doubt see that empirically equivalent theories can also be adduced in competition with the conclusion of any inductive argument.

Not so incidentally, this is how other sorts of skepticism usually work: the Cartesian skeptic engineers a hypothesis (such as the deceiving god or brain-in-a-vat story) to be empirically equivalent (predict the same sensory experiences) to the common sense view of the world, and then challenges us to come up with a reason for preferring one hypothesis over the other.
If the skeptic has done his job right, we are forced to admit the skeptical hypothesis to be conceptually possible and to accord with (more tendentiously: to be confirmed by) all the evidence of the senses, and we consequently find it difficult to adduce any reasons against it.

The foregoing suggests a new formulation of the skeptical argument against induction, which is better than Hume's, and which we must undertake to refute:

1. All epistemic justification is either a priori or empirical.
2. For any given inductive argument there exist alternate, empirically equivalent conclusions that are compatible with the premises but incompatible with each other.
3. So inductive knowledge can only be justified if we can have reason to prefer one such hypothesis over the others. (from 2)
4. We can have no empirical reason for preferring one of these hypotheses over others, because they are empirically equivalent.
5. And we can have no a priori reason for such preference because
   (a) a priori reasons always issue in necessary truths, whereas
   (b) all the hypotheses in question will be logically and metaphysically possible.
6. So, given some inductive premises, we can have no reason for preferring one of the alternate possible conclusions over the others. (from 1,4,5)
7. Therefore, inductive knowledge can never be justified. (from 3,6)

II. Failed attempts at a solution

There have been several, often ingenious attempts to resolve the problem of induction, but, apart from my own view, none has been successful. I will try to survey these failed solutions below, though I am not sure that all the views I describe were intended as solutions to the problem of induction.

A. Goodman's failure

Nelson Goodman makes a perplexing attempt to reduce the second part of the problem of induction to the first.(6) In order to justify induction, he says what we have to do is to formulate general rules and compare them with particular inferences.
Individual inductions will then be justified by their conformity with the rules, while at the same time, he says, the rules will be justified by their conformity with our practices, that is, with the inferences that we actually make.

Now if Goodman's account is intended merely to describe a way of resolving certain difficult cases, then it need involve no circularity, and is, moreover, probably correct. That is, Goodman may only be saying: if we encounter a particular inference whose validity is difficult to evaluate, we can appeal to some rules that are known independently of that particular inference, and if we encounter a rule that is difficult to evaluate, we can appeal to particular inferences that are known independently to be valid. There would be no circular argument there. However, there would also be no refutation of skepticism, for the skeptic doubts of induction in general, not just certain difficult cases; thus, he is not going to grant unproblematic knowledge of any rules or any particular inferences, however obvious they may seem to us.

Because Goodman seems to think he is addressing and resolving the justificatory problem of induction, it is more likely he meant his proposal as a general means of justifying induction, to be applied to justify every rule and every particular inference. In this case, it is subject to the following immediately obvious objections:

(1) Circularity. In general, if p is to justify q, then p must first be known. It is therefore impossible for p to justify q and q to at the same time justify p, because in that case each would have to be known prior to the other. This is just what Goodman wants to claim, with respect to validity of individual inductive arguments and general rules of induction.

(2) Gives a carte blanche on arbitrary practices.
For instance, suppose that we had inductive practices corresponding to the opposite of every rule that we presently accept and the opposite of every particular inference that we presently accept.(7) We could still apply Goodman's procedure and wind up 'justifying' our practices since we would be able to bring our rules into accord with our particular judgements. The only things that couldn't be justified by a Goodmanian procedure would be practices that didn't follow rules (N.B. it's not clear why non-rule-bound practices should be unjustified) or inconsistent practices. Surely this is the wrong result.

(3) Goodman's suggestions, of course, would fail to impress the skeptic, for at the first stage of the procedure, where Goodman proposes to 'justify' some rule through its conformity with accepted practices, the skeptic would say, "Wait a minute. I agree this rule describes the sort of inferences we actually make, but I don't think it's valid. Instead, I think the inferences we actually make are all wrong." And in the second stage, where Goodman proposes to justify some particular inference by appeal to general rules, the skeptic would object once again, "Wait a minute. All general rules of induction are wrong. Therefore, I cannot accept their use to justify particular inferences."

(4) The procedure Goodman describes, even if it were a valid means of justification, would not help us any, since no one has been able to carry it out. That is, if this is the only way of justifying induction, then for all the time that mankind has existed without having formulated the rules of induction, our inductive inferences have been irrational. Now someone might claim that the mere existence of some rules that correspond to our inductive practices, even if we don't know them, could render our practices justified.
But it's not clear why this would be so, and in any case it hasn't been shown that some such rules exist.

Goodman's defense

On the face of it, Goodman's proposal seems outrageous, but he does give an argument for why the procedure he describes would constitute a justification of induction -- hence, an argument for thinking that what we accept is valid. It is a linguistic argument:

    The task of formulating rules that define the difference between valid and invalid inductive inferences is much like the task of defining any term with an established usage. If we set out to define the term "tree", we try to compose out of already understood words an expression that will apply to the familiar objects that standard usage calls trees, and that will not apply to objects that standard usage refuses to call trees. A proposal that plainly violates either condition is rejected; while a definition that meets these tests may be adopted and used to decide cases that are not already settled by actual usage. Thus the interplay we observed between rules of induction and particular inductive inferences is simply an instance of this characteristic dual adjustment between definition and usage, whereby the usage informs the definition, which in turn guides the usage. (pp. 68-9; emphasis added)

Goodman's argument, in short, is that if we can formulate some rules that correspond to the inductive inferences people call valid, then these rules will define the expression "valid inductive inference" because they state what we take a valid inductive inference to be; and therefore, an appeal to these rules to show that some particular inference is valid is legitimate. Again:

    The problem of induction is not a problem of demonstration but a problem of defining the difference between valid and invalid predictions. (p. 68; emphasis added)

Against the principle of charity, I suggest that we take him at his word and, in particular, take his talk about definitions seriously.
Attempting to define knowledge into existence is a fairly typical linguistic philosophy ploy. If linguistic analysis had been around during Copernicus' time, I suppose some philosopher might have argued:

    In order to determine whether the earth 'orbits' the sun, we must first determine the meaning of the expression "orbits," and for that we must consider its ordinary usage. Now ordinary usage refers to the sun as orbiting the earth. So Copernicus is simply committing an abuse of language in saying that the earth orbits the sun.

I doubt Copernicus would have been impressed. The problem is that if ordinary people consistently call things that have the feature F, "X"'s, this may indicate, not that "X" just means "a thing that has F", but simply that people believe things that have F to also have a second property, that of being X, which belief they could be mistaken in. Otherwise, every dispute over whether A is B would become a linguistic dispute over the meanings of "A" and "B".

B. Answer to Hempel

Before he gives his account of confirmation,(8) Carl Hempel enunciates these three conditions on the confirmation relation that he thinks any theory of confirmation ought to satisfy:

Entailment condition: If e entails h, then e confirms h.
Consequence condition: If e confirms each of a set of sentences, K, then e confirms every logical consequence of K.
Consistency condition: e is consistent with the class of all hypotheses that e confirms.

Hempel's theory of confirmation, which is supposed to satisfy the above conditions, says that for any observational evidence, e, and any hypothesis, h, e confirms h if and only if e entails the 'development' of h with respect to the objects mentioned in e. The 'development' of a hypothesis with respect to a set of objects is to be understood as what the hypothesis would assert if those were the only objects in existence; or what the hypothesis does assert about those objects.
For instance, the development of "Everything is pink" with respect to the Empire State Building and Mikhail Gorbachev would be, "The Empire State Building and Mikhail Gorbachev are pink." The development of "There is a perfect being" with respect to Bob Smith would be "Bob Smith is a perfect being." To put it another way: the development of h with respect to O plus the proposition that nothing but O exists entails h; and h plus the assumption that nothing but O exists entails the development of h with respect to O; and the development of h is an observational sentence.

Hempel's theory suffers under the following objections:

(1) Consider the observation that this pen (which I have before me) occupies(9) this region of space (a certain pen-shaped volume). This observation, under Hempel's theory, must confirm both "Every region of space is occupied" and "Everything occupies this region of space." In symbolic logic,

    Ops confirms: (x)(∃y) Oyx, and (x) Oxs.(10)

But the two hypotheses thus confirmed are contradictory. About the other regions of space besides the one this pen occupies, one hypothesis says they are all filled, while the other says they are all empty.

The first impact of this is that Hempel violates his own adequacy conditions, sc. the 'consistency condition.' Second, I think we should see it as a counter-example to the theory even independent of the consistency condition (which condition I think false): That is, it is implausible in any case that the observation of this pen in this region of space confirms either of the silly hypotheses in question.

(2) To speak more generally, Hempel's account is quite prodigal in doling out confirmation. Let F(a) be an observation report saying anything about an object, a (not necessarily an atomic sentence); and let Q be any proposition whatsoever. Then on Hempel's criterion, F(a) confirms (x)(¬F(x) → Q) (because F(a) entails (¬F(a) → Q)).
This enables one, in short, to infer from the observation of something having a certain property, anything we please about the objects that don't have the property. One application of this Hempelian liberality is the 'Ravens paradox': we can confirm that all ravens are black by the observation of, say, white shoes. Hempel is apparently willing to live with this consequence, but I daresay most of us continue to find it counter-intuitive.

A parallel result is that the observation of several green emeralds before the year 2000 would confirm that all emeralds are grue, which is counter-intuitive. Hempel considers this counter-example in his "Postscript" and appears to give in. He does not seem, though, to regard himself as abandoning his theory of confirmation when he concedes that in actuality, contra the implications of his initial account, observations of grue emeralds do not confirm "all emeralds are grue". He thinks the solution is that unprojectible predicates like "grue" must be omitted from the language of science. But he has failed to see that the emeralds case is no different in principle from the ravens case. Both paradoxes result from Hempel's allowing the observation of an X to confirm anything you please about non-X's. He's willing to accept this for non-ravens; why is he unwilling to say the same thing about things observed before the year 2000? That is, since Hempel accepts that observations of non-ravens confirm anything and everything about ravens, why doesn't he accept that observations before the year 2000 confirm anything and everything about things after the year 2000?
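The entailment that drives this objection, namely that F(a) entails (¬F(a) → Q) whatever Q may be, can be checked by brute force over truth values. The following is only a minimal sketch and not anything found in Hempel; the helper names `implies` and `entails`, and the variable names `f_a` and `q`, are illustrative assumptions, and the two propositions are treated simply as independent truth values.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional: p -> q."""
    return (not p) or q

def entails(premise, conclusion, n_vars: int) -> bool:
    """True if every truth assignment satisfying `premise` also satisfies `conclusion`."""
    return all(
        implies(premise(*vals), conclusion(*vals))
        for vals in product([False, True], repeat=n_vars)
    )

# f_a stands for the observation report F(a); q for an arbitrary proposition Q.
observation = lambda f_a, q: f_a
# Development of (x)(~F(x) -> Q) with respect to the single object a.
development = lambda f_a, q: implies(not f_a, q)

print(entails(observation, development, 2))  # does F(a) entail (~F(a) -> Q)?
print(entails(development, observation, 2))  # does the converse hold?
```

The first check succeeds because, whenever F(a) is true, the conditional (¬F(a) → Q) has a false antecedent and so holds regardless of what Q says; that is why the content of Q places no constraint on what gets 'confirmed'.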
Note also that Hempel would presumably be forced to deny that, once we've confirmed something, we are entitled to combine it with other things we know and deduce consequences (which will also be confirmed); for otherwise, we could confirm "(x) if x is not a raven, then there is life on Mars" by observing ravens and then combine it with the knowledge that there are non-ravens to confirm there is life on Mars.

(3) Normally, the observation of some type of object is taken to confirm that there are other objects similar to it. For example, the discovery of a black hole would confirm that there are other black holes. But on Hempel's criterion the observation of a black hole would disconfirm that there are other black holes, because it confirms that this object (this black hole) is the only thing in existence: Observing any object, a, confirms (x)x=a because (presumably) one observes that a=a, which is the development of the hypothesis.(11)

Similarly, we can see that Hempel's criterion makes the observation of any sort of thing disconfirm, and never confirm, the existence of anything even ever so slightly different from it. For instance, the observation of a six foot tall man would disconfirm that there are any people as tall as 6'1" (because it entails the development of "every man is less than 6'1" tall" with respect to the man observed), and the observation of a five-foot and a six-foot man would disconfirm that there is anybody between five and six feet tall; whereas intuitively, the evidence ought to confirm the hypotheses in these cases.

Again, take the hypothesis that there are at least two black holes, which should be confirmed by the observation of one black hole, but would not be on Hempel's theory. The development of "there are at least two black holes" with respect to some class of objects, presumably, would be that that class of objects contains at least two black holes.
So the observation of a single black hole would never entail the development of the hypothesis that there are at least two.

(4) Hempel is again too restrictive in only considering generalizations on observations. There is, on his view, apparently no way that any purely theoretical claims (using non-observational predicates) could ever be confirmed. It is scarcely necessary to point out that scientific theory is filled with imperceptible entities with imperceptible properties, like protons, magnetic fields, and potential energy.

(5) Finally, Hempel makes no attempt to actually justify induction. There is no explanation of why it would be rational to believe, or to increase one's degree of belief in, things that have been 'confirmed' under his criterion of confirmation. Furthermore, in the absence of any such justification, it is difficult to feel attached to his account as an accurate description of how we reason. Nor does Hempel have very much to say as to why we should accept his account, which he describes as a 'definition' of confirmation, other than that it satisfies his three conditions. It would be nice, for instance, if he were to take some typical examples of inductive and scientific arguments and show us how they conform to his model; but then, the examples we have considered above suggest this might be a difficult matter.

C. Mill's methods

Mill's methods of inductive inquiry, which appear in many logic texts today (having thus fared better than the rest of his logic), have more of the flavor of deductive reasoning than either of the preceding descriptions of induction. Because they are pretty sensible, it is possible someone might consider them to explain the possibility of inductive knowledge. I intend to argue, not that they aren't good methods of inquiry, but that the identification of Mill's methods does not resolve the problem of induction.

Summary of the methods:

1. Method of agreement: If E has one condition that invariably precedes it, then that condition is its cause.
2. Method of difference: If E occurs in the presence of C but fails to occur when C is removed, all other conditions being held constant, then C is the cause of E.
3. Method of residues: If E occurs in a given situation, and the portion of E due to some particular antecedent is known, then the remainder of E is the effect of the remaining antecedents.
4. Method of concomitant variations: If variation in C is accompanied by variation in E, then C is the cause (or effect) of E.

Shortcomings of the methods:

(1) The methods are insufficiently general. They only serve to discover cause-effect relations among observable events. They would thus fail to explain, for instance, inferences to the existence of theoretical entities, or to the existence of physical objects on the basis of sensations. To show that I am not attacking a straw man by this objection, I quote Mill: "The four methods which it has now been attempted to describe are the only possible modes of experimental inquiry..."(12)

It's worth noting, apropos of the methods' inability to discover the existence of physical objects, that Mill managed to convince himself that physical objects were mere 'permanent possibilities of sensation',(13) evidently preferring to demote their status rather than to admit the indictment of his own theory of induction. It is pretty obvious, though, that that sort of thing is not what the rest of us mean by "physical objects," and that Mill was being irrational in rejecting the obvious in deference to the relatively tenuous (cf. my remarks in §I on the 'adequacy condition' on theories of confirmation).

(2) The conditions of applicability of the methods are never literally satisfied, at least vis-a-vis the methods of agreement and difference. It is never the case that a set of phenomena have only one antecedent in common, and it is never the case that a single antecedent to an effect is removed without any change in any of the other antecedents.
I don't doubt that in practice it is often possible to rule out a priori most of the irrelevant factors, so that we may say we have only one plausibly causally relevant common factor (for the method of agreement), or only one plausibly relevant difference (for the method of difference), but Mill fails to give any theoretical explanation of how we can know which of the infinitely many conditions present in any given situation should be ignored and which should be tested or controlled. Factoring into Mill's naive view in addition the possibility of unobservable conditions, it emerges as not only practically, but in principle impossible to follow the strictures required by Mill's methods.

Since the strict application of the Methods is impossible, perhaps Mill would say we should only approximately apply them, or perhaps they require some supplement. But in the former case, it is not clear our conclusions would still be rationally justified; whereas in the latter, the nature and basis of the required supplementary principles remains unidentified.

One fall-back for empiricists is to claim that in a given inductive inference, what sorts of factors are likely to be causally relevant is itself previously determined through other inductions. For instance, when doing a physics experiment, I don't bother to control for the day of the week because I have previous experience that the day of the week doesn't affect the outcomes of physics experiments. But note that this sort of reply would only work if we assumed -- contrary to fact -- that there are some basic inductions that we can start from which do conform perfectly to the strictures of Mill's methods.

(3) Neither Mill nor the logic books that paraphrase his methods say very much in the way of their justification. Perhaps their logic is considered self-evident.
Well, if we assume that a given phenomenon under investigation must have a cause, that the cause must be among a certain set of observable factors, and that it must be a single thing (and not a disjunctive or conjunctive property), then I think the logic of the methods is evident; in fact, in that case, the cause could be deduced in accordance with the methods. But that is a lot to assume. If, then, we do not assume all of this but seek to establish it by means of the Methods, how can we explain the rationality of our mode of inference? From the fact that we have always observed E to be preceded by C, plus even the fact that we have observed E not to occur in some otherwise identical circumstances when C was absent, does it follow that there is a necessary connection between C and E? I take it it does not deductively follow, since the combination of such observations as just described with a case of C existing unaccompanied by E in the future or in an unobserved part of the world is consistent, and would falsify the 'necessary connection' thesis. So there are ways the world might be such that the sort of observations premised in Mill's methods (esp., for the use of the methods of agreement and difference) occur but there is no causation of the sort the methods would conclude; and this brings us back to our question of section I, of why we should reject the (empirically equivalent) hypothesis of that sort of world, in favor of that of the causally connected world.

Mill himself is remarkably unsatisfying on this matter. He interprets causation, essentially, as mere constant conjunction,(14) and he says that all induction is ultimately founded upon the principle of the uniformity of nature.(15) I can see why he would say this. Given the uniformity of nature, we can infer from the fact that C has always been observed to be followed by E that C is always followed by E and always will be; and if that is all there is to causation, we can infer that C causes E.
But when it comes to the justification of the Uniformity Principle, Mill naively pronounces that it is itself justified by inductive argument, thus running afoul of Hume's problem (recall section IA above).

D. The Bayesian Approach

According to the Bayesian theory of confirmation, the most impressive theory so far, inductive reasoning is reasoning in accordance with the probability calculus.(16) The mathematical theory of probability contains four axioms:

(1) P(a) ≥ 0, for any proposition, a.
(2) P(t) = 1, if t is a tautology.
(3) P(a v b) = P(a) + P(b), if a and b are mutually exclusive.
(4) P(a & b) = P(a) × P(b|a).

"P(b|a)" is read "the probability of b, given a" and means the probability that b would be true if a were true. From (4) follows Bayes' Theorem by three trivial steps:

    P(e & h) = P(h & e).
    P(e) × P(h|e) = P(h) × P(e|h).
    P(h|e) = P(h) × P(e|h) / P(e)

Suppose that h is some hypothesis and e is some piece of evidence. The Bayesians interpret P(h|e) as the degree of belief you may assign to h upon discovering that e is true, and they claim that inductive inference in general is explained by the application of Bayes' Theorem. P(e) and P(h) are supposed to be the initial credence you give to e and h, respectively, before discovering e. e confirms h on the Bayesian account just in case P(h|e) > P(h).

From the Theorem it is evident that h will be best confirmed when P(e|h) is high and P(e) is low -- that is, when h strongly predicts something which is initially improbable. In the best case, when h entails e, P(e|h) = 1.

I will accept most of the Bayesians' assumptions, but will still have ample criticisms to make. In particular, I accept that:

(1) Beliefs come in degrees. It is a matter of introspection that one believes some things more strongly than others, and we describe the more strongly believed propositions as more 'probable'.
It is a matter for stipulation that degrees of belief be measured from 0 to 1, with 0 being strongest disbelief and 1 being the strongest possible belief.
(2) We should try to conform our degrees of belief to the probability calculus. Although some object that this is psychologically implausible, the objection is no more damaging than the parallel objection against canons of deductive logic would be.

I would not accept this on 'Dutch Book' grounds, since "fair betting odds" is a highly dubious definition of degrees of belief -- surely degrees of belief are not dependent on any so culturally specific institution as gambling. Instead, I accept the axioms of probability as self-evident.
(3) "P(h|e)" and the rest of the terms appearing in Bayes' Theorem are correctly interpreted by the Bayesians, as referring to the rational degree of belief in h after e has been discovered, &c.

However, the Bayesian theory suffers from a few unfortunate difficulties.

Objections:
(1) Unknown conditional probabilities: Bayesians implicitly assume that there is some way of determining the quantities on the right-hand side of Bayes' Theorem independently of P(h|e). Unfortunately, this is rarely unproblematic. Let's start by considering P(e|h):

Suppose I am drawing some marbles out of a large bag, and the first five I take out are black, and now I want to know what is the probability of the next one also being black. This conforms to one typical form of induction. Bayesian confirmation theory tells me that in the circumstance described I should assign as the probability of the sixth marble being black: the initial probability of its being black (P(h)), times the probability of the first five marbles being black given that the sixth one is black (P(e|h)), divided by the initial probability of the first five being black (P(e)). I think it's fair to say that this isn't terribly helpful. How in the world do I determine the probability of the first five marbles being black given that the sixth one is black?
Not, presumably, by another application of Bayes' Theorem, which would just lead us in a circle. I could try changing the hypothesis so that it entails the evidence -- in which case I will know P(e|h) = 1 -- e.g., I could ask, what is the probability that the first six marbles will have been black, given that the first five were (N.B. merely conjoining the old hypothesis with the evidence)? This is the only situation (i.e. entailment) in which P(e|h) is unproblematic, but this subterfuge only hands us over to the problem of
(2) Unknown prior probabilities: For although the traditional arrangement of Bayes' Theorem, in which P(h|e) appears by itself on the left and everything else on the right, gives the subliminal impression that the quantities mentioned on the right hand side of the equation are previously given data by means of which we can proceed to calculate P(h|e) -- and the term "prior probabilities" (for unconditional probabilities) strengthens this impression -- this is, as a matter of fact, a substantial assumption which calls out for justification. For many important cases, including my "six black marbles" example, I contend that one or both of the 'prior' probabilities that we are supposed to plug into Bayes' Theorem is epistemically posterior to the conditional probabilities. This is because the way people estimate (or calculate) the probabilities of conjunctive facts or of sequences of events is by means of multiplying prior and conditional probabilities of individual facts or events. For instance:

Suppose I want to calculate the probability of being dealt a royal flush. This is not something that is just immediately given for me.
Rather, I will take the probability of first being dealt a 10, jack, queen, king, or ace (=20/52); multiply it by the probability of next getting one of the remaining four types of card, of the same suit, given the first card having been as described (=4/51); then multiply that by the probability of receiving one of the three remaining cards required, given the first two cards (=3/50); and so on.(17) Similarly, in my "black marbles" example, if I have some antecedent estimate of the probability of pulling out a black marble, how I assign the probability of pulling five black marbles or of pulling six black marbles will depend on this plus my estimates of how previous drawings of black marbles affect the probabilities of future drawings of black marbles. If, for example, my belief is that the drawings are probabilistically independent (like coin flips) then I will just multiply individual probabilities to get probabilities of sequences. If I believe in induction, then I will give (uniform) sequences somewhat higher probabilities. If I subscribe to the gambler's fallacy, I will give (uniform) sequences lower probabilities.

The Bayesian seeks to reverse this process: he wants me to have the 'prior' probability of drawing six black marbles initially given and use it to determine the probability of the sixth marble being black given that the first five were.

Notice how easy computing confirmation would be if I were not right about this: if unconditional probabilities were generally independently known (prior to conditional probabilities), then we could forget about Bayes' Theorem and just calculate P(h|e) directly from axiom 4 above: i.e., for any h and e, I could just take my prior P(h & e) and divide by my prior P(e), and this gives me the desired conditional probability.
Alas, this does not work since I have no independent way of knowing P(h & e).
(3) The return of grue: Like Hempel, the Bayesians are committed to saying that observations of green emeralds (before the year 2000) confirm "All emeralds are grue." This is because, from Bayes' Theorem, P(h|e) > P(h) if and only if P(e|h) > P(e), and the Bayesians interpret confirmation in terms of the relation P(h|e) > P(h).

Take h to be "All emeralds are grue" and e to be "All emeralds observed before the year 2000 are green". Presumably, the initial probability of e is less than one. Since h entails e, P(e|h) = 1. Therefore, P(e|h) > P(e); so e confirms h.

The problem is that although in this case e does raise the probability of h, the discovery of e does not raise the probability of the excess content of h above e, or of the remaining, unobserved instances of h. Intuitively, h makes a claim about emeralds observed before the year 2000; it also makes a claim about emeralds only observed after 2000, and a claim about emeralds never observed. e confirms the first part of h, but it does not confirm (in fact, disconfirms) the remainder of h.

For Bayesianism to solve the problem of induction, it would have to show that for typical inductive arguments, the evidence confirms the excess content of the hypothesis above the observations.

This notion of "excess content" is worth looking into. Karl Popper and David Miller claim(18) that for any h and e, the excess content of h above e is equal to (h v ¬e), for reasons which are unnecessary to examine since they're wrong. Intuitively, the excess content of (A & B) above A should be B, not ((A & B) v ¬B).
My proposal is this:
(a) If h can be written as a conjunction (e & x), where e and x are propositions about different things (separate and distinct classes), then the excess content of h above e is x;
(b) If e entails h, then the excess content of h above e is nothing (or a tautology);
(c) Otherwise, the excess content of h above e is h.
Thus, for example, "All ravens are black" can be stated, "All observed ravens are black, and all unobserved ravens are black." Since observed ravens and unobserved ravens are disjoint classes, the excess content of "All ravens are black" above "All observed ravens are black" is "All unobserved ravens are black."

Now I realize there may sometimes be some difficulty determining what objects a proposition is 'about'. Roughly, a proposition is about the smallest set of objects whose (non-tautological) properties and relations are part of its truth-conditions. For instance, "Bill or Ted stole the lamppost" will count as being about Bill, Ted, and the lamppost. Attempting to precisify the definitions of "about" and "excess content" further so as to avoid all clever logical tricks that might come up would probably be a fruitless project.

I think we have enough now to see the Bayesian's problem. Suppose that x(h) is the excess content of h above e, where e is some observational evidence. In that case, since x and e are about separate and distinct things, neither entails the other, so we can't get an easy determination of either conditional probability that way. Even if we assume that we know P(x) and P(e), we won't know (or at least, Bayesian theory doesn't tell us) P(e|x), so we can't calculate P(x|e) (at least, not by means of Bayes' Theorem). So there is no Bayesian principle for telling us whether e confirms the excess content of h above e.
(4) The return of empirical equivalence: In section I, I claimed that the problem of induction requires us to justify the preference of one theory over other, empirically equivalent theories.
Suppose that h and h' are two empirically equivalent theories and e is the evidence presented for one of them. Then all the Bayesians have to tell us is that
P(h|e) = [P(h) × P(e|h)] / P(e), and
P(h'|e) = [P(h') × P(e|h')] / P(e)
But since h and h' are empirically equivalent, P(e|h) = P(e|h'), and we can see that the right hand sides of each of the above equations are identical except for the prior probabilities of the hypotheses, P(h) and P(h'). In other words, on Bayesian principle, P(h|e) > P(h'|e) only if P(h) > P(h'), for h empirically equivalent to h'. So all now turns on what the Bayesian can tell us about these prior probabilities.

Unfortunately, the so-called 'personalist Bayesians' maintain that there are no rational constraints on the distribution of prior probabilities, other than the very weak constraints provided by the axioms of the probability calculus mentioned above. I look upon this line as rather an abandonment of the project than a solution to the problem of induction. If it were correct, a skeptic could always avoid the conclusion of an inductive argument by assigning a low prior probability to the conjunction of the premise with the conclusion but a high prior to the conjunction of the premise with the negation of the conclusion. Moreover, instead of implying that all different distributions of priors are equally rational, why wouldn't the contention that there are no further logical constraints on degrees of belief besides the austere probability calculus rather imply that all different distributions of priors are equally irrational? That is, if there is no logical principle determining what the a priori probability of something is, then instead of just making up a number between 0 and 1 arbitrarily, hadn't I better suspend judgement on the matter entirely, refusing to assign any degree of probability?
The Bayesians' oft-relied upon claims about the accumulation of evidence tending to wash out differences in priors are irrelevant here; the problem is that I cannot rationally start with any prior distribution, so I can never get started on the process of confirmation and conditionalization.

More objectively-minded Bayesians are apt to invoke such principles as the Principle of Indifference for the assignment of priors. This principle says that when we don't have any evidence favoring any of a set of alternatives over the others, we should count each alternative equally likely. Now I don't want to criticize this approach too much, since I think the Principle true and plan to use it later on. In the event that we have no reason to prefer A over B nor B over A, it seems reasonable that we would not expect A any more than B, nor B any more than A. Expecting each of them equally is the only natural attitude; otherwise we should be called upon to explain our preference for one of them. (Note that the Principle of Indifference is thus seen to make sense only when probabilities are construed as degrees of expectation or belief.) However, the principle is subject to different, incompatible uses, as is well-known, some of which have the effect of rendering induction impossible. At first glance, it seems the most natural way of applying the principle would be to say, "I shall be indifferent with respect to all the different possible distributions of properties across space-time." But this sort of initial probability distribution would preclude inductive learning.

To explain what I mean by this, let's consider a simplistic example. Suppose that (out of idleness) I am planning on flipping a coin a hundred times, and I previously have no knowledge about how coin-flipping works except that it always results in either heads or tails. Then I know there are 2^100 possible outcomes of my 'experiment'.
To apply the principle of indifference, it seems, I would give each of these possibilities an equal chance of occurring. But if I do this, I will be unable to partake of any inductive learning. For suppose that after 99 flips I have gotten 99 consecutive heads. My expectation of the next coin being heads given this 'evidence' will equal my initial probability of getting 100 heads (=1/2^100) divided by my initial probability of getting the first 99 heads (=1/2^99), which is ½, the same as it was before.

Bayesians who favor induction would be likely to recommend a different way of using the principle of indifference, for instance: there are 100 possible proportions of heads that might result from my experiment (i.e., 1/100 of the flips are heads, or 2/100, or ... or 100/100). If I assign an equal probability to each of these possibilities, then I will be able to learn from experience in accordance with Bayes' Theorem. Unfortunately, Bayesians have been unable to explain the rationale for using the principle of indifference in this sort of way (or some way that allows for inductive learning), as opposed to the former described application.

Conclusion

I think we can see that none of the above theories of inductive reasoning comes close to addressing our problem. As a test to see whether any description of induction is accurate, we can ask, "Does it imply that observations of green emeralds before 2000 confirm that all emeralds are grue?" Each of the theories considered above, except for Goodman's, clearly implies that green emeralds do confirm "all emeralds are grue": (a) because the development of "all emeralds are grue" for emeralds before 2000 is that they are green; (b) by the method of agreement, correlations between being an emerald and being grue confirm that the property of being an emerald causes the property grue; and (c) because the probability of emeralds observed before 2000 being green given that all emeralds are grue is greater than its probability otherwise.
Only Goodman, because he made up "grue", has avoided this consequence; but his theory, we found, gives no logical justification for induction.

The only one of the above theories that would plausibly justify induction is the Bayesian theory, which can appeal to self-evident axioms of probability. Unfortunately, it is in need of considerable help for the problem of determining 'prior' probabilities as well as some conditional probabilities. Nevertheless, it will be useful in justifying the following account of induction.

III. Theory

In general, inductive reasoning arises out of the fact that certain observations and sets of observations that we make, as we feel, require explanation, and a hypothesis is justified on some evidence when it produces a plausible explanation for what would otherwise be some surprising observations.

Before I explain and justify that contention, I cannot help making some preliminary remarks about philosophical methodology, since it is primarily misguided methodology and general epistemology that I think has made and will continue to make it difficult for people to accept the true account of induction. These remarks must be kept brief but will serve to explain some aspects of my approach.

A. Remarks on philosophical method

Remark 1: No a priori commitment to empiricism.

Many philosophers of science have taken it as axiomatic that all knowledge must be based purely on experience. As explained in §I above, it is this idea that creates a problem of induction. In fact, the arguments there elaborated (in IA and IC) are logically unexceptionable; if we accept the premises about the possible nature of knowledge, then we are forced into skepticism. I therefore have no intention of accepting the impossible mission of reconciling empiricist scruples with the possibility of inductive knowledge. Nor do I perceive the justification for the great credence that empiricism has received during the last two centuries.
Inasmuch as ethics, metaphysics, mathematics, theoretical science, all inductive conclusions, and even knowledge of the external world, are problems for empiricists to account for -- inasmuch, I say, as all of the interesting kinds of knowledge we find ourselves to have are difficult or impossible to explain on strict empiricist assumptions -- hadn't we better acknowledge that this theory (which cannot on principle appeal to any a priori justification and also lacks any empirical support) is wrong?

Remark 2: Philosophy for people, not computers.

I am not going to attempt to provide an algorithm for inductive reasoning, such that we could program a computer to evaluate our scientific theories according to it, and retire the scientists. I seriously doubt that that sort of thing can be done. Inductive inferences are made by conscious beings; these beings are capable of, and do, exercise a certain amount of judgement; sometimes their judgements conflict, sometimes they are uncertain, and sometimes they are wrong. I will not regard it as a serious fault of my view of induction that I fail to prevent these things from happening. I seek to describe inductive inference as practiced by people, and to explain its justification; I do not seek to generate epistemic utopia.

As a corollary of this, we should not insist that philosophical concepts possess a degree of precision comparable to mathematical concepts, or that philosophical principles should be immune from misinterpretation. Vague principles are just as capable of being true as precise ones.

Remark 3: Against 'formal' criteria.

Another thing I will not try to do is to give a purely syntactic or 'formal' (whatever that means) criterion of confirmation. As desirable as it might be (to mathematicians and computer programmers) to have such a thing, I am afraid, alas, that the making of an inductive inference, unsurprisingly enough, requires an actual understanding of its meaning.
We ought not to refuse to recognize facts because they are inconvenient for certain of our epistemological ambitions.

Remark 4: On behalf of metaphysics.

Against the strict verificationist criterion of meaning imposed by positivists earlier this century to rule out 'metaphysics', I propose the comparatively liberal criteria of meaning according to which any of the following is sufficient for the concept of X to be meaningful:
a) if we find it (by introspection) possible to think about X and believe things about X;
b) if we are able to classify some things as X and others as non-X;
c) if the things we call X have something in common because of which we call them X;
d) if X's are different from non-X's.

These criteria are the only apology I give for the metaphysics which is to follow. I do not subscribe to the theory that philosophy must do with as few distinct or as few distinct non-empirical concepts as possible; instead, I think the possibility of distinguishing correct and incorrect uses of a concept is sufficient to establish a prima facie justification for its invocation whenever useful.

Remark 5: A liberal helping of 'the light of nature'.

I will have to make frequent appeal to self-evident facts. Although some will want to criticize this sort of thing and demand 'proofs', I blame the nature of my subject matter. I cannot help the fact that philosophy is based on intuition.

Although the above terse remarks are unlikely to alter the opinions of any positivists or empiricists, they will perhaps at least forestall unnecessary, predictable objections.

B. Inference to the best explanation explained

Inference to the best explanation involves these three elements: that the evidence premised is initially improbable (or unexpected), that the hypothesis (the conclusion of the argument) would explain it,(19) and that the hypothesis is the 'best' of the potential explanations. When, and only when, these conditions are satisfied, I say, there is a valid inductive argument from evidence to hypothesis.
We have, then, three notions to clarify.

1. Explanation

There are at least three senses in which one thing can be said to imply another, for instance, X implies Y: (1) meaning that Y is a precondition on X; (2) meaning that Y is a consequence deriving from X; or (3) meaning merely that whenever X is true Y is also true. As examples of these, the King of France is bald implies that he exists, in the sense that his existence is a precondition on his baldness; the axioms of geometry imply the theorems, in the sense that the theorems follow from them, and the axioms explain why the theorems are true.(20) It is the second relation that we seek between theories and empirical evidence, namely that of the observed facts being based upon the facts described by the theory. Although I do not think it is possible to strictly define this relation, it is possible to give substantial necessary conditions on it, namely:

h explains e only if
1. h and e are true;
2. h is 'prior' to e; and
3. P(e|h) >> P(e|¬h).
I won't claim these conditions are sufficient, since I am sure philosophers will press counter-examples.(21) Fortunately, these conditions will prove enough to make out the justification of an inference to the best explanation.

The first condition requires no comment. The purpose of the second condition is to capture the asymmetry of the explanatory relation, something which is ignored by the standard deductive-nomological model of explanation. It is meant to invoke a metaphysical concept of priority, which is different from (but encompassing) temporal priority. Metaphysical priority is the relation of one fact's being of a more basic, or more fundamental, level than another. Although the explicit identification of this concept is likely to make it a target for philosophical suspicion, it is commonly implicitly invoked in other contexts.
Besides, as I think, in inductive reasoning, metaphysical priority is invoked in reductionist theses -- physicalism, for instance, claims that laws and facts of physics are the most basic, or metaphysically prior, facts -- and in any claim that one thing depends on another. There is, again, no definition of priority (other than what has just been said), but several instances of it can be named so as to give the reader the idea by letting him see the similarity in these instances:
1. Events earlier in time, of course, are prior to later events;
2. Properties of and relations between parts (at a given time) are generally prior to properties of a whole (at that time);
3. Categorical properties are prior to dispositional, or causal, properties (again, at a single time);
4. Necessities are prior to contingencies (another way to think of this is that necessities are considered as if they had the earliest time-index, because they are 'eternal truths');
5. Descriptive properties of things are prior to their value properties;
6. Existence of substances or objects is prior to the existence (or instantiation) of properties, relations, or events.
The above six statements are not stipulations but synthetic judgements that I make, based on the sense in each case that the thing which I call 'prior' cannot depend on the 'posterior' thing, but the posterior thing might depend on the prior.

The third condition on explanation is stated as it is because if h only slightly raised the probability of e, we would typically still not consider ourselves to have an explanation. It's also there because, for pragmatic reasons, we only want to waste our time considering theories that are strongly confirmed, and not ones that are only slightly confirmed. So we want P(e|h) to be much greater than P(e).

2. Probabilities

Let's assume in the present context that probabilities are rational degrees of belief.
Contra Carnap, the assignment of prior probabilities must violate empiricist strictures against synthetic, a priori assumptions. For suppose we assign a low (but positive) prior to some proposition, x, as we will sometimes have to if we can distinguish sufficiently many alternatives: then the denial of x has a high subjective probability, or, in other words, we believe it strongly. But ¬x is synthetic, because by assumption its probability is less than one and every analytic truth's probability is one. The belief in ¬x is also, so far, a priori, for to be based on experience is for a belief to receive its high subjective probability by conditionalization on some observations; whereas we are assuming ¬x has a high absolutely prior probability. Let us also suppose that ¬x is true. Then, assuming that we are justified in our assignment of prior probabilities, our belief that ¬x constitutes an item of synthetic, a priori knowledge.

Carnap maintained that logical probability judgements are all analytic.(22) If he meant that the propositions to which we give these probabilities are all analytic, then of course that's not true. But second, if he meant that statements of the form, "The probability of h is P" are always analytic, he was also wrong. In fact, such statements are always synthetic, since they say that we are entitled to repose a certain degree of belief in h. 'Analytic' truths are supposed to be ones in which the concept (or the definition) of the subject contains the concept of the predicate, but that is not the case here, since one could very well understand what the proposition that h was, without even having the concept of probability, let alone knowing that h had a probability of P.

The first rule of the assignment of a priori probabilities is to respect your intuitions.
I do not know how to specify this 'rule' in any more detail, or even whether it can be further specified, but I will give some examples to illustrate my meaning: it is evident a priori that forces acting on bodies cause them to move. In contrast, it is initially implausible that a force acting on a body causes a different body to change its color. It is initially plausible that conscious beings desire pleasure, but improbable that they desire pain. It is initially improbable that an event can directly cause a spatially distant event; that motions and forces can cause states of consciousness; or that "grue" can cause anything. Contra Hume (et al.), I think these examples and others show that we can and do have intuitions, completely a priori, about what sorts of things can cause what other sorts of things. The strengths of these intuitions are reflected in initial probabilities. The reason it may be impossible to systematize these sorts of judgements is because they are not determined 'formally' but depend upon grasp of the specific nature of the objects of thought. Knowing that forces probably cause motions, not changes of color, depends on understanding exactly what "force", "motion", and "color" mean.

But since we do not have theoretical intuitions about everything, the second rule of the assignment of prior probabilities is to apply the principle of indifference to the possible states of affairs at the most basic (metaphysically prior) level of reality, when one has no reason for preferring one alternative over any other. For this purpose, alternatives should be as finely individuated as possible (that is, as finely as you, the observer, can discriminate). The purpose of this latter specification is to avoid the sort of inconsistency that could result from attempting to apply the principle of indifference simultaneously to different partitions of the same space of possibilities.
Third, in the absence of reasons for holding one thing to affect the probability of another, different events or propositions are assumed to be probabilistically independent. This principle has the same intuitive motivation as the principle of indifference: if we don't have any evidence, nor any a priori reasons either, linking A to B, then why would discovering A change our degree of expectation of B? If we change our degree of belief in B, we shall reasonably be called upon to explain why we did so.

Fourth, negative propositions are generally more probable than positive propositions (ceteris paribus). The idea behind this is that a proposition is considered 'positive' only because it singles out one specific alternative out of a wide, possibly indefinite range of possibilities. For instance, "The sky is blue" is a positive statement whereas "The sky is not blue" is negative just because "blue" is a narrower category (encompassing fewer possibilities) than "non-blue". "The sky is azure" is, similarly, more positive than "the sky is blue." That the presumption (i.e., the greater initial probability) is with the negative claim then follows from the principle of indifference.

Finally, simple propositions are generally more probable than complex ones. The idea behind this is that in a complicated hypothesis, there is more to go wrong. A proposition gets to be 'complex' because it requires the existence of many entities, properties, or relations. Such propositions get low probabilities because their probabilities are products of the probabilities of their individual ontological commitments.

3. The best explanation

In general, there are two reasons why h could be a 'better explanation' of e than h': first, because h is just more initially credible than h'; second, because h predicts e more strongly than h' does, thus being more of an explanation. Accordingly, the best explanation of e is the one that has the highest product of P(h) times P(e|h).
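The ranking rule just stated can be illustrated with a minimal sketch. The function names and all of the numbers below are hypothetical, chosen only to show how initial credibility and predictive strength trade off in the product P(h) × P(e|h); nothing here is meant as an algorithm for assigning the probabilities themselves.

```python
from fractions import Fraction


def explanation_score(prior, likelihood):
    """Product P(h) * P(e|h) used to compare candidate explanations."""
    return prior * likelihood


def best_explanation(candidates):
    """Return the name of the candidate with the highest P(h) * P(e|h).

    `candidates` maps a hypothesis name to a (P(h), P(e|h)) pair.
    """
    return max(candidates, key=lambda name: explanation_score(*candidates[name]))


# Hypothetical numbers, chosen only to illustrate the trade-off.
candidates = {
    "simple, strongly predictive": (Fraction(1, 10), Fraction(9, 10)),
    "complex, entails e": (Fraction(1, 100), Fraction(1)),
    "credible, weakly predictive": (Fraction(3, 10), Fraction(1, 10)),
}

print(best_explanation(candidates))
```

Since every candidate's product is divided by the same P(e) in Bayes' Theorem, ranking by P(h) × P(e|h) gives the same order as ranking by P(h|e).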
Notice that from this and the preceding remarks about probabilities, we get an interpretation of Occam's Razor and the preference for simple theories: viz., the Razor enjoins us to pick the simplest and least positive theory we can that would still constitute an explanation of the observed evidence. Introducing new complexities into a theory always lowers its initial probability, so it can only be justified if it improves the explanation, i.e., raises P(e|h), sufficiently.

C. Problems of induction solved

From the preceding analysis of inference to the best explanation, the reader can no doubt see that my justification for induction will be the appeal to Bayes' Theorem. Since on my account there is an inference to the best explanation (from e to h) only when e is initially improbable but h renders e much more probable, Bayes' Theorem directly implies that when there is an inference to the best explanation, the probability of h given e will be much greater than the initial likelihood of h. What I have to do now is explain why my account escapes the difficulties I saddled the Bayesians with above in section IID, and how I answer the skeptical arguments rehearsed in section I.

1. Answer to Hume's argument: This is simple. The premise that inductive inference is based on 'the Uniformity Principle' is false. An inductive inference follows the form of inference to the best explanation, and that is not a premise or a presupposition of the argument; it is just the form of the inference. Furthermore, the knowledge that that form of inference is valid is based on the self-evident (but synthetic) principles of probability rehearsed above.

2. Answer to my argument about empirical equivalence (§IC): The premise that a priori reasons must issue in necessary truths is false; sometimes they only issue in probable truths. And when two empirically equivalent theories potentially explain a phenomenon, we may prefer one of them because of its higher initial probability.

3. 
Problem of unknown conditional probabilities removed: My extension of the principle of insufficient reason lets us assume probabilistic independence when conditional probabilities (or probabilities of conjunctive facts) are otherwise unknown. Note that this does not, however, destroy the possibility of induction. The invocation of the different 'levels' of reality saves us from that. Probabilistic independence will exist only on the most fundamental level of things; but indifference among combinations of facts on this level will force us not to be indifferent about combinations of facts on more derivative levels. And the intuitive idea here is that the perceptual observations we make are the 'superficial' level, whereas scientific (and other) theories attempt to get at more basic facts. Because the theories will imply certain things about observations, our probabilities for them will (partially) determine our probabilities for observations. For instance, suppose that most possible fundamental theories imply the uniformity of nature in some observable respect or other (I don't know whether this is true or not); then as we start with equal probabilities for the theories, or for different possible combinations of theoretical-level facts, we have to give a specially high probability to uniform series of observations, so we have to make observations of a certain character increase the probability of further observations being similar.

4. Problem of unknown prior probabilities removed (sort of): Well, at least on my theory prior probabilities aren't completely subjective. Although it is difficult to figure out what they are, we won't be forced to conclude induction is impossible (as on some interpretations of the principle of indifference), and we don't have the Bayesians' problem determining probabilities for conjunctive facts or series of events, as just discussed.
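The parenthetical about interpretations of the principle of indifference points back to the coin-flip example of section IID, and the arithmetic there can be checked with a short sketch. The function names are hypothetical. The second prior counts 101 possible head totals (0 through 100) and treats sequences with the same total as equally likely; both choices are assumptions of this sketch rather than anything the text fixes.

```python
from fractions import Fraction
from math import comb

N = 100  # total number of planned flips


def p_first_m_heads_uniform_sequences(m):
    """P(first m flips are heads) when all 2**N sequences are equally likely."""
    return Fraction(2 ** (N - m), 2 ** N)


def p_first_m_heads_uniform_counts(m):
    """P(first m flips are heads) when each total head count 0..N is equally
    likely and sequences sharing a count are equally likely (an assumption)."""
    total = sum(Fraction(comb(N - m, k - m), comb(N, k)) for k in range(m, N + 1))
    return total / (N + 1)


def p_next_head(p_first_m_heads, seen):
    """P(flip seen+1 is heads | first `seen` flips were heads), via axiom (4)."""
    return p_first_m_heads(seen + 1) / p_first_m_heads(seen)


# Indifference over whole sequences: 99 heads teach nothing about flip 100.
print(p_next_head(p_first_m_heads_uniform_sequences, 99))  # 1/2

# Indifference over head totals: 99 heads make a 100th head very probable.
print(p_next_head(p_first_m_heads_uniform_counts, 99))  # 100/101
```

The same evidence leaves the expectation at 1/2 under the first prior and raises it to 100/101 under the second, which is why the choice between the two applications of the principle matters so much.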
Further specification of the means of deciding initial probabilities (especially vis-a-vis my first 'rule' of using our intuitions) is a topic for future research. But even if it proves difficult or impossible to be more specific here, this would certainly not indicate the account thus far given is not true. Rather, it would only indicate to me that as a matter of fact, there aren't any precise rules for determining initial probabilities. If that happens to be the way things are, then it will, of course, be fruitless to demand philosophers provide such a set of rules.

5. Grue hypothesis not confirmed: There is no suggestion that "All emeralds are grue" would, on my theory, be the best explanation -- or any explanation at all -- of emeralds observed before the year 2000 being green. "All emeralds are grue," just by itself, cannot in fact explain emeralds before 2000 being green, because it is not metaphysically prior to the latter. If anything, the hypothesis in this case is posterior to the evidence, on the basis of either temporal or part/whole relations (the color of the pre-2000 emeralds is a part of the hypothesis, and it is temporally prior to the remaining instances of the hypothesis). But on this showing, no universal proposition can ever explain its instances, because the instances are always prior to the generalization, being, as I claim, parts of it; and this is contrary to a very common conception of explanation. I do think this result is correct though. You can't explain facts by just repeating them, or repeating them with a bunch of other facts added on. You have to cite something different from the explanandum. In cases where a universal proposition appears explanatory, and in which it does get confirmed by its instances, I think the real, suppressed explanans is the existence of some causal connections between properties.
For instance, when we think that "all metals expand when heated" both explains and is confirmed by the expansion of various particular metals, what is really explaining and getting confirmed is that the property of being a metal plus heat causes expansion. The fact that all metals expand when heated is just a deduction from this.

"All emeralds are grue" cannot get this kind of confirmation either, unless we consider it antecedently plausible that there could be some sort of causal relation between being an emerald and being grue. That there would be a causal connection between emeraldness and green (perhaps mediated by some third factor) is much more plausible, so it would be a better explainer.

6. Empirically equivalent theories not equally confirmed: If e confirms h and h' is empirically equivalent to h, there is no suggestion on my theory that h' has to be equally confirmed, or even confirmed at all; for if h explains e, it doesn't follow that h' also explains e, even though h' might predict e. To take an obvious sort of example, suppose that atomic theory explains and predicts several observations of ours. Now suppose some positivist-inspired individual proposes a theory that is just the conjunction of all the observational consequences of atomic theory, but with the denial that there really are any atoms superadded. This new, instrumentalist theory will be empirically equivalent to the atomic theory that the rest of us all believe. It would not, however, even if it were true, explain any observations, since it is not metaphysically prior to the observations (for the latter are a part of the hypothesis).

Incidentally, I do not really mean to deny that we can confirm the conjunction of all the observational consequences of atomic theory in a sense. I have been focusing on direct confirmation in the preceding discussion of inference to the best explanation. But any consequence of a directly confirmed theory can be said to be indirectly confirmed.
So we can confirm the observational consequences of atomic theory indirectly, by first getting an inference to the best explanation directly confirming the atomic theory itself. Hopefully, the ambiguity between confirmation and direct confirmation will cause no serious confusion.

Let's also consider two other problems of confirmation theory to see how they are resolved:

7. The ravens paradox: This paradox is generated from Nicod's criterion plus (in Hempel's terminology) the equivalence condition. Nicod's criterion of confirmation says observation of an A that is B confirms that all A's are B, and the observation of a non-A is irrelevant to (neither confirms nor disconfirms) "All A's are B." The equivalence condition says that evidence that confirms a hypothesis confirms anything logically equivalent to the hypothesis. The resulting paradox is that from Nicod's criterion, the observation of a non-black non-raven confirms "All non-black things are non-ravens;" so by the equivalence condition it also confirms "All ravens are black;" but also by Nicod's criterion the observation of a non-raven should be irrelevant to "All ravens are black." Hempel's answer to the paradox, as noted above, was to accept that observations of non-ravens confirm hypotheses about ravens.

On my account it is not at all clear that observation of an A that is B would generally confirm that all A's are B. It could only do so via some plausible explanatory hypothesis, such as that A's cause B's. But it seems very unlikely that non-blackness causes non-ravenhood.

That it is the first part of Nicod's criterion -- the part about an A that is B confirming all A's are B's -- that should be rejected is also supported by the consideration of other examples, such as the grue case.

8. 
The everything-confirms-everything problem: This paradox derives from two plausible-sounding conditions on confirmation: (a) prediction condition: if we verify a prediction of a hypothesis, then we have confirmed the hypothesis; (b) consequence condition: if we confirm a hypothesis, then we thereby confirm any consequence of the hypothesis. These conditions imply that anything confirms anything, for suppose A and B are any two propositions. A confirms (A & B) by the prediction condition, since A is a consequence, and so a prediction, of (A & B); and thereby it confirms B by the consequence condition, since B is a consequence of (A & B).

But in my view, (A & B) does not explain A, once again because it lacks metaphysical priority; and therefore A will not confirm (A & B) (at least not directly). I intend for my theory to satisfy the consequence condition but not the prediction condition: once you have confirmed a hypothesis by inference to the best explanation, you are licensed in deducing consequences from it, which are indirectly confirmed. Thus (A & B) might, in some cases, get indirect confirmation from A, provided it was a consequence of some explanatory hypothesis; but it would not get direct confirmation, and usually would not get any confirmation.

D. Examples supporting the theory

The best way of seeing whether a theory of confirmation -- or, indeed, any sort of philosophical theory -- is correct is to consider typical examples. Every theory of confirmation that I know of but one has clear counter-examples against it, some of which we have already described. But I believe that my theory will suffer from no counter-examples. Every inference that intuitively we would take to be a valid induction can be seen as an inference to the best explanation, and every inference to the best explanation will intuitively seem to be a valid induction.

1. Induction by simple enumeration: Consider the type of inference instanced when we observe a large number of ravens that are all black, and conclude that all ravens are black.
Here our implicit hypothesis is a vague causal hypothesis: either being a raven causes one to be black, or blackness causes ravenhood, or a third factor causes both ravenhood and blackness. In this case, we'd probably go with the last hypothesis, not the first because ravens come into the world already black, and not the second for the same reason plus that there are black non-ravens. But constant conjunctions can be explained by any of these sorts of hypotheses.

The evidence in this case, that all observed ravens are black, is improbable, at least given the denial of the hypothesis, since (by my third rule of probability assignments) the default assumption would be that the probability of a series of ravens being black is the probability of a single black raven raised to the power of the number of ravens observed -- a small number if the series is long.

The hypothesis is metaphysically prior to the evidence in virtue of my fourth criterion of priority (see §III.B.1), viz. that necessities are prior to contingencies, because causal relations between properties or event-types (N.B. not particulars) are necessary. The probability of all observed ravens being black given the hypothesis is one. And the initial probability of the hypothesis should be fairly good since it's intuitively plausible that a common factor (which we now know must be in raven genes) both produces ravens and determines them to be black.

Not meaning to take Goodman's line on this, I suspect that 'projectible' predicates generally correspond to ones that are recognized by normal cognizers because we only choose to name a property if it seems intuitively credible that it is the sort of thing that can partake of causal relations. For this reason, gruesome problems do not arise in a practical context, because people's intuitions tend to agree.

2. Mill's methods in general: What I called "induction by simple enumeration" corresponds to the method of agreement, but the rest of Mill's methods are equally easily explicable.
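
The probabilistic reasoning of the raven example in item 1 can be put into numbers. The following is a minimal sketch, not the author's own calculation. It assumes a prior for the common-cause hypothesis h and a chance q that a single raven is black if h is false (both values are made up for illustration), treats ravens as independent under the denial of h so that P(e|¬h) = q^n, sets P(e|h) = 1 as in the text, and applies Bayes' Theorem. The function and parameter names are hypothetical.

```python
def posterior_common_cause(prior_h, q_black, n_ravens):
    """P(h | n black ravens observed), by Bayes' Theorem.

    prior_h  -- assumed initial probability of the causal hypothesis h
    q_black  -- assumed chance a single raven is black if h is false
    n_ravens -- number of ravens observed, all of them black
    """
    p_e_given_h = 1.0                      # h entails that every raven is black
    p_e_given_not_h = q_black ** n_ravens  # independence under the denial of h
    p_e = prior_h * p_e_given_h + (1.0 - prior_h) * p_e_given_not_h
    return prior_h * p_e_given_h / p_e

# Illustrative values only: prior 0.1, single-raven chance 0.5.
for n in (0, 1, 5, 20):
    print(n, round(posterior_common_cause(0.1, 0.5, n), 4))
```

With no observations the posterior equals the prior, and it climbs toward one as the run of black ravens lengthens, because the evidence becomes ever less probable on the denial of h while remaining certain on h.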
The hypothesis that C is a necessary cause of E would explain why when we remove C, E does not occur, and the evidence here is improbable (at least given the denial of the hypothesis) on the basis of a priori principles that every event (or at least most events) has a cause and that like causes tend to have like effects. Similarly, the hypothesis that C causes E would explain a correlation between variations in C and variations in E.

Finally, the method of residues isn't really a form of induction at all. It just expresses an intuitive principle stating that composition of causes generally compounds effects. Application of the method of residues is an attempted deduction, or a deduction relative to this implicit assumption.

3. Theoretical entities: When we explain macroscopic properties of substances, such as solidity or liquidity, in terms of forces acting between atoms, or again explain these in terms of the composition of the atoms (e.g., electron structure), the theoretical entities invoked count as potential explainers in virtue of part-whole priority. Sometimes theories invoke historical priority, as in sociobiological explanations of properties of organisms in terms of their evolution. An example of the priority of existential claims would be the positing of fields to explain forces between objects; here the existence of a field (as some kind of entity) is prior to the (causal) relations between objects by my sixth criterion of priority. Finally, the priority of categorical properties over causal properties is invoked in such examples as the positing of wave properties of light to account for interference patterns produced by interacting light rays -- or, of course, any postulation of properties of an object to explain its behavior.

In each of these cases, the existence of the explanations is intuitively taken to confirm the hypotheses that explain; and in all of these cases, we can see that my view upholds the validity of such an inference.

4. 
Existential inferences: Recall that I harried Hempel with the fact that he couldn't explain the inference from the observation of any object (such as a black hole) to the existence of similar objects (objection 3, §IIB). Can I explain this sort of inference? Well, this is a difficult one, but I think the basis for the inference is essentially this: that if there were only one black hole in the universe, it is very unlikely that we would have seen it, on the basis that we have not searched most of the universe, and we would give the single black hole an equal probability of being located anywhere. Thus the hypothesis that there are many black holes greatly increases the prior probability of our observing one.

A second basis for this sort of inference is that the observation of some type of object demonstrates the physical possibility of such objects, which we might have doubted before; and further, it suggests the existence of mechanisms that produce such objects. For instance, observation of a unicorn would definitely confirm that there are more unicorns because it would suggest a whole species, as a 'unicorn-producing mechanism' (similar to the means whereby other organisms are known to be produced). The observation of a black hole (so far as one can observe it) would suggest the existence of a mechanism whereby such an object can be produced, and that would increase the likelihood that the same mechanism produced other black holes.

5. Other minds confirmed: I can't take time to go into the problem of other minds in much detail, but we can see how the discovery of the existence and contents of other people's consciousness is confirmed by an inference to the best explanation for their behavior. The intrinsic, categorical properties of people are metaphysically prior to their dispositional properties and hence to their behavior. We may assume that the hypothesis of people having certain mental states (desire for ice cream, fear of heights, &c.) 
predicts that they will behave in certain ways that we can observe each other to do. And these forms of behavior are initially improbable because they are very complex and orderly; the chances against a random assemblage of physical parts without consciousness turning out to be able to play chess, for instance, are astronomical. Thus mental states explain behavior, and are thereby confirmed.

6. The second law of thermodynamics: The standard explanation of the entropy law illustrates my conception of probability: Imagine there's a box filled with some gas, and the temperature is higher on the left side than on the right. The material is free to flow around the box, from one side to another. In the kinetic theory of heat, the molecules on the left are moving faster, on average, than the ones on the right. Periodically, due to random motion, a molecule will pass over the imaginary line down the middle of the box. If it is passing from the left to the right then the chances are that it will be a fast molecule (since most of the molecules on the left are fast). Similarly, mostly slow molecules will cross from the right side. This will remain true until the temperatures on both sides are equalized.

This example supports my theory on two accounts. First, in respect of the correct method of determining prior probabilities: the physicists are saying, in essence, (allowing for a higher average temperature on one side of the box) assign a uniform probability distribution over trajectories of molecules on each side of the box, assuming trajectories of distinct molecules to be probabilistically independent; and from that generate the probable macroscopic state of the gas at a later time. Hence, the principle of indifference is applied to the fundamental state of the gas, that is, the properties of the parts at an earlier time.
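
The procedure just described, assigning independent random behavior to the individual molecules and deriving the probable macroscopic state from that, can be sketched as a small simulation. This is an illustrative toy model, not a physical calculation: the energies 4.0 and 1.0, the per-step crossing probability, and the function names are all assumptions, and every molecule is given the same chance of crossing the midline at each step regardless of its speed.

```python
import random

def side_means(sides, energies):
    """Mean kinetic energy (a stand-in for temperature) on each side."""
    means = []
    for s in (0, 1):  # 0 = left half of the box, 1 = right half
        vals = [e for side, e in zip(sides, energies) if side == s]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means

def simulate(n_per_side=1000, steps=200, p_cross=0.05, seed=0):
    """Let each molecule cross the midline independently at every step."""
    rng = random.Random(seed)
    sides = [0] * n_per_side + [1] * n_per_side
    # Assumed energies: 'fast' molecules start on the left, 'slow' on the right.
    energies = [4.0] * n_per_side + [1.0] * n_per_side
    history = [side_means(sides, energies)]
    for _ in range(steps):
        sides = [1 - s if rng.random() < p_cross else s for s in sides]
        history.append(side_means(sides, energies))
    return history

history = simulate()
initial_gap = abs(history[0][0] - history[0][1])
final_gap = abs(history[-1][0] - history[-1][1])
print(f"initial gap: {initial_gap:.2f}, final gap: {final_gap:.2f}")
```

Indifference is applied only at the level of the individual molecules; the near-equalization of the two sides is derived from it, not assumed.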
To find the probability of diffusion occurring we do not, for example, apply the principle of indifference with respect to possible macroscopic states, evidently because we implicitly recognize the microstates to be more basic.

Second, this explanation of diffusion is taken as a confirmation of the kinetic theory of heat, viz. that substances are composed of molecules and temperature is (roughly) a measure of their motion. I don't know what the probability of diffusion occurring given the denial of the kinetic theory of heat would be, but I suppose it's moderately likely. This is a pretty clear case of an inference to the best explanation, where the explanatory hypothesis states more fundamental facts that greatly increase the probability of the explanandum.

This concludes our consideration of examples. Let us turn at last to

IV. Summary & conclusion

I think I have just given the rough solution to the problem of induction. My reasons for thinking my description of induction correct are, first, that it seems intuitively plausible to me; second, that it does validate induction, whereas it appears otherwise very difficult to explain why induction is justified; and third, that consideration of several typical examples of inductive inferences reveals them to accord with my account, whereas it reveals clear counter-examples to every other view of induction I know of.

Certain sorts of reader may be inclined to attack the account on the grounds, first, of its unabashed metaphysics, especially in the appeals to metaphysical priority and necessary connections between properties (causation); second, relatedly, of the explicit a priorism invoked in the assignment of 'prior probabilities'; and third, of the lack of precision and detail in my 'rules' for determining confirmation and my explanations of concepts. I am well aware of these objections, but I do not regard them as serious. The first two were discussed briefly above (§IIIA), and they strike me as mere prejudices.
The third objection, while quite true, and perhaps the most important one to discuss, does not strike me as weighing against the truth of my account, but only as perhaps recommending to me further study and elucidation, where possible. And as I have suggested previously, though it is a matter beyond the scope of this paper to discuss in any detail, in the all too likely event that it is not in fact possible to specify the principles of induction with precision, or to analyze certain of the concepts it must make implicit use of into anything simpler, the demand that we do so is a permanent roadblock to philosophical knowledge, and will forever be used to discredit true theories and support false ones. The only way I know of to satisfy the demand for precise analysis is to say something false, to which I think the history of confirmation theory, as of most of philosophy, bears adequate witness.

At no point in the elaboration of my account have I been driven to say anything that could be described as counter-intuitive, and this should be noted well, for it is not an easy state of affairs to achieve. Positivists typically wind up denying we can confirm the existence of any theoretical entities; Hempel tells us that observations of anything not a raven automatically confirm anything about ravens; Mill and other empiricists compose physical objects of 'sense data'; Bayesians must say green emeralds confirm that all emeralds are grue; standard accounts of explanation allow later events to explain earlier ones; and so on. I do not know what these philosophers think justifies their theories. For my part, I do not know how a philosopher can hope to do better than to accord with all of our intuitions.

Notes

1. In "Of Miracles", An Enquiry Concerning Human Understanding, §X.
2. Enquiry Concerning Human Understanding, §IV.
3. "The New Riddle of Induction," Fact, Fiction, and Forecast (Cambridge, Mass.: Harvard University Press, 1955), chapter III.
4. 
By "de facto theoretical" I mean not actually observationally confirmed, though perhaps observable in principle. Otherwise, the inductive argument would be superfluous.
5. At least. If h' = (e & ¬h) then h' accommodates the data perfectly.
6. "The New Riddle of Induction," op. cit.
7. Exactly how we interpret "opposite" here, whether as contrary or contradictory, or whatever, is immaterial for the purposes of the illustration.
8. In "Studies in the Logic of Confirmation," Mind, vol. 54, 1945, pp. 1-26 and 97-121.
9. For an object to 'occupy' a region of space, the object must fill up the space without extending outside it.
10. The universal quantifier in the second hypothesis ranges over regions of space. To let all quantifiers range over both physical objects and regions of space, we could say, "(x) if x is a region of space, then (∃y) O(y,x)." The argument will be unchanged.
11. In a footnote Hempel says, without explanation, that his account of confirmation applies to a language in which "the use of ... the identity sign is not permitted," but I am disinclined to allow him off the hook of this counter-example by any so ad hoc or arbitrary reason. We might try rephrasing the hypothesis that there are other black holes as, "there are black holes that don't have this spatio-temporal position."
12. A System of Logic, Book III, chapter VIII, section 7.
13. An Examination of Sir William Hamilton's Philosophy, chapter XI.
14. A System of Logic, Book III, chapter V, §2.
15. Book III, chapter III, §1.
16. For introduction to Bayesian confirmation theory, see Howson and Urbach, Scientific Reasoning: The Bayesian Approach (La Salle, Ill.: Open Court, 1989) and John Earman, Bayes or Bust? (Cambridge, Mass.: MIT Press, 1992).
17. I get 1 in 649,740 on this basis. The reader can check my calculations.
18. "A Proof of the Impossibility of Inductive Probability", Nature 302 (1983), pp. 687-688.
19. "Would explain" because to say that the hypothesis does explain the evidence would already be to imply its truth.
20. 
The fact that different axiomatizations of geometry are possible that would yield the same set of logical implications, understanding implication in the third sense, does not alter the point. If one chooses less evident mathematical facts to derive the more basic principles, then one will just be giving an inferior axiomatization.
21. The main sort would be the case where A causes both B and C, and B is prior to C. In this case observation of B would probabilify C, but B wouldn't be the explanation of C; A would. I wouldn't know how to modify my conditions to rule out this sort of counter-example.
22. See An Introduction to the Philosophy of Science.