Framing human inference by coherence based probability logic ⋆

Niki Pfeifer and Gernot D. Kleiter
Department of Psychology, University of Salzburg, Austria

Abstract

We take coherence based probability logic as the basic reference theory to model human deductive reasoning. The conditional and probabilistic argument forms are explored. We give a brief overview of recent developments of combining logic and probability in psychology. A study on conditional inferences illustrates our approach. First steps towards a process model of conditional inferences conclude the paper.

Key words: human reasoning, conditional, probability logic, argument forms

To empirically investigate human deductive inference one needs a description of what deductive inference is all about. Such a description specifies what the human mind should compute, which conclusions should be considered rational and which ones not. From Aristotle until the end of the twentieth century, classical logic was the standard reference in psychology. The emerging logical pluralism and the many new paradigms developed in computer science cast doubts upon the general appropriateness of classical logic as the standard frame in the psychology of thinking and reasoning. Recently, a strong trend in psychology emerged to consider probability to be relevant even in tasks in which uncertainties are not explicitly mentioned.

The present contribution takes probability logic based on the coherence approach of subjective probability as the basic reference theory. It gives a brief overview of the recent developments of combining logic and probability to build normative and descriptive models of human deductive reasoning. It explains the reasons why we think that the coherence approach offers advantages for psychological model building. We also describe results of our own experimental studies.

⋆ Supported by the Austrian Research Fonds, FWF (project P20209). Email addresses:
[email protected],
[email protected] (Niki Pfeifer and Gernot D. Kleiter).

Preprint submitted to Elsevier 30 October 2007

Coherence is a key concept in subjective probability theory. In the betting interpretation, coherence guarantees the avoidance of sure losses (often called a “Dutch book”). From a psychological perspective, the coherence approach provides several advantages. Most importantly, the coherence approach is based on the subjective interpretation of probabilities. Subjective probabilities are degrees of belief and are conceived as coherent descriptions of incomplete knowledge states. While human reasoning may be more or less coherent, it in any case involves degrees of belief and descriptions of incomplete knowledge states. It would be an unwise research strategy to take a reference theory that is far away from cognitive science, for instance, an approach to probability developed for thermodynamics or quantum physics.

The coherence approach to probability offers conceptions that are not available in other approaches. Examples are the renouncement of full algebras of events, the special conception of conditional probability, the treatment of single events, the expression of imprecise probabilities, the closeness to Bayesian statistics, and, last but not least, the availability of probability semantics for logical systems such as system p.

There are at least two scientific communities in which coherence is basic: the “Italian community” and the “imprecise probability community”. De Finetti [9] is the root of both. Much of the recent work on the coherence approach is summarized in [6,22]. The standard reference of imprecise probability is [38]. In recent years the Italian community has connected its approach to possibility theory and other systems of uncertainty representation. The imprecise approach has been extended to statistical modeling, conditional independence models, and several other fields of uncertainty processing.
1 Conditional

Recently, several psychologists have followed the proposal of Adams [1,2], Popper [34], Rényi [37], and others and interpret the indicative conditional if H, then E as a conditional event, E|H (with the according probability P(E|H)), and not as a material conditional, H ⊃ E (with the according probability P(H ⊃ E)). The “mental model” theory may be an exception [19,5].

Interpreting the probability of the if–then as the probability of the material implication leads to the paradoxes of the material implication. This is not the case if the if–then is interpreted as a conditional event. For instance, the paradoxical argument form ¬H ∴ if H, then E can be probabilistically interpreted as P(¬H) = x ∴ P(H ⊃ E) ∈ [x, 1], where 0 ≤ x ≤ 1. Here the conclusion is constrained by the premise. If the if–then is interpreted as a conditional event, however, only P(E|H) ∈ [0, 1] follows. The conclusion is not constrained by the premise. This is an advantage of the conditional probability interpretation, since the premises of a counterintuitive argument form should provide no information about its conclusion.

Moreover, the probability values of P(H ⊃ E) and P(E|H) can differ substantially. Dorn [10] considers a compelling example. Let H be the next throw of the die will come up a 5, and let E be the next throw of the die will come up an even number. For a fair die P(H) = 1/6 and P(E) = 1/2. Then, P(H ⊃ E) = 5/6, whereas P(E|H) = 0.

Introducing a conditional as a conditional event is special in the coherence approach. First, while usually—for example in the Kolmogorov approach—conditional probabilities are defined by absolute probabilities, in the coherence approach conditional events are basic. “In classical approaches, a conditional probability P(E|H) is not introduced as a direct notion, and so there is no meaning given to E|H itself” [8].
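Dorn's die example can be checked by direct enumeration. The following sketch (plain Python; the variable names are ours) computes both values for a fair die:

```python
# Dorn's die example: H = "next throw is a 5", E = "next throw is even".
outcomes = [1, 2, 3, 4, 5, 6]
H = [o == 5 for o in outcomes]
E = [o % 2 == 0 for o in outcomes]

# Material conditional H > E is true whenever H is false or E is true;
# it fails only at the outcome 5, so its probability is 5/6.
p_material = sum((not h) or e for h, e in zip(H, E)) / 6

# Conditional probability P(E|H): proportion of E-cases among the H-cases;
# the only H-case, the outcome 5, is odd, so P(E|H) = 0.
p_conditional = sum(h and e for h, e in zip(H, E)) / sum(H)

print(p_material, p_conditional)
```

The two interpretations thus come apart as sharply as possible in this example: the material reading assigns a high probability, the conditional-event reading probability zero.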
If conditional probabilities are “defined” by absolute probabilities, then there is no conditional entity per se, and also no non-probabilistic if–then. A conditional event is not an ordered pair (E, H), with H ≠ ∅, where ∅ is the impossible event [6, p.63]. Assigning probability values directly to the conditional event is psychologically highly plausible: a person does not need to know and process the “joint” and “marginal” probabilities to come up with a conditional probability assessment (where “conditional = joint / marginal”).

Second, as in the coherence approach conditional events are basic, we may reflect upon the behavior of their truth values. They require special attention. What are the truth values of a conditional event E|H? If H is true the answer is straightforward,

    |E|H| = 1  if E = 1 and H = 1,
    |E|H| = 0  if E = 0 and H = 1.

Here 0 and 1 are the indicator values corresponding to the truth values false and true. What is the truth value of E|H when H is false? For this case de Finetti proposed a third truth value “undetermined”. A similar proposal was made by Ramsey in 1929 [36, footnote, p.155]: 1

“If two people are arguing ‘If p will q?’ and are both in doubt as to p, they are adding p hypothetically to their stock of knowledge and arguing on that basis about q; . . . We can say they are fixing their degrees of belief in q given p. If p turns out false, these degrees of belief are rendered void.”

In terms of bets, one neither wins nor loses if the conditioning event turns out to be false. The bet is annulled and the ticket price is paid back.

1 In 1926 Ramsey already introduced “the degree of belief in p given q” [35, p.82]. He noted that “this does not mean the degree of belief in ‘If p then q’ [material implication], or that in ‘p entails q’ ”. This is a misprint we found in several reprints of the article; “p” should be replaced by “q”, and vice versa.
A consequence of such a conception is that conditioning cannot be expressed by operators like negation, conjunction, and disjunction. There is no logical operator of conditioning [17]. This is a fundamental property that distinguishes | from ⊃. The vertical stroke operator, |, is not truth-functional, while the material implication, ⊃, is truth-functional.

When conditional events are considered to be primitive, then probability axioms should be introduced for conditional events. An elegant method to introduce and justify axioms of conditional events was proposed by Coletti and Scozzafava [6,8]. We first give the axioms and then their justification.

Definition 1 (Conditional probability) Let C = G × B0 be a set of conditional events {E|H} such that G is a Boolean algebra and B ⊆ G is closed with respect to (finite) logical sums, with B0 = B \ {∅}. A function P : C → [0, 1] is a conditional probability iff the following three axioms are satisfied

A1 P(H|H) = 1, for every H ∈ B0, or (equivalently) P(E|H) = P(E ∧ H|H),
A2 P(·|H) is a (finitely additive) probability on G for any given H ∈ B0,
A3 P(E ∧ A|H) = P(E|H)P(A|E ∧ H) for any A, E ∈ G, H, E ∧ H ∈ B0.

In the present context we interpret the “E|H” as if H, then E. Axiom A2 specifies that, for a fixed antecedent, the probabilities of the consequents follow the rules of finitely additive probability. Axiom A3 is a “multiplicative chain” rule for conjunctions.

To define the “truth-value” T(E|H) of a conditional event E|H, Coletti and Scozzafava consider the function

    T(E|H) = 1 · I_{E∧H} + 0 · I_{¬E∧H} + p(E|H) · I_{¬H},    (1)

that is, T(E|H) = 1 if E ∧ H is true, T(E|H) = 0 if ¬E ∧ H is true, and T(E|H) = p(E|H) if ¬H is true. I denotes the indicator of the according event. The three values 1, 0, and p(E|H) correspond to “win”, “lose”, “get money back”, respectively. The sum of the three terms is thus a random variable,

    X = Σ_{k=1}^{3} x_k I_{E_k},    (2)

where x1 = 1, x2 = 0, x3 = p(E|H), and E1 = E ∧ H, E2 = ¬E ∧ H, and E3 = ¬H. Let X and Y be two such three-term random variables.
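The generalized indicator of equation (1) can be sketched as a small function (the name is ours); it returns the “truth-value” of E|H for a given state of affairs:

```python
def t_value(e, h, p_e_given_h):
    """Generalized indicator T(E|H) of equation (1):
    1 ("win") if E and H hold, 0 ("lose") if not-E and H hold,
    and p(E|H) ("get money back") if H is false."""
    if h:
        return 1.0 if e else 0.0
    return p_e_given_h

# The four constituents of two events, evaluated with p(E|H) = 0.7:
values = [t_value(e, h, 0.7) for h in (True, False) for e in (True, False)]
print(values)  # [1.0, 0.0, 0.7, 0.7]
```

The last two entries illustrate the betting interpretation: whenever the conditioning event H is false, the bet is called off and the value is the price p(E|H), irrespective of E.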
In the same way as two events can be combined by a logical operator to obtain a third event, say A ⊕ B = C, the two random variables, say X and Y, can be combined by a numerical operator + to obtain a third random variable, X + Y = Z. What happens when two three-term random variables are added to obtain a third one? When done without specific constraints the result does not remain in the family of numbers that represents the set of conditional events. The operation may lead to a number that does not represent any event. When we consider, however, only conditional events with a fixed conditioning event and when we further consider only mutually exclusive conditioned events, E ∧ A = ∅ (so that also E ∧ A ∧ H = ∅), the function (1) is additive. In the domain of conditional events this corresponds to the disjunction of E|H and A|H. The disjunction operator “∨” for E and A given H thus corresponds to the addition operator “+” for the corresponding random variables. The same can be done for the conjunction operator “∧” and the (then corresponding) multiplication operator “×”, for events such as in axiom A3.

There is a connection between coherent conditional probability and possibility distributions [4,7]. A possibility distribution may be conceived as a standardized likelihood. The likelihood is a function of the conditioning events, P(E|Hi), where Hi is the variable. Likelihoods are not probabilities. Operators on possibilities typically involve maxima for disjunctions, Π(A ∨ B) = max{Π(A), Π(B)}, and t-norms for conjunctions. Sometimes human subjects seem to confuse the “direction” of conditioning (e.g., in the well known Linda task). Standardized likelihoods might be candidates to model such cases in a “rational” way. Moreover, human max or min responses are hard to distinguish from superficial “matching” responses.
In psychology “matching” means to restate numbers or other material already contained in the description of a problem, as the “solution” of the problem.

2 Properties of probabilistic argument forms

The lower and upper probabilities of elementary arguments are obtained by the method of cases together with some algebra, those of complex arguments by linear programming. An alternative method is used in the “Check Coherence” software [3]. Here are two examples of elementary arguments.

Example 1 (modus ponens) The non-probabilistic modus ponens infers B from the set of premises {(if A, then B), A}. In the probabilistic version, the probabilities P(B|A) = y and P(A) = x are given, P(B) is sought. By the theorem of total probability we have P(B) = P(A)P(B|A) + P(¬A)P(B|¬A). The lower probability of B, P(B) = z′, is obtained by (case 1) assuming P(B|¬A) = 0, so that z′ = xy. The upper probability is obtained by (case 2) assuming P(B|¬A) = 1, so that z′′ = xy + (1 − x) · 1 = 1 − x(1 − y).

Example 2 (modus tollens) The non-probabilistic modus tollens infers ¬A from {(if A, then B), ¬B}. Let P(A) = x, P(B|A) = y, P(B) = z, and P(B|¬A) = q; in the probabilistic version y and 1 − z are given, (1 − x)′ and (1 − x)′′ are sought. By the theorem of total probability we have z = xy + (1 − x)q. Solving for 1 − x we get 1 − x = 1 − (z − q)/(y − q) = (y − z)/(y − q). We distinguish three cases. (a) q = z leads to the upper probability (1 − x)′′ = 1, (b) q = 0 leads to the lower probability (1 − x)′ = 1 − z/y, and (c) q = 1 leads to the lower probability (1 − x)′ = (z − y)/(1 − y). The lower probability is thus max{1 − z/y, (z − y)/(1 − y)}.

The numerical solutions of complex problems may be found by linear programming. Let us consider an inference problem with n variables and m premises. The probability vector of the premises is denoted by p = (p1, p2, . . . , pm).
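The two worked examples can be rendered numerically. A minimal sketch of the method-of-cases bounds (the function names are ours; Example 2 assumes 0 < y < 1 so that no quotient degenerates):

```python
def modus_ponens_bounds(x, y):
    """P(A) = x, P(B|A) = y; bounds on P(B) by the method of cases:
    the unknown P(B|not-A) is set to 0 (lower) or to 1 (upper)."""
    return x * y, x * y + (1 - x)

def modus_tollens_bounds(y, z):
    """P(B|A) = y, P(B) = z; bounds on P(not-A) = 1 - x.
    The lower bound is the maximum of the q = 0 and q = 1 cases
    of Example 2; the upper bound is always 1 (the q = z case)."""
    lower = max(1 - z / y, (z - y) / (1 - y))
    return max(0.0, lower), 1.0

# With P(A) = .9 and P(B|A) = .7, P(B) lies in [.63, .73]:
lo, hi = modus_ponens_bounds(0.9, 0.7)
print(round(lo, 2), round(hi, 2))  # 0.63 0.73
```

For example, `modus_tollens_bounds(0.7, 0.1)` reproduces case (b) of Example 2: the q = 0 case dominates and yields the lower bound 1 − z/y = 1 − 0.1/0.7.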
We build a coefficient matrix V with m + 1 rows, one row for each premise and one additional row containing 1s only. Each column is associated with one of the combinatorially possible 0/1 patterns of the n variables. In the case of logical independence there are r = 2^n such patterns. In the case of logical dependence there are fewer cases (or constituents; for how to obtain the constituents see, e.g., [15,14]). The v_ij values, i = 1, . . . , m and j = 1, . . . , r, are equal to the (generalized) indicator values that premise i obtains under the truth values of constituent j. The values are either 0, 1, or a conditional probability (for conditional events with negated conditioning events). The values are determined according to equation (1). The matrix V together with the vector p builds a system of m + 1 linear equations with r unknowns.

    v11 π1 + v12 π2 + · · · + v1r πr = p1
    v21 π1 + v22 π2 + · · · + v2r πr = p2
      ...                                     (3)
    vm1 π1 + vm2 π2 + · · · + vmr πr = pm
      π1 +    π2 + · · · +    πr = 1

The (π1, π2, . . . , πr) are the unknown probabilities of the constituents. The sum of these probabilities is 1. If the number of premises is less than r − 1, then the linear system has no exact solution. Next we introduce the conclusion. It is represented by the objective function

    w1 π1 + w2 π2 + · · · + wr πr,    (4)

where the coefficients w1, w2, . . . , wr are determined by equation (1).

Example 3 (modus tollens) The upper part of Table 1 shows the four constituents for two variables. The lower part gives the two premises of the modus tollens and the coefficients of the linear system. The lower and upper probabilities of the conclusion ¬A with the coefficients (1, 1, 0, 0) are the minimum and maximum, respectively, of the objective function p3 = 1 · π1 + 1 · π2. With the help of linear programming the lower and the upper values of the function are determined.
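Example 3 can be sketched computationally. The following pure-Python sketch (the names are ours; in practice a library LP solver would be used) enumerates the basic feasible solutions of system (3) for the modus tollens with P(B|A) = .7 and P(¬B) = .9, and takes the minimum and maximum of the objective function (4):

```python
from itertools import combinations

def det3(m):
    # Determinant of a 3x3 matrix.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    # Solve a 3x3 linear system by Cramer's rule; None if singular.
    d = det3(A)
    if abs(d) < 1e-12:
        return None
    sol = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        sol.append(det3(Aj) / d)
    return sol

def probability_bounds(V, p, w):
    """Min and max of sum(w[j] * pi[j]) subject to V pi = p, pi >= 0,
    found by enumerating the basic feasible solutions (vertices)."""
    m, r = len(V), len(V[0])
    values = []
    for cols in combinations(range(r), m):
        x = solve3([[V[i][j] for j in cols] for i in range(m)], p)
        if x is None or any(xi < -1e-9 for xi in x):
            continue  # singular basis or infeasible vertex
        pi = [0.0] * r
        for j, xi in zip(cols, x):
            pi[j] = xi
        values.append(sum(wj * pj for wj, pj in zip(w, pi)))
    return min(values), max(values)

# modus tollens with P(B|A) = .7 and P(not-B) = .9 (Table 1 coefficients).
p1, p2 = 0.7, 0.9
V = [[p1, p1, 0.0, 1.0],    # premise B|A, values from equation (1)
     [1.0, 0.0, 1.0, 0.0],  # premise not-B
     [1.0, 1.0, 1.0, 1.0]]  # the constituent probabilities sum to 1
p = [p1, p2, 1.0]
w = [1.0, 1.0, 0.0, 0.0]    # conclusion not-A
lo, hi = probability_bounds(V, p, w)
print(round(lo, 4), round(hi, 4))  # 0.8571 1.0
```

The result, P(¬A) ∈ [6/7, 1] ≈ [.86, 1], agrees with the closed-form modus tollens bounds given below in Table 2. Vertex enumeration is adequate here because the feasible set is a bounded polytope inside the probability simplex; for larger problems a simplex-method LP solver is the standard choice.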
If the probabilities of the premises are given in the form of intervals only, then the lower and upper probabilities of the conclusion are found by fractional programming, which requires several linear programming steps in succession [22].

    A     0    0    1    1
    B     0    1    0    1

    B|A   p1   p1   0    1    p1
    ¬B    1    0    1    0    p2
          1    1    1    1    1

Table 1
modus tollens. The upper part contains the constituents, the lower part the coefficient matrix V (the rightmost column contains the probabilities of the premises).

An important property of arguments is the presence or absence of various kinds of logical and functional dependencies in sets of events. We first consider unconditional events. Logical independence is defined as follows.

Definition 2 (Logical independence) Let {E1, . . . , Em} be a set of m unconditional events. If all 2^m atoms (conjunctions) are possible, then the set of events is logically independent. Otherwise the events are dependent.

We note that logical independence and dependence refer to a set of events. The mutual independence of two events is a special case. We next consider the case of conditional events. To define logical independence for conditional events we follow [15].

Definition 3 (Logical independence of conditional events) A set of m conditional events is logically independent, if the number of constituents is 3^m. The constituents are constructed by the combinations of the (Ei ∧ Hi) ∨ (¬Ei ∧ Hi) ∨ ¬Hi, i = 1, . . . , m. For details we refer to [15,14].

We next consider linear dependence/independence. Let V_{m+2} be the coefficient matrix of the premises together with the conclusion.

Theorem 1 (Linear dependence) If the rank r(V_{m+1}) = k and the rank r(V_{m+2}) = k + 1, then the premises and the conclusion are linearly independent. If r(V_{m+1}) = r(V_{m+2}), then the conclusion is linearly dependent on the premises.

The rank determines the number of dimensions of a linear space. Closely related to the dependence/independence properties is de Finetti's Fundamental Theorem [9, p.112].
Theorem 2 (Fundamental Theorem) Given the probabilities P(E1), P(E2), . . . , P(Em) of a finite number of events, the probability of a further event Em+1 is

    precise,      if Em+1 is linearly dependent on {E1, E2, . . . , Em},
    ∈ [0, 1],     if Em+1 is logically independent of {E1, E2, . . . , Em},
    ∈ [p′, p′′],  if Em+1 is logically dependent on {E1, E2, . . . , Em},

where p′ and p′′ are lower and upper probabilities.

The first case (linear dependence) is a special case of logical dependence in which p′ = p′′. Practically all theorems of elementary probability theory belong to the first case. In the second case we say an argument is probabilistically non-informative. As two corollaries we obtain (compare also Figure 1):

Corollary 1 (Partial independence from below) If the set of atoms in which the indicator of the conclusion is 0 is logically independent, then p′ = 0.

Corollary 2 (Partial independence from above) If the set of atoms in which the indicator of the conclusion is 1 is logically independent, then p′′ = 1.

One of the best known principles in probability logic is Adams' concept of p-validity [1,2]:

Definition 4 (Adams' Hauptsatz) The uncertainty of the conclusion of a valid inference cannot exceed the sum of the uncertainties of its premises.

The uncertainty u(A) is defined by the 1-complement of the corresponding probability, u(A) = 1 − P(A). It may be shown that for unconditional events Adams' Hauptsatz becomes

Corollary 3 Let E1, E2, . . . , Em be a set of unconditional events with probabilities p1, p2, . . . , pm. The probability of the conclusion of a valid inference cannot be less than

    max{0, 1 − (m − Σ_{i=1}^{m} p_i)}.

The corollary shows that the lower probability is not sensitive (i) to the specific logical form of the premises and (ii) to the order of the probabilities p1, . . . , pm.
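Corollary 3 is easy to state in code; a one-line sketch (the function name is ours):

```python
def adams_lower_bound(premise_probs):
    """Lower bound on the conclusion probability of a valid inference
    with m unconditional premises (Corollary 3): max{0, 1 - (m - sum p_i)}.
    Equivalently: the conclusion's uncertainty is at most the sum of the
    premises' uncertainties."""
    m = len(premise_probs)
    return max(0.0, 1.0 - (m - sum(premise_probs)))

# Two premises with probabilities .9 and .8: uncertainty at most .1 + .2,
# so the conclusion probability is at least .7.
print(round(adams_lower_bound([0.9, 0.8]), 10))  # 0.7
```

Note that the bound depends only on the number of premises and the sum of their probabilities, which is exactly the insensitivity to logical form and probability order stated in the text.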
The two properties hold for unconditional events only and reflect the fact that in this case the events are truth functional. Only the lower bounds of the conclusions of those arguments that contain conditional events can be sensitive to the structure of the premises and to the specific pattern of the probability assessment. Algebraically this is associated with the matrix V which, in the case of conditionals, does not only contain 0s and 1s, but a pattern of real-valued probabilities.

Material implication is truth functional. If human subjects interpreted the if–then as a material implication, their probability responses in p-valid arguments should be insensitive to the logical form of the premises and to permutations of the probabilities of the premises. There is, however, strong evidence that human subjects are sensitive to structure and assignment. We consider this as one of the strongest arguments against the interpretation of the if–then as a material implication.

3 Combining logic and probability in psychology

Recent probabilistic approaches to human deductive reasoning may be classified according to the interpretation of the if–then and according to the relation between the premise(s) and the conclusion. One of the most influential psychological theories of human reasoning is the mental model theory [19]. The theory was extended to human probabilistic reasoning [20,16]. The core meaning of the uncertain if–then is postulated to correspond to the probability of the material implication, P(H ⊃ E). In recent studies on the meaning of the if–then [24,27,13,28], the participants had either (i) to infer the probabilities of the four truth table cases (E ∧ H, E ∧ ¬H, ¬E ∧ H, and ¬E ∧ ¬H) from the probability of the if–then, or (ii) to infer the probability of the if–then from the probabilities of the four truth table cases.
The interpretation of the P(if H, then E) is then easily computed, as the probabilities of all truth table cases are given. The main result was that most subjects interpret the P(if H, then E) as a conditional probability, P(E|H), and not as a probability of the material implication, P(H ⊃ E).

There are at least two ways in which probabilities may enter argument forms. The relation between the premises and the conclusion can be probabilistic (i), or, the inference relation is deductive but some or all premises and the conclusion may be probabilistically valuated (ii). Oaksford, Chater, and Larkin [26] (see also [25]) proposed that the endorsement rate of the conditional inferences is directly proportional to the conditional probability of the conclusion given the categorical premise. The modus ponens, e.g., is evaluated by P(E|H), where “H” denotes the categorical premise, and “E” denotes the conclusion. Liu [23] proposed to conditionalize on both the categorical premise and the conditional premise. Thus, in both approaches [26,25,23] the relation between the premises and the conclusion is probabilistic and not deductive.

In probability logic a probability is attached to some or all premises and the probability of the conclusion is derived by mathematical methods. Thus, the relation between the premises and the conclusion is deductive and not probabilistic. In the coherence approach we consider a set of premises E1, . . . , Em and a conclusion Em+1 and assume that there exists a coherent probability assessment p1, . . . , pm for the premises. The probability of the conclusion, pm+1, is derived deductively from the premises. The events (propositions) may be conditional or unconditional. Our own work follows this approach.

Table 2 lists probabilistic versions of well known argument forms. We investigated empirically [29,30,32] a coherence based probabilistic semantics [14] of the basic nonmonotonic reasoning system p [21].
The rules of system p are p-valid [1,2]. A necessary condition for p-validity is that the non-probabilistic versions of the rules are logically valid. Logical validity, however, does not guarantee p-validity. transitivity, e.g., is logically valid but not p-valid. We call an argument form “probabilistically informative” if the coherent probability interval of its conclusion is not necessarily equal to the unit interval [0, 1]. An inference rule is probabilistically non-informative, if the assignment of the unit interval to its conclusion is necessarily coherent. All p-valid arguments are probabilistically informative (see the classification in Figure 1).

We empirically investigated rules of system p and rules that clearly violate system p (see Table 2). We also investigated the probabilistic versions of classical argument forms, like the modus ponens. In the psychology of reasoning classical argument forms were often investigated experimentally. It is interesting to compare results of the probabilistic and the more traditional non-probabilistic argument forms. We give a brief overview of our investigations on system p and describe an example study in more detail below.
left logical equivalence⋆
    |=(E1 ≡ E2), P(E3|E1) = x ∴ P(E3|E2) = x    [v: y, l: y, u: y, p: y]

proof by cases†
    P(E2|E1) = x, P(E2|¬E1) = y ∴ P(E2) ∈ [min{x, y}, max{x, y}]    [v: y, l: y, u: y, p: y]

right weakening⋆
    P(E1|E3) = x, |=(E1 ⊃ E2) ∴ P(E2|E3) ∈ [x, 1]    [v: y, l: y, u: n, p: y]

modus ponens†
    P(E2|E1) = x, P(E1) = y ∴ P(E2) ∈ [xy, 1 − y + xy]    [v: y, l: y, u: y, p: y]

cut⋆
    P(E2|E1 ∧ E3) = x, P(E1|E3) = y ∴ P(E2|E3) ∈ [xy, 1 − y + xy]    [v: y, l: y, u: y, p: y]

and†
    P(E2|E1) = x, P(E3|E1) = y ∴ P(E2 ∧ E3|E1) ∈ [max{0, x + y − 1}, min{x, y}]    [v: y, l: y, u: y, p: y]

modus tollens†
    P(E2|E1) = x, P(¬E2) = y ∴ P(¬E1) ∈ [max{(1−x−y)/(1−x), (x+y−1)/x}, 1]    [v: y, l: y, u: n, p: y]

cautious monotonicity⋆
    P(E2|E1) = x, P(E3|E1) = y ∴ P(E3|E1 ∧ E2) ∈ [max{0, (x+y−1)/x}, min{y/x, 1}]    [v: y, l: y, u: y, p: y]

or⋆
    P(E3|E1) = x, P(E3|E2) = y ∴ P(E3|E1 ∨ E2) ∈ [xy/(x+y−xy), (x+y−2xy)/(1−xy)]    [v: y, l: y, u: y, p: y]

denying the antecedent
    P(E2|E1) = x, P(¬E1) = y ∴ P(¬E2) ∈ [(1 − x)(1 − y), 1 − x(1 − y)]    [v: n, l: y, u: y, p: n]

affirming the consequent
    P(E2|E1) = x, P(E2) = y ∴ P(E1) ∈ [0, min{y/x, (1 − y)/(1 − x)}]    [v: n, l: n, u: y, p: n]

transitivity
    P(E2|E1) = x, P(E3|E2) = y ∴ P(E3|E1) ∈ [0, 1]    [v: y, l: n, u: n, p: n]

contraposition
    P(E2|E1) = x ∴ P(¬E1|¬E2) ∈ [0, 1]; P(¬E1|¬E2) = x ∴ P(E2|E1) ∈ [0, 1]    [v: y, l: n, u: n, p: n]

monotonicity
    P(E3|E1) = x ∴ P(E3|E1 ∧ E2) ∈ [0, 1]    [v: y, l: n, u: n, p: n]

Table 2
Probability logical argument forms. The logical operators are defined as usual, “|=” denotes classical logical truth. The axioms of system p are marked by “⋆”, derived rules are marked by “†”. Derivations of the probability propagation rules of system p are in [14], for the other argument forms see [31]. “v” denotes logical validity of the non-probabilistic version of the argument form. “l” and “u” denote whether the lower or the upper probability bound of the conclusion, respectively, is constrained by the probabilities of the premises. “p” denotes whether the argument form is p-valid.
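Several of the propagation rules in Table 2 can be sketched directly. A minimal rendering (the function names are ours; premise probabilities are assumed strictly between 0 and 1 wherever a quotient occurs):

```python
def modus_ponens(x, y):
    """P(E2|E1) = x, P(E1) = y: P(E2) in [xy, 1 - y + xy]."""
    return x * y, 1 - y + x * y

def and_rule(x, y):
    """P(E2|E1) = x, P(E3|E1) = y: bounds on P(E2 and E3 | E1)."""
    return max(0.0, x + y - 1), min(x, y)

def or_rule(x, y):
    """P(E3|E1) = x, P(E3|E2) = y: bounds on P(E3 | E1 or E2);
    assumes x and y are not both 0 and not both 1."""
    return x * y / (x + y - x * y), (x + y - 2 * x * y) / (1 - x * y)

lo, hi = modus_ponens(0.7, 0.9)
print(round(lo, 2), round(hi, 2))  # 0.63 0.73
```

The printed modus ponens interval reproduces the normative bounds used in the experiment reported below (premise probabilities .7 and .9).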
We translated the nonmonotonic inference rules of system p into cover-stories. The cover-stories contained the probabilities of the premises. The task of the participants was to infer the probability(-interval) of the conclusions. In all experiments we paid special attention to create an atmosphere of reasoning and to avoid quick guessing. The participants were students of our university. They were tested individually in a quiet room in the department. They were asked to take enough time.

Practically all responses in the left logical equivalence (see Table 2) and in the right weakening conditions were coherent. For the and, cut, and cautious monotonicity tasks, more than half of the interval responses were coherent. Moreover, we investigated argument forms that clearly violate system p, namely monotonicity, contraposition, and transitivity. These argument forms are probabilistically non-informative. Except for transitivity, most participants understood that these argument forms are probabilistically non-informative and they inferred wide intervals.

We explain the results in the transitivity tasks by conversational implicatures. Adams [1] stressed the probabilistic invalidity of transitivity and suggested to interpret transitivity in common sense arguments as cut. If a speaker first utters a premise of the form if E1, then E2 and then utters as the second premise if E2, then E3, the speaker actually means by the second premise if E1 and E2, then E3. The speaker does not mention “E1 and” to the addressee because it is already conversationally implied and “clear” from the context. Thus, we analyzed the data of the transitivity tasks as cut and observed analogous patterns as in the cut tasks.

Of special interest are the tasks in which all premises are certain. This is the case in those tasks in which the probabilities of the premises are equal to 1.
These tasks serve as “control conditions” as they are comparable to the respective non-probabilistic argument forms. In the tasks with certain premises, practically all participants endorse the system p rules. The high endorsement rates are comparable to the endorsement rates of the non-probabilistic version of the modus ponens (89–100%; [11]). In the monotonicity task with certain premises the interval responses are large, which means that many participants understand the probabilistic non-informativeness of the monotonicity argument form even in this special condition. In the case of transitivity the mean lower bounds are very high. As discussed above, participants might interpret the transitivity tasks as cut tasks.

As an example, we describe a study on the probabilistic versions of the argument forms modus ponens (mp), modus tollens (mt), denying the antecedent (da), and affirming the consequent (ac). The non-probabilistic mp and mt are logically valid. The non-probabilistic da and ac are not logically valid. The probabilistic mp and mt are p-valid. The probabilistic da and ac are not p-valid.

Fig. 1. Classification of probabilistic argument forms. l and u denote whether the lower or upper probability bound of the conclusion, respectively, is constrained by the premise(s). The circle on the left contains argument forms that are logically valid in their non-probabilistic version. The intersection of the bold circles contains the p-valid argument forms. All regions are non-empty, see Table 2 for examples.

We were especially interested to compare the affirmative and the negated versions of these argument forms (i.e., negated conclusions). The four affirmative argument forms (mp, da, mt, ac) and their negated versions (nmp, nda, nmt, nac) are shown in Table 6. The non-probabilistic versions of these argument forms were extensively investigated empirically [12,11].
The non-probabilistic mp is actually endorsed by 89–100%, the mt by 41–81%, the da by 17–73%, and the non-probabilistic ac by 23–75% of the participants [11].

One hundred and twenty students participated in our study on the probabilistic versions. Thirty participants were assigned to each of the four conditions mp and nmp, da and nda, mt and nmt, and ac and nac. Each participant solved three affirmative and three negated arguments. As an example, a mp task had the following form:

Imagine the following situation. Around Christmas time a certain ski-resort is very busy. This region is very popular among sportsmen, like skiers, snow-boarders, and sledge-riders. Every hour a cable-car brings the sportsmen to the top. About this cable-car we know:

Exactly 70% of the skiers wear red caps.
Exactly 90% of the sportsmen are skiers.

Imagine all the sportsmen in this cable-car. How many of these sportsmen wear a red cap?

Participants could respond either by a point value or by two interval values. All tasks had a similar structure. Table 3 lists the probabilities presented in the premises, the normative lower and upper bounds, and the participants' mean lower and upper bound responses for the tasks.

In the mp tasks with certain premises (100% in both premises) all thirty participants solved the task correctly and responded "100%". Likewise, all participants solved the negated version of the modus ponens (nmp) correctly and responded "0%". This indicates two things. First, the participants are perfect in the "certain mp" and "certain nmp". Second, the reliability of our experimental conditions is high. The results agree with the literature: human subjects are perfectly competent in making mp inferences.

Table 3
Mean lower (lbr) and mean upper bound responses (ubr). P1 and P2 denote the probabilities presented in the premises. clb and cub denote the normative/coherent bounds. The data of the upper half of the table are taken from [33].

                 affirmative form             negated form
  P1   P2    clb   cub   lbr   ubr       clb   cub   lbr   ubr
modus ponens (mp)                    negated modus ponens (nmp)
  1    1     1     1     1     1         .00   .00   .00   .00
  .7   .9    .63   .73   .62   .69       .27   .37   .35   .42
  .7   .5    .35   .85   .43   .55       .15   .65   .41   .54
denying the antecedent (da)          negated da (nda)
  1    1     .00   1     .37   .85       .00   1     .01   .53
  .7   .2    .20   .44   .19   .42       .56   .80   .52   .76
  .7   .5    .15   .65   .25   .59       .35   .85   .33   .65
modus tollens (mt)                   negated mt (nmt)
  1    1     1     1     .73   .82       .00   .00   .18   .33
  .7   .9    .86   1     .46   .72       .00   .14   .20   .41
  .7   .5    .29   1     .36   .66       .00   .71   .27   .57
affirming the consequent (ac)        negated ac (nac)
  1    1     .00   1     .36   .97       .00   1     .04   .64
  .7   .9    .00   .33   .43   .86       .67   1     .10   .48
  .7   .5    .00   .71   .34   .77       .29   1     .13   .56

The relation between the number of responses falling into the normatively correct interval and the size of the normative interval is used as a measure of the agreement between the responses and the normative values. We use a simple χ² value to express the agreement, χ² = (f − e)²/e, with f = the number of participants inferring coherent values, and e = the expected number of participants assuming a random response generator. Let the normative interval be [l, u]. In step one the random response generator selects a lower response r_l greater than l with probability 1 − l. In step two it selects a number greater than r_l and less than u with probability (u − r_l)/(1 − r_l). For our purposes it is sufficient to approximate r_l roughly by l. Combining step one and step two by multiplying the two probabilities, (1 − l)(u − l)/(1 − l), simplifies to u − l, so that e = N·(u − l), where N denotes the number of participants in an experimental condition. Table 4 reports the χ² values for the various tasks with probabilistic premises.
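The computation of e and χ² can be sketched in a few lines of Python (a minimal illustration with our own function names; the count f = 22 is back-calculated from the reported χ² of the mp condition, not a figure stated in the text):

```python
def expected_coherent(n, l, u):
    """Expected number of interval responses falling into the coherent
    interval [l, u] under the random response generator: e = N * (u - l)."""
    return n * (u - l)

def chi2_agreement(f, e):
    """Simple agreement measure chi2 = (f - e)^2 / e."""
    return (f - e) ** 2 / e

# mp condition with premise probabilities .70 and .90: coherent interval [.63, .73]
e = expected_coherent(30, 0.63, 0.73)   # 30 participants, e = 3.0
# a count of f = 22 coherent responders reproduces the chi2 of 120.33 in Table 4
print(round(chi2_agreement(22, e), 2))  # -> 120.33
```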
By far the best agreement with the coherent intervals is obtained for the mp and the nmp. There is a significant deviance from the normative intervals in the ac and nac. The data show that in the first ac task the participants give incoherent responses with too high values, and in the nac with too low values. Matching is one possible explanation, but omitting a negation step in the solution process (see below) is an alternative explanation.

Table 4
Deviance of the number of observed responses falling into the coherent intervals from the expected number assuming a random interval generator. High χ² values indicate more than expected coherent responses. If there are too few cases in the coherent interval, the χ² values are marked by (−). P1 = .70 in all conditions.

        P2     χ²           P2    χ²
mp      .90    120.33       .50   13.07
nmp     .90    85.33        .50   9.60
da      .20    10.76        .50   .60
nda     .20    6.42         .50   .60
mt      .90    1.72         .50   (−).01
nmt     .90    13.88        .50   .98
ac      .90    (−)10.00     .50   (−).92
nac     .90    (−)8.10      .50   (−)11.11

Table 5
Percentages of participants satisfying the conjugacy principle in the modus ponens, denying the antecedent, modus tollens, and affirming the consequent conditions (±2% tolerance, n = 30 in each condition).

     (P1, P2):  (1, 1)   (.7, .9)   (.7, .5)   (.7, .2)
mp              100      53         50         –
da              67       –          0          30
mt              67       43         30         –
ac              77       23         27         –

In the da tasks with "100%" in both premises, fourteen of the thirty participants responded correctly with the unit interval or an interval whose lower bound is very close to zero (at most 1%), [≤ 1, 100]%. Practically half of the participants understood that only a non-informative interval can be inferred if each premise is certain.

All participants inferred a probability (interval) for a conclusion C, P(C) ∈ [z′_C, z″_C], and for the associated negated conclusion, P(¬C) ∈ [z′_¬C, z″_¬C]. To test the conjugacy principle of the interval responses, we checked for each participant whether both z′_C + z″_¬C = 1 and z′_¬C + z″_C = 1 are satisfied.
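This conjugacy test can be sketched as follows (our own function name; the ±2% tolerance is the one used for Table 5, and the example intervals are the coherent mp bounds and their complements):

```python
def satisfies_conjugacy(l_c, u_c, l_nc, u_nc, tol=0.02):
    """Check the conjugacy principle for an interval response [l_c, u_c]
    for a conclusion C and [l_nc, u_nc] for its negation:
    l_c + u_nc = 1 and l_nc + u_c = 1 (within a tolerance)."""
    return abs(l_c + u_nc - 1) <= tol and abs(l_nc + u_c - 1) <= tol

# coherent pair: P(C) in [.63, .73] and P(not-C) in [.27, .37]
print(satisfies_conjugacy(0.63, 0.73, 0.27, 0.37))  # -> True
# a deviant lower bound for the negated conclusion violates conjugacy
print(satisfies_conjugacy(0.63, 0.73, 0.20, 0.37))  # -> False
```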
Table 5 shows the number of participants that exactly satisfy the conjugacy principle in the four tasks and their negated forms. In the mp with 70% and 90% in the premises, for example, 16 of the thirty participants satisfied both conjugacy conditions. Participants with perfect conjugacy show a remarkable sensitivity with respect to the 1-complements in the context of negation.

Fig. 2. Probability diagram for the basic argument forms of Table 6. P(A) = x, P(¬A) = 1 − x, P(B|A) = y, P(¬B|A) = 1 − y, P(B|¬A) = q ∈ [0, 1], P(¬B|¬A) = 1 − q, P(B) = z, P(¬B) = 1 − z.

Fig. 3. Left mp (with z′ = xy), middle mt (with (1 − x)′ = max{1 − z/y, (z − y)/(1 − y)}), right ac (with x′ = 0). P(A) = x, P(¬A) = 1 − x, P(B|A) = y, P(¬B|A) = 1 − y, P(B|¬A) = q ∈ [0, 1], P(¬B|¬A) = 1 − q, P(B) = z, P(¬B) = 1 − z.

4 First steps towards a process model of conditional inferences

Evans [12] gives two task features that explain several of the effects observed in classical argument forms: directionality and negativity. The modus ponens is a forward task; the modus tollens is a backward task. The mp is a forward argument because it requires an inference from the antecedent to the consequent. The mt is a backward argument because it requires an inverse inference, from the consequent to the antecedent. Directionality is best illustrated by a propositional graph. A propositional graph is a directed graph. The vertices represent propositions and the edges between two vertices represent conditionals. We attach probabilities to the edges. The absolute probability of a proposition is represented by an arc without a parent.

Figure 2 shows a diagram for two affirmative propositions and their negations. The four possible if–then conditionals are represented by the four arcs A → B (with probability y), A → ¬B (with probability 1 − y), ¬A → B (with probability q), and ¬A → ¬B (with probability 1 − q). x, y, z, and q denote the probabilities P(A), P(B|A), P(B), and P(B|¬A), respectively.
Dashed arrows are used when the absolute or the conditional probabilities refer to negated propositions. The propositional graph represents the problem space for a class of conditional inference tasks. With such a diagram the premises of the mp are represented by the arcs → A (with probability x) and A → B (with probability y), and the conclusion is represented by → B (with probability z). The inference consists in the removal of the vertex A. Figure 3 shows the diagrams for the premises of the mp, the mt, and the ac.

Table 6
mp and negated mp (nmp), da and negated da (nda), mt and negated mt (nmt), ac and negated ac (nac), the two premises P1 and P2, the conclusion C, P(B|A) = y, P(A) = x, P(B) = z, P(¬A) = 1 − x, P(¬B) = 1 − z. Compare Figure 2.

       P1             P2    ∴ C   p′                               p″
mp     if A, then B   A     B     xy                               1 − x(1 − y)
nmp    if A, then B   A     ¬B    x(1 − y)                         1 − xy
da     if A, then B   ¬A    ¬B    (1 − x)(1 − y)                   1 − (1 − x)y
nda    if A, then B   ¬A    B     (1 − x)y                         1 − (1 − x)(1 − y)
mt     if A, then B   ¬B    ¬A    max{1 − z/y, (z − y)/(1 − y)}    1
nmt    if A, then B   ¬B    A     0                                1 − max{1 − z/y, (z − y)/(1 − y)}
ac     if A, then B   B     A     0                                min{z/y, (1 − z)/(1 − y)}
nac    if A, then B   B     ¬A    1 − min{z/y, (1 − z)/(1 − y)}    1

Non-probabilistic and probabilistic studies have shown that the mt is more difficult than the mp. How can differences like these be explained with the propositional graphs? We observed that the participants in our experiments were clearly better at lower than at upper probability responses. Normatively, the lower probability of the conclusion of the mp, z′, is the product of the two premise probabilities, P(A)P(B|A) (see Figure 2 and Figure 3). A process model assumes that human subjects understand that in the mp the conclusion is, in any case, less probable than any of its premises, and that the lower probability is obtained by taking 100x% of y or 100y% of x. In multiplicative forward chaining, the current running result is obtained by iteratively taking a proportion of the previous running result. Such an operation is easy to perform intuitively with degrees of belief.
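A minimal Python sketch of three of these bound computations (function names are our own; we assume 0 < y < 1 so the divisions are defined, and express the mt premise P(¬B) through z = P(B)):

```python
def mp_bounds(y, x):
    """Modus ponens: from P(B|A) = y and P(A) = x,
    the coherent interval for P(B) is [xy, 1 - x(1 - y)]."""
    return x * y, 1 - x * (1 - y)

def mt_bounds(y, not_b):
    """Modus tollens: from P(B|A) = y and P(not-B) = not_b,
    the coherent lower bound for P(not-A); the upper bound is 1."""
    z = 1 - not_b  # z = P(B)
    return max(1 - z / y, (z - y) / (1 - y)), 1.0

def ac_bounds(y, z):
    """Affirming the consequent: from P(B|A) = y and P(B) = z,
    the coherent interval for P(A) is [0, min{z/y, (1-z)/(1-y)}]."""
    return 0.0, min(z / y, (1 - z) / (1 - y))

# reproducing the normative bounds of Table 3 (premise probabilities .70 and .90)
lo_mp, hi_mp = mp_bounds(0.7, 0.9)   # approximately (0.63, 0.73)
lo_mt, _ = mt_bounds(0.7, 0.9)       # lower bound approximately 0.86
_, hi_ac = ac_bounds(0.7, 0.9)       # upper bound approximately 0.33
```

The remaining bounds of Table 6 follow from these by the conjugacy principle; for instance, the nmp lower bound x(1 − y) is 1 minus the mp upper bound.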
Backward processing is sometimes non-informative. In the ac the lower probability is zero. Such results cannot be obtained by "cascaded inference" as in forward inference; see Figure 3.

We further suppose that negations make inferences difficult, so that human subjects prefer to think in terms of affirmative propositions. In an inference graph this requires taking a "detour" and switching from negations to affirmations and back. It also requires switching 1-complements of probabilities and lower and upper probabilities. This involves more cognitive load. It is easy to lose track of such a solution path or to forget to switch back a 1-complement.

Normatively, the upper probability is obtained by the conjugacy principle [38]. The upper probability of an event E is 1 minus the lower probability of the negation of E, p″(E) = 1 − p′(¬E). For the upper probability of the mp we need the probability of ¬B, the negation of the conclusion. This is obtained by taking the 1-complement of the product x(1 − y): z″ = 1 − x(1 − y). A process model assumes that these steps are also involved in an analog form in human reasoning. It predicts that the upper probability of the mp is more difficult than the lower one because more steps are required for its solution.

The model assumes a strong preference for affirmative propositions. In many investigations and in different domains it has been observed that negated information requires additional processing effort, takes more time, and leads to more errors than affirmative information [18].

We distinguish two families of elementary argument forms, the mp family and the mt family. Each one has four members. They are obtained by affirming or negating the categorical antecedent or the conclusion. Table 6 gives the lower and upper probabilities for each of the 2 × 4 argument forms. The conjugacy principle is reflected in the 1-complements of the diagonal entries of the successive argument pairs.
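The two-step derivation of the mp upper bound via conjugacy can be written out as follows (a sketch with our own names; x = P(A) and y = P(B|A) are the premise probabilities of the example task):

```python
def upper_via_conjugacy(lower_of_negation):
    """Conjugacy principle: p''(E) = 1 - p'(not-E)."""
    return 1 - lower_of_negation

x, y = 0.9, 0.7
lower_not_b = x * (1 - y)                   # p'(not-B) = x(1 - y) = 0.27
upper_b = upper_via_conjugacy(lower_not_b)  # z'' = 1 - x(1 - y) = 0.73
```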
The upper probability of the mp, e.g., is equal to 1 minus the lower probability of the nmp. Note the symmetries in the mt family concerning the smaller/greater of the two ratios. The members of the mp family require forward processing, those of the mt family backward processing. Inferences in the mp family involve multiplication, inferences in the mt family division. In addition, inferences in the mt family require min/max decisions. The number and the kind of steps may be used to estimate the difficulty of the argument forms. Obviously the lower probability of the mp is especially easy, as it requires only one step. The mt requires the most steps. In the nmp the lower probability of ¬B results from the product x(1 − y). The result is obtained by multiplicative chaining (Figure 3), but this time the 1-complement of the given y value is required. The upper probability for the nmp requires three steps: (i) taking the event-complement of ¬B, which is B and whose conditional probability y is given, (ii) multiplicative chaining, and (iii) taking the 1-complement.

By bringing probability, logic, and psychology together we have tried to improve the understanding of human reasoning. We have approached the area on several routes simultaneously, including the selection of an appropriate normative framework, running experiments, and modeling cognitive representations and reasoning processes. The investigation of human reasoning is a highly interdisciplinary endeavor.

References

[1] E. W. Adams. The logic of conditionals. Reidel, Dordrecht, 1975.
[2] E. W. Adams. A Primer of Probability Logic. CSLI, Stanford, 1998.
[3] M. Baioletti, A. Capotorti, L. Galli, S. Tognoloni, F. Rossi, and B. Vantaggi. CkC (Check Coherence package; version e4, June, 2007). Download: http://www.dipmat.unipg.it/~upkd/paid/software.html.
[4] B. Bouchon-Meunier, G. Coletti, and C. Marsala. Conditional possibility and necessity. In B. Bouchon-Meunier, J. Gutiérrez-Ríos, L. Magdalena, and R. R.
Yager, editors, Technologies for constructing intelligent systems: Tools, pages 59–72. Springer, Berlin, 2002.
[5] R. M. J. Byrne. The rational imagination: How people create alternatives to reality. Bradford, MIT Press, Cambridge, MA, 2005.
[6] G. Coletti and R. Scozzafava. Probabilistic logic in a coherent setting. Kluwer, Dordrecht, 2002.
[7] G. Coletti and R. Scozzafava. Conditional probability, fuzzy sets, and possibility: A unifying view. Fuzzy Sets and Systems, 144:227–249, 2004.
[8] G. Coletti and R. Scozzafava. Conditioning in a coherent setting: Theory and applications. Fuzzy Sets and Systems, 155:26–49, 2005.
[9] B. De Finetti. Theory of probability, volume 1, 2. John Wiley & Sons, Chichester, 1974. Original work published 1970.
[10] G. J. W. Dorn. Popper's law of the excess of the probability of the conditional over the conditional probability. Conceptus, 26(67):6–61, 1992.
[11] J. St. B. T. Evans, S. E. Newstead, and R. M. J. Byrne. Human Reasoning. Erlbaum, Hove, 1993.
[12] J. St. B. T. Evans. The psychology of deductive reasoning. Routledge, London, 1982.
[13] J. St. B. T. Evans, S. H. Handley, and D. E. Over. Conditionals and conditional probability. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:321–355, 2003.
[14] A. Gilio. Probabilistic reasoning under coherence in System P. Annals of Mathematics and Artificial Intelligence, 34:5–34, 2002.
[15] A. Gilio and S. Ingrassia. Totally coherent set-valued probability assessments. Kybernetika, 34(1):3–15, 1998.
[16] V. Girotto and P. N. Johnson-Laird. The probability of conditionals. Psychologia, 47:207–225, 2004.
[17] I. R. Goodman, H. T. Nguyen, and E. A. Walker. Conditional inference and logic for intelligent systems: A theory of measure-free conditioning. North-Holland, Amsterdam, 1991.
[18] L. R. Horn. A natural history of negation. CSLI Publications, Stanford, 2001.
[19] P. N. Johnson-Laird and R. M. J. Byrne. Deduction. Erlbaum, Hillsdale, 1991.
[20] P. N.
Johnson-Laird, V. Girotto, P. Legrenzi, and M. S. Legrenzi. Naive probability: A mental model theory of extensional reasoning. Psychological Review, 106(1):62–88, 1999.
[21] S. Kraus, D. Lehmann, and M. Magidor. Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44:167–207, 1990.
[22] F. Lad. Operational subjective statistical methods: A mathematical, philosophical, and historical introduction. Wiley, New York, 1996.
[23] I.-M. Liu. Conditional reasoning and conditionalization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(4):694–709, 2003.
[24] I.-M. Liu, K.-C. Lo, and J.-T. Wu. A probabilistic interpretation of 'If—Then'. The Quarterly Journal of Experimental Psychology, 49(A):828–844, 1996.
[25] M. Oaksford and N. Chater. Bayesian rationality: The probabilistic approach to human reasoning. Oxford University Press, Oxford, 2007.
[26] M. Oaksford, N. Chater, and J. Larkin. Probabilities and polarity biases in conditional inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26:883–899, 2000.
[27] K. Oberauer and O. Wilhelm. The meaning(s) of conditionals: Conditional probabilities, mental models and personal utilities. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29:680–693, 2003.
[28] D. E. Over, C. Hadjichristidis, J. S. Evans, S. J. Handley, and S. A. Sloman. The probability of causal conditionals. Cognitive Psychology, 54:62–97, 2007.
[29] N. Pfeifer and G. D. Kleiter. Nonmonotonicity and human probabilistic reasoning. In Proceedings of the 6th Workshop on Uncertainty Processing, pages 221–234, Hejnice, September 24–27, 2003.
[30] N. Pfeifer and G. D. Kleiter. Coherence and nonmonotonicity in human reasoning. Synthese, 146(1-2):93–109, 2005.
[31] N. Pfeifer and G. D. Kleiter. Inference in conditional probability logic. Kybernetika, 42:391–404, 2006.
[32] N. Pfeifer and G. D. Kleiter.
Is human reasoning about nonmonotonic conditionals probabilistically coherent? In Proceedings of the 7th Workshop on Uncertainty Processing, pages 138–150, Mikulov, September 16–20, 2006.
[33] N. Pfeifer and G. D. Kleiter. Human reasoning with imprecise probabilities: Modus ponens and denying the antecedent. In 5th International Symposium on Imprecise Probability: Theories and Applications, pages 347–356, Prague, Czech Republic, July 16–19, 2007.
[34] K. R. Popper. A set of independent axioms for probability. Mind, 47:275–277, 1938.
[35] F. P. Ramsey. Truth and probability (1926). In D. H. Mellor, editor, Philosophical Papers by F. P. Ramsey, pages 58–100. Cambridge University Press, Cambridge, 1994.
[36] F. P. Ramsey. General propositions and causality (1929). In D. H. Mellor, editor, Philosophical Papers by F. P. Ramsey, pages 145–163. Cambridge University Press, Cambridge, 1994.
[37] A. Rényi. On a new axiomatic theory of probability. Acta Mathematica Academiae Scientiarum Hungaricae, 6:285–335, 1955.
[38] P. Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London, 1991.