P[ :C]: Elementary Theory

Draft

This is the mathematically ‘elementary’ material on ‘possible probabilities’.

Probability of Assertions

Here we consider probabilities of assertions, rather than of events, in order to avoid the difficulties that Russell raised, while following Kolmogorov’s general scheme.

The elements are:

  • A fixed domain, a set {s}.
  • Assertions {A} over {s}, so that for each A in {A} and each s in {s}, A(s) = 0 or 1.
  • Boolean operators on {A}, which is closed under them.
  • Possible probabilities {p}, probability measures (see below), with p: {A} → [0,1].

Moreover:

  • The Assertions include constants ‘0’ and ‘1’.
    (‘False’, ‘True’.)
  • The Boolean operators include ¬A, A∧B, A∨B.
    (Complement, and, or.)

The probability measures satisfy Kolmogorov-like conditions over the operators:

  1. p(‘1’)=1.
  2. If p(A∧B)=0 then p(A∨B) = p(A)+p(B).

These have the usual consequences, by analogy with Kolmogorov:

  1. If A ⇒ B then p(A) ≤ p(B).
  2. p(‘0’)=0.
  3. p(A∨B) + p(A∧B) = p(A) + p(B).
  4. p(¬A) = 1 – p(A).

Thus nothing of the power of Kolmogorov has been lost: it is just that we confine the application to assertions on a fixed domain.
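The conditions and their consequences can be checked on a toy case. A minimal sketch, with an illustrative three-point domain and weights (none of these names or numbers are from the text):

```python
from math import isclose

# A probability measure on assertions, induced by weights on domain points.
weights = {"s1": 0.5, "s2": 0.3, "s3": 0.2}  # one possible probability

def p(A):
    """p(A) = total weight of the points at which the assertion A holds."""
    return sum(w for s, w in weights.items() if A(s))

TRUE = lambda s: 1                       # the constant '1'
FALSE = lambda s: 0                      # the constant '0'
NOT = lambda X: (lambda s: 1 - X(s))     # complement
AND = lambda X, Y: (lambda s: X(s) * Y(s))
OR = lambda X, Y: (lambda s: max(X(s), Y(s)))

A = lambda s: 1 if s in {"s1", "s2"} else 0
B = lambda s: 1 if s == "s3" else 0

# Kolmogorov-like conditions:
assert isclose(p(TRUE), 1)
assert p(AND(A, B)) == 0 and isclose(p(OR(A, B)), p(A) + p(B))
# Consequences:
assert p(FALSE) == 0
assert isclose(p(NOT(A)), 1 - p(A))
```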

Sets of Possible Probabilities

Unconditional Probabilities

Let C be some specific context. Let Pu< > denote the set of possible probability functions satisfying the above axioms.

Let Pu<A> ≡ {p → p(A) | p( ) ∈ Pu< >},
the members of Pu< > restricted to A.

This is richer than just {p(A) | p( ) ∈ Pu< >}, allowing unions to be more precise.
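To see why the map-valued definition is richer, consider a sketch with two possible measures (names and numbers illustrative): the restriction p → p(A) remembers which measure produced which value, while the bare set of values forgets it.

```python
# Two possible measures over an illustrative three-point domain.
family = {
    "p1": {"s1": 0.5, "s2": 0.3, "s3": 0.2},
    "p2": {"s1": 0.2, "s2": 0.2, "s3": 0.6},
}

def p_of(weights, A):
    return sum(w for s, w in weights.items() if A(s))

A = lambda s: s in {"s1", "s2"}
B = lambda s: s == "s1"

# Pu<A> as the restriction p -> p(A): the association with p is kept ...
Pu_A = {name: p_of(w, A) for name, w in family.items()}
# ... whereas the bare set of values loses it:
values_A = set(Pu_A.values())

# Keeping the association lets later operators pair p(A) with p(B)
# for the SAME p, rather than mixing values from different measures.
Pu_B = {name: p_of(w, B) for name, w in family.items()}
```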

Operators

We can define:

 Pu<A> op Pu<B> ≡ {p → p(A) op p(B) | p( ) ∈ Pu< >},

defined where

p(A) op p(B) ∈[0,1],

so that Pu<A> op Pu<B> maps onto possible probability values. For example, as usual, some care needs to be taken with Pu<A>/Pu<B>, restricting the range of ‘/’. But we shall only use it where it is used in the precise theory.
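A sketch of this definition (the function name and values are illustrative): the operator is applied per possible measure p, and a result is kept only where it is defined and lands in [0,1].

```python
# Pu<A> and Pu<B> as maps from (names of) possible measures to values.
Pu_A = {"p1": 0.4, "p2": 0.1}   # illustrative values of p(A)
Pu_B = {"p1": 0.8, "p2": 0.0}   # illustrative values of p(B)

def combine(PA, PB, op):
    """Pu<A> op Pu<B>: apply op per measure p, keeping only results
    that are possible probability values (in [0,1])."""
    out = {}
    for name in PA.keys() & PB.keys():
        try:
            v = op(PA[name], PB[name])
        except ZeroDivisionError:
            continue            # e.g. '/' is undefined where p(B) = 0
        if 0 <= v <= 1:
            out[name] = v
    return out

sums = combine(Pu_A, Pu_B, lambda x, y: x + y)    # p1 dropped: 0.4 + 0.8 > 1
ratios = combine(Pu_A, Pu_B, lambda x, y: x / y)  # p2 dropped: p(B) = 0 there
```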

Kolmogorov-like Properties

The following are analogous to the usual Kolmogorov axioms:

  1. Pu<‘1’> = ‘1’, the constant map.
  2. If Pu<A∧B> = ‘0’ then Pu<A∨B> = Pu<A> + Pu<B>.

One also has the following analogous consequences:

  1. If A ⇒ B then Pu<A> ≤ Pu<B>.
  2. Pu<‘0’> = ‘0’.
  3. Pu<A∨B> + Pu<A∧B> = Pu<A> + Pu<B>.
  4. Pu<¬A> + Pu<A> = ‘1’.

Conditional Probabilities

We may now define the conditional probability,

Pc<A|B> ≡ Pu<A∧B>/Pu<B>, when Pu<B> ≠ 0,

noting that:

Pu<A∧B>/Pu<B> = {p → p(A∧B)/p(B) | p( ) ∈ Pu< >} = {p → p(A|B) | p( ) ∈ Pc< >},
where Pc< | > is the usual extension to conditional probabilities of Pu< >.
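A sketch of the conditional definition (the domain, weights and assertions are illustrative): Pc<A|B> is computed per possible measure, and is defined only where p(B) > 0.

```python
# Two possible measures over an illustrative three-point domain.
family = {
    "p1": {"s1": 0.5, "s2": 0.3, "s3": 0.2},
    "p2": {"s1": 0.2, "s2": 0.2, "s3": 0.6},
}

def p_of(weights, A):
    return sum(w for s, w in weights.items() if A(s))

A = lambda s: s == "s1"
B = lambda s: s in {"s1", "s2"}
A_and_B = lambda s: A(s) and B(s)

# Pc<A|B> = Pu<A∧B>/Pu<B>, per measure, defined where p(B) > 0:
Pc_A_given_B = {
    name: p_of(w, A_and_B) / p_of(w, B)
    for name, w in family.items()
    if p_of(w, B) > 0
}
# p1: 0.5/0.8 = 0.625; p2: 0.2/0.4 = 0.5
```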

Probabilities in Context

The above assumed a fixed context. We may use the notation

P< | :C>

to record the dependency on C. We then have:

  • If C1 ⇒ C2 then P< :C1> ⊆ P< :C2>.
  • P< :C1 or C2> = P< :C1> ∪ P< :C2>.

(Note that the contexts have a different logic from the assertions. This context logic is often informal.)

As usual, we say that contexts C1 and C2 are ‘consistent’ whenever P< :C1> ∩ P< :C2> is non-empty.

If C1 and C2 are consistent, we generally assume some application-dependent ‘context language’ such that we can find a common context, C, such that

P< : C > ⊆ P< :C1> ∩ P< :C2>.

(Usually, this will be an equality, but this depends on the context logic.)
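These properties can be sketched with each context represented by the set of measures it admits (measures abbreviated to tuples of weights on a fixed domain; all values illustrative):

```python
# Three possible measures, abbreviated to tuples of weights.
p1, p2, p3 = (0.5, 0.3, 0.2), (0.2, 0.2, 0.6), (0.1, 0.1, 0.8)

C1 = {p1, p2}          # measures admitted under context C1
C2 = {p2, p3}          # measures admitted under context C2

# P< :C1 or C2> = P< :C1> ∪ P< :C2>:
either = C1 | C2
# C1 and C2 are 'consistent' iff the intersection is non-empty:
consistent = bool(C1 & C2)          # True here, via p2
# A common context C must satisfy P< :C> ⊆ P< :C1> ∩ P< :C2>;
# here the intersection itself will do:
C = C1 & C2
assert C <= C1 and C <= C2
```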

Notation

Straightforward

The following are also useful:

P< | > (with no :C) can be used as short for P< | :C>,
where the C is clear from the context or is the same across all expressions.

P< > ≡ P< |‘1’>.

P<A|B> op P<A’|B’> ≡ {p → p(A|B) op p(A’|B’) | p( ) ∈ P< | >},
where the operator op is defined and maps into [0,1].

P<A|B> rel P<A’|B’> ≡  ∀ p( ) ∈ P< | >, p(A|B) rel p(A’|B’),
where the relation rel is defined.

f<P<A|B>> ≡ {p → f(p(A|B)) | p( ) ∈ P< | >},
where f is a function defined on [0,1].
(As Jack Good showed, the function log is particularly important, extended so that log(0) ≡ -∞.)

Im(P<A|B>) ≡ {p(A|B) | p( ) ∈ P< | >},
the image of the possible probability values.
(E.g., for a double-sided coin Im(P) = {0,1}.)

P[A|B] ≡ [inf(Im(P<A|B>)), sup(Im(P<A|B>))],
the bounding interval.
(E.g., for a double-sided coin P[ ] = [0,1].)

x = [x,x], for probabilities that are precise.

Thus we have:

P<A:C> ⊆ P<B:C> iff P[A:C] ⊆ P[B:C].
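The image and bounding-interval notations can be sketched as follows (values illustrative); note how the interval can be much coarser than the image, as in the double-sided coin example.

```python
# Pu<A> for a hypothetical double-sided coin: under one possible measure
# the coin always shows heads, under the other it never does.
Pu_A = {"always-heads": 1.0, "never-heads": 0.0}

Im = set(Pu_A.values())            # Im(P<A>) = {0, 1}
interval = (min(Im), max(Im))      # P[A] = [0, 1]: the whole interval,
                                   # although no intermediate value is possible
assert Im == {0.0, 1.0}
assert interval == (0.0, 1.0)
```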

Interpolation

It is not in general true that if B’ ⇒ B then P<A|B’> = P<A|B>, but it will often be the case for ‘reasonable’ B’. If we have a context C that determines a most refined partition of B, then it will often be the case that, for any B’ that is a union of members of the partition, P<A|B’> = P<A|B>. For example, it may be that P<Outcome|Treatment> = p irrespective of gender and race, but not of age. This type of situation can be denoted by:

P<A|∂B:C> = … ,

meaning that P<A|B’> = … for all B’ that are parts of B with respect to a partition as determined by C.
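A sketch of this interpolation claim under illustrative numbers: if p(A|cell) takes the same value on every cell of the partition determined by C, then p(A|B’) takes that same value for any B’ that is a union of cells.

```python
# Cells of a partition of B, keyed by (gender, race), with values
# (weight of cell, weight of A within the cell);
# p(A|cell) = 0.6 in every cell, by construction.
cells = {
    ("male", "x"):   (0.2, 0.2 * 0.6),
    ("male", "y"):   (0.1, 0.1 * 0.6),
    ("female", "x"): (0.4, 0.4 * 0.6),
    ("female", "y"): (0.3, 0.3 * 0.6),
}

def p_A_given(union):
    """p(A|B') for B' a union of cells of the partition."""
    wB = sum(cells[c][0] for c in union)
    wAB = sum(cells[c][1] for c in union)
    return wAB / wB

# Any union of cells gives the same conditional probability:
assert abs(p_A_given({("male", "x")}) - 0.6) < 1e-9
assert abs(p_A_given(set(cells)) - 0.6) < 1e-9
```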

Muddling

It is not assumed that limits such as those in the law of large numbers exist. If we have an indefinite sequence of epochs (such as years) that provide contexts Ci for which the P<A:Ci> are sometimes close to p and sometimes close to q (> p), then P<A:C>, where C covers all epochs, can have no limit, and is said to be muddled of degree (q−p). More generally, we can define ‘the degree of muddling’:

P⌈A:C⌉ ≡ lim_{i→∞} (sup{P<A:Ci>} − inf{P<A:Ci>}),
where each Ci is the context C restricted to epoch i onwards.

This definition can be extended to whole functions thus:

P⌈ :C⌉ ≡ supA {P⌈A:C⌉}.

The usual assumption is that P⌈ :C⌉ = 0, which we can deny by stating P⌈ :C⌉ > 0 without needing to state any limits to P<A:C>. Alternatively, we often seek refined contexts C’ and specific issues, A, such that P<A:C’> is not very muddled: P⌈A:C’⌉ ≈ 0.
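The degree of muddling can be approximated on a finite run of epoch-wise values (all numbers illustrative): the tail spread sup − inf settles at q − p when the values keep revisiting both p and q.

```python
def tail_spread(values, i):
    """sup - inf of the values from epoch i onwards: a finite
    approximation to the limit defining P⌈A:C⌉."""
    tail = values[i:]
    return max(tail) - min(tail)

# Epoch-wise values oscillating between p = 0.25 and q = 0.75:
run = [0.25, 0.75] * 50
# The spread does not shrink as the starting epoch advances: muddled of degree 0.5.
assert tail_spread(run, 10) == tail_spread(run, 50) == 0.5

# By contrast, a convergent run has a vanishing tail spread:
settled = [0.5 + 1.0 / (i + 2) for i in range(100)]
assert tail_spread(settled, 90) < tail_spread(settled, 10)
```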

Often, log probability is more convenient than probability itself. Here, suppose that B is one of n equiprobable alternatives, independent of A, so that p(A.B) = p(A)/n. Then

P⌈A.B:C⌉ = P⌈A:C⌉/n,

so the degree of muddling depends on the granularity of the propositions, which is often arbitrary. Moreover, there is no convenient general formula. We can generalise the definition to:

ƒ⌈A:C⌉ ≡ lim_{i→∞} (sup{ƒ<A:Ci>} − inf{ƒ<A:Ci>}),
where each Ci is the context C restricted to epoch i onwards.

Then:

(log.P)⌈A.B:C⌉ = (log.P)⌈A:C⌉ + (log.P)⌈B:C⌉.

In particular, if B is precise (i.e., (log.P)⌈B:C⌉ = P⌈B:C⌉ = 0) then the degree of muddling of A.B is just that of A. This makes (log.P)⌈ :C⌉ a much more natural measure than P⌈ :C⌉. I conjecture that many real-world issues are significantly muddled.
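These two claims can be checked numerically (n and the runs are illustrative): with B one of n equiprobable, A-independent alternatives, p(A.B) = p(A)/n, so the muddling of A.B is that of A divided by n, while on the log scale it is unchanged.

```python
import math

def tail_spread(values, i=10):
    """sup - inf of the values from epoch i onwards."""
    tail = values[i:]
    return max(tail) - min(tail)

n = 4
run_A = [0.25, 0.75] * 50          # p(A) oscillates between 0.25 and 0.75
run_AB = [v / n for v in run_A]    # p(A.B) = p(A)/n

# P⌈A.B:C⌉ = P⌈A:C⌉/n:
assert abs(tail_spread(run_AB) - tail_spread(run_A) / n) < 1e-12

# (log.P)⌈A.B:C⌉ = (log.P)⌈A:C⌉ + (log.P)⌈B:C⌉, with (log.P)⌈B:C⌉ = 0 here,
# since log(p(A)/n) just shifts log(p(A)) by the constant -log(n):
log_A = [math.log(v) for v in run_A]
log_AB = [math.log(v) for v in run_AB]
assert abs(tail_spread(log_AB) - tail_spread(log_A)) < 1e-9
```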

I am not entirely comfortable with this notation.

Draft

Dave Marsay

 
