# Induction and epochs

### Introduction

Induction is the basis of all empirical knowledge. Informally, if something has never or always been the case, one expects it to continue to be never or always the case: any change would mark a change in epoch.

### Mathematical Induction

Mathematical induction concerns mathematical statements, not empirical knowledge.

Let S(n) denote a statement dependent on an integer variable, n.
If:
• For all integers n, S(n) implies S(n+1), and
• S(k) for some integer k,
Then:
• S(i) for all integers i ≥ k.

This, and variants on it, is often used to prove theorems for all integers. It motivates informal induction.
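The schema can be checked mechanically for any concrete statement. As a hypothetical illustration (the statement and names below are not from the text), take S(n) to be ‘the sum of the first n odd numbers equals n²’ and spot-check the base case and the inductive step over a finite range:

```python
# Hypothetical illustration: S(n) is "1 + 3 + ... + (2n-1) == n**2".
# Induction proves S(n) for all n >= 1; here we merely spot-check the
# base case and the inductive step over a finite range.

def S(n: int) -> bool:
    """The statement S(n): the sum of the first n odd numbers is n squared."""
    return sum(2 * i - 1 for i in range(1, n + 1)) == n ** 2

def inductive_step_holds(n: int) -> bool:
    """Check that S(n) implies S(n+1) for this particular n."""
    return (not S(n)) or S(n + 1)

base_case = S(1)  # S(k) with k = 1
step = all(inductive_step_holds(n) for n in range(1, 100))
print(base_case and step)  # True
```

Of course, the finite check is not the proof; the point of the schema is that the base case and the inductive step together cover all integers i ≥ k.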

### Statistical Induction

According to the law of large numbers, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed. Thus:

For two or more sufficiently large sets of results obtained by random sampling from the same distribution, the averages should be close, and will tend to become closer as more trials are performed.

In particular, if one set of results, R1, has been obtained and another, R2, will be obtained, then, using the language of probability theory, if C() is a condition on a result,

P(C(R2)) = p(C(R1)), where P() is the probability and p() is the proportion.

Alternatively, p() could be given as a hypothesis and tested against the data. Note that for any given quantity of data, rare events cannot be excluded, and so one can never be sure that any p(x) is ‘objectively’ very small. That is, the ‘closeness’ in the law of large numbers always has some non-zero tolerance.
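A minimal simulation of the claim P(C(R2)) = p(C(R1)), under a hypothetical fixed distribution (the uniform distribution and the condition C below are illustrative assumptions, not from the text):

```python
import random

random.seed(0)

# Hypothetical fixed distribution within a single epoch: uniform on [0, 1).
def trial() -> float:
    return random.random()

C = lambda x: x < 0.3  # an illustrative condition on a single result

R1 = [trial() for _ in range(100_000)]  # past results
R2 = [trial() for _ in range(100_000)]  # 'future' results

p_R1 = sum(C(x) for x in R1) / len(R1)  # proportion in past data
p_R2 = sum(C(x) for x in R2) / len(R2)  # frequency in future data

# The two proportions should be close, up to the non-zero tolerance
# inherent in the law of large numbers.
print(p_R1, p_R2)
```

The residual gap between the two proportions is exactly the ‘closeness with some non-zero tolerance’ noted above.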

A key assumption of statistical induction is that there exists a stable ‘expectation’. This is only true within some epoch where the trials depend on that epoch, and not on any sub-epochs. In effect, the limits on an epoch are determined by the limits on the law of large numbers.
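The failure mode can be sketched in the same way, assuming a hypothetical epochal change (the Gaussian distributions and the shift below are illustrative assumptions):

```python
import random

random.seed(1)

# Hypothetical epochal change: the underlying distribution shifts between
# the two sets of trials, so the old proportion no longer estimates the
# new probability.
R1 = [random.gauss(0.0, 1.0) for _ in range(50_000)]  # epoch 1
R2 = [random.gauss(2.0, 1.0) for _ in range(50_000)]  # epoch 2: mean shifted

C = lambda x: x > 1.0
p_R1 = sum(C(x) for x in R1) / len(R1)  # about 0.16
p_R2 = sum(C(x) for x in R2) / len(R2)  # about 0.84
print(round(p_R1, 2), round(p_R2, 2))
```

Within either epoch the law of large numbers applies; across the boundary, p(C(R1)) is simply the wrong estimate for P(C(R2)).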

### Empirical Induction

In practice we don’t always have the conditions required for straightforward statistics, but we can approximate. Using the notation as above, then:

P(C(R2)) = p(C(R1)),

provided that R1, R2 are in the same epoch. That is, where:

• The sampling was either unbiased, had the same bias in the two cases, or at least was not conditional on anything that changed between the two cases.
• For some basis of hypotheses, {H}, the conditional likelihoods P(data|H) are unchanged between the two cases.

Alternatively, we can let A = ”same epoch” stand for the above assumptions and write

P(C(R2)|A) = p(C(R1)).

### Induction on Hypotheses

Statistical induction only considers proportions.  The other main case is where we have hypotheses (e.g. models or theories) that fit the past data. If these are static then we may expect some of the hypotheses that fit to be ‘true’ and hence to continue to fit. That is:

If, for all i in some index set I, the hypotheses Hi fit the current data (R1), then, for some subset J of I, one expects by default that, for all j in J, Hj will continue to fit the future data (R2).

As above, there is an assumption that the epoch hasn’t changed.

Often we are only interested in some of the parameters of a hypothesis, such as a location. Even if all the theories that fit the current data virtually agree on the current value of the parameters of interest, there may be radically different possibilities for their future values, perhaps forming a multi-modal distribution. (For example, if we observe an aircraft entering our airspace, we may be sure about where it is and how fast it is flying, but have many possible destinations.)
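The aircraft example can be sketched as two hypothetical models (the numbers and names below are illustrative assumptions, not from the text) that agree exactly on the observed data but diverge on extrapolation:

```python
# Hypothetical sketch: two models that agree exactly on an aircraft's
# observed past positions but diverge radically in the future.
past_times = [0, 1, 2, 3]

def straight(t):  # H1: continue straight and level
    return 100.0 * t

def turning(t):   # H2: turn away after t = 3
    return 100.0 * t if t <= 3 else 100.0 * 3 + 50.0 * (t - 3)

fits_now = all(straight(t) == turning(t) for t in past_times)
future = [(straight(t), turning(t)) for t in (4, 5, 6)]
print(fits_now)  # True: indistinguishable on the current data
print(future)    # very different future positions
```

Both hypotheses fit R1, so both belong in the index set I; the distribution over their future values is already bimodal.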

### Pragmatic induction

One common form of pragmatism is where one has an ‘established’ model or belief which one goes on using (unquestioningly) unless and until it is falsified. By default the assumption A, above, is taken to be true. Thus one has

P(C(R2)) = p(C(R1)),

unless there is definite evidence that P() will have changed, e.g. a biased sample or an epochal change of the underlying random process. In effect, pragmatism assumes that the current epoch will extend indefinitely.
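A minimal sketch of this use-until-falsified rule, under hypothetical data (the threshold test below is a crude stand-in for ‘definite evidence’ of an epochal change, not a method from the text):

```python
# Pragmatic induction: keep using the established estimate p unless
# incoming data definitely falsifies it (here, a crude threshold test
# stands in for 'definite evidence' of an epochal change).
def pragmatic_estimate(p_established, new_data, condition, tolerance=0.1):
    p_new = sum(condition(x) for x in new_data) / len(new_data)
    if abs(p_new - p_established) > tolerance:
        return p_new          # epoch assumed to have changed: revise
    return p_established      # no definite evidence: go on using p

C = lambda x: x > 0
kept = pragmatic_estimate(0.5, [1, -1, 1, -1], C)    # consistent: retained
revised = pragmatic_estimate(0.5, [1, 1, 1, 1], C)   # falsified: revised
print(kept, revised)  # 0.5 1.0
```

The choice of tolerance encodes how readily the pragmatist will concede that the current epoch has ended.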

### Rationalizing induction

The difference between statistical and pragmatic induction is that the former makes explicit the assumptions of the latter. If one has a pragmatic claim, P(C(R2)) = p(C(R1)), one can in effect recover the rigour of the statistical approach by noting when, where and how the data supporting the estimate was sampled, compared with when, where and how the probability estimate is to be applied. (Thus it might be pragmatic – in this pedantic sense – to suppose that, if our radar fails temporarily, all airplanes will have continued flying straight and level, but not necessarily sensible.)

### Example

When someone, Alf, says ‘all swans are white’ and a foreigner, Willem, says that they have seen black swans, we should consider whether Alf’s statement is empirical or not, and if so what its support is. Possibly:

• Alf defines swans in such a way that they must be white: they are committed to calling a similar black creature something else. Perhaps this is a widely accepted definition that Willem is unaware of.
• Alf has only seen British swans, and we should interpret their statement as ‘British swans are white’.
• Alf believes that swans are white and so only samples large white birds to check that they are swans.
• Alf genuinely and reasonably believes that the statement ‘all swans are white’ has been subjected to the widest scrutiny, but Willem has just returned from a new-found continent.

Even if Alf’s belief was soundly based on pragmatic induction, it would be prudent for him to revise his opinion, since his induction – of whatever kind – was clearly based on too small an epoch.

### Analysis

We can split conventional induction into three parts:

1. Modelling the data.
2. Extrapolating, using the models.
3. Considering predictions based on the extrapolations.

The final step is usually implicit in induction: it is usually supposed that one should always take an extrapolation to be a prediction. But there are exceptions. (Suppose that two airplanes are flying straight towards each other. A candidate prediction would be that they would pass infeasibly close, breaching the aviation rules that are supposed to govern the airspace. Hence we anticipate the end of the current ‘straight and level’ epoch and take recourse to a ‘higher’ epoch, in this case the pilots or air-traffic controllers. If they follow set rules of the road (e.g. planes flying out give way) then we may be able to continue extrapolating within the higher epoch, but here we only consider extrapolation within a given epoch.)
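The airplane example can be made concrete: extrapolate two straight-line tracks, then check the candidate prediction against a rule of the epoch. (The geometry, coordinates and the minimum-separation value below are illustrative assumptions, not from the text.)

```python
import math

# Extrapolate two constant-velocity aircraft tracks and check the
# candidate prediction against a rule of the epoch (a minimum
# separation), flagging where extrapolation alone lacks credibility.
def closest_approach(p1, v1, p2, v2):
    """Minimum distance between two constant-velocity 2-D tracks, t >= 0."""
    dp = (p1[0] - p2[0], p1[1] - p2[1])
    dv = (v1[0] - v2[0], v1[1] - v2[1])
    dv2 = dv[0] ** 2 + dv[1] ** 2
    t = 0.0 if dv2 == 0 else max(0.0, -(dp[0] * dv[0] + dp[1] * dv[1]) / dv2)
    return math.hypot(dp[0] + t * dv[0], dp[1] + t * dv[1])

# Two aircraft flying straight towards each other:
d = closest_approach((0, 0), (1, 0), (10, 0), (-1, 0))
MIN_SEPARATION = 5.0
credible = d >= MIN_SEPARATION
print(d, credible)  # 0.0 False
```

The extrapolation is sound within the ‘straight and level’ epoch, but the credibility check fails, so we anticipate that the epoch will end.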

Thus we might reasonably imagine a process somewhat like:

1. Model the data.
2. Extrapolate, using the models.
3. Establish predictions:
• If the extrapolations all agree: take them to be the candidate prediction.
• Otherwise: make a possibilistic candidate prediction; the previous ‘state’ has ‘set up the conditions’ for the possibilities.
4. Establish credibility:
1. If the candidate predictions are consistent with the epoch, then they are credible.
2. If not, note the lack of credibility.

In many cases a natural ‘null hypothesis’ is that many elements of a hypothesis are independent, so that they can be extrapolated separately. There are then ‘holistic’ constraints that need to be applied overall. This can be done as part of the credibility check. (For example, airplanes normally fly independently but should not fly too close.)

We can fail to identify a credible hypothesis either because we have not considered a wide enough range of hypotheses or because the epoch has ended. The epoch may also end without our noticing, leading to a seemingly credible prediction that is actually based on a false premise. We can potentially deal with all these problems by considering a broader range of hypotheses and data. Induction is only as good as the data gathering and theorising that supports it.

### Complicatedness

The modelling process may be complicated in two ways:

• We may need to derive useful categories so that we have enough data in each category.
• We may need to split the data into epochs, with different statistics for each.

We need to have enough data in each partition to be statistically meaningful, while being reasonably sure that data in the same partition are all alike in terms of transition probabilities. If the parts are too large we can get averaged results, which need to be treated accordingly.
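Splitting data into epochs can be sketched crudely, under hypothetical data (the split criterion and the minimum-size parameter below are illustrative assumptions): choose the split point that best separates the statistics, subject to each part being large enough to be meaningful.

```python
# A crude sketch of splitting data into two epochs: choose the split
# point that best separates the means, subject to each part being large
# enough to be statistically meaningful.
def best_split(data, min_size=10):
    best, best_gap = None, 0.0
    for i in range(min_size, len(data) - min_size):
        left, right = data[:i], data[i:]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap > best_gap:
            best, best_gap = i, gap
    return best, best_gap

data = [0.0] * 30 + [5.0] * 30  # an obvious change of epoch at index 30
split, gap = best_split(data)
print(split, gap)  # 30 5.0
```

With parts that are too small the gap estimates are noisy; with parts that are too large, each part may itself straddle an epoch and yield only averaged results.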

### Induction and types of complexity

We can use induction to derive a typology for complexity:

• simple unconditional: the model is given: just apply it
• simple conditional: check the model and apply it
• singly complicated: analyse the data in a single epoch against given categories to derive a model, apply it.
• doubly complicated: analyse the data into novel categories or epochs to derive a model, apply it.
• complex: where the data being observed has a reflexive relationship with any predictions.

The Cynefin framework gives a simple – complicated – complex – chaotic sense-making typology that is consistent with this, save that it distinguishes between:

• complex: we can probe and make sense
• chaotic: we must act first to force the situation to ‘make sense’.

We cannot make this distinction yet as we are not sure what ‘makes sense’ would mean. It may be that one can only know that one has made sense when and if one has had a successful intervention, which will often mean that ‘making sense’ is more of a continuing activity than a state to be achieved. But inadequate theorising and data would clearly lead to chaos, and we might initially act to consider more theories and to gather more data. But it is not clear how we would know that we had done enough.

David Marsay