One of the fundamental results in probability theory is the strong law of large numbers, which was discussed in an earlier post under the guise of the Birkhoff ergodic theorem.

Suppose we have a sequence of random variables which take values in . If the sequence is independent and identically distributed with , then the strong law of large numbers shows that almost surely.

It is often the case, however, that one or both of these conditions fail; there may be some dependence between the variables , or the distribution may not be identical for all values of . For example, the following situation arose recently in a project I have been working on: there is a sequence of random variables as above, which has the property that no matter what values take, the random variable has expected value at most . In the language of conditional expectation, we have

This is true despite the fact that the distribution of may vary according to the values of . In what follows we will write for the -algebra generated by and write the above condition as

It is plausible to conjecture that in this setting one still has almost surely. And indeed, it turns out that a statement very close to this can be proved.

Let be an increasing sequence of -algebras and let be a sequence of -values random variables such that is -measurable. Suppose that there is such that

The following result is proved by modifying one of the standard proofs of the strong law of large numbers, which can be found, for example, in Billingsley’s book “Probability and Measure”, where it is Theorem 22.1. There is also a discussion on Terry Tao’s blog, which emphasises the role of the main tools in this proof: the moment method and truncation.

Proposition 1If is any sequence of random variables satisfying (1) and (2), then almost surely.

*Proof:* Consider the truncated random variables , and note that by (1) and (2) we have

Now consider the random variables . Note that is -measurable, takes values in , and moreover

The most important consequence of this is that the sequence is *uncorrelated* — that is, for all , which in turn implies that . This means that the variance is additive for the sequence :

Let , so that . We can estimate using the previous computation:

Recall that is the sequence bounding , and let be such that . Let be a random variable taking the value with probability for . Note that by the definition of and , we have , thus

for some fixed constant . Note that the final expression is non-decreasing in , and so together with (4) we have

where again is a fixed constant. Fixing and using (5) in Chebyshev’s inequality yields

Now let be a sequence of integers such that . We see that (6) yields

where depends on and on . Observe that

where is the minimal value of for which . Because grows like , the sum is bounded above by a constant times , so that together with (7) we have

where finiteness of is the crucial place where we use (2).

Using the first Borel-Cantelli lemma and taking an intersection over all rational gives

Let . Because we have for all , and in particular

Taking the limit and using (8) gives

Taking an intersection over all rational gives

Finally, we recall from the definition of and that whenever . In particular, we may observe that

and apply Borel-Cantelli again to deduce that with probability one, for at most finitely many values of . In particular, (9) implies that

which completes the proof of Proposition 1.

How did you get ? I first tried Markov inequality but this does not seem to work.

That step was poorly explained; see my reply to Matt’s comment below.

Where did the dependence on go in the part where ? From what I understand we have no assumption on the being iid?

Ah, yes, that step was a bit sloppy, what I should have written is the following:

where the inequalities in the second line use (1) and (2).