Law of large numbers for dependent but uncorrelated random variables

One of the fundamental results in probability theory is the strong law of large numbers, which was discussed in an earlier post under the guise of the Birkhoff ergodic theorem.

Suppose we have a sequence of random variables {t_n} which take values in {{\mathbb N}}. If the sequence {t_n} is independent and identically distributed with {\mathop{\mathbb E}[t_n] = T}, then the strong law of large numbers shows that {\frac 1n(t_1+\cdots + t_n)\rightarrow T} almost surely.

It is often the case, however, that one or both of these conditions fail; there may be some dependence between the variables {t_n}, or the distribution may not be identical for all values of {n}. For example, the following situation arose recently in a project I have been working on: there is a sequence of random variables {t_n} as above, which has the property that no matter what values {t_1,\dots,t_{n-1}} take, the random variable {t_n} has expected value at most {T}. In the language of conditional expectation, we have

\displaystyle  \mathop{\mathbb E}[t_n \mid t_1,\dots, t_{n-1}] \leq T.

This is true despite the fact that the distribution of {t_n} may vary according to the values of {t_1,\dots,t_{n-1}}. In what follows we will write {\mathcal{F}_n} for the {\sigma}-algebra generated by {t_1,\dots, t_n} and write the above condition as

\displaystyle  \mathop{\mathbb E}[t_n \mid \mathcal{F}_{n-1}]\leq T.
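
For a toy example of this situation (a hypothetical illustration, not the one from the project mentioned above): fix {T\geq 1} and let {t_n} be geometrically distributed on {\{1,2,3,\dots\}} with a success parameter {q_n} taking values in {[1/T,1]}, where {q_n} is allowed to be any {\mathcal{F}_{n-1}}-measurable function of {t_1,\dots,t_{n-1}}. Then

\displaystyle  \mathop{\mathbb E}[t_n \mid \mathcal{F}_{n-1}] = \frac 1{q_n} \leq T,

even though the conditional distribution of {t_n} genuinely depends on the past.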

It is plausible to conjecture that in this setting one still has {\varlimsup \frac 1n(t_1+\cdots+t_n) \leq T} almost surely. And indeed, it turns out that a statement very close to this can be proved.
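
Note that one cannot hope for more than an upper bound on the limit superior: if, say, {t_n = 1} for every {n} while {T = 2}, then the hypothesis is satisfied but

\displaystyle  \frac 1n(t_1+\cdots+t_n) = 1 < T,

so convergence to {T} itself cannot be expected in this setting.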

Let {\mathop{\mathcal F}_n} be an increasing sequence of {\sigma}-algebras and let {t_n} be a sequence of {{\mathbb N}}-valued random variables such that {t_n} is {\mathop{\mathcal F}_n}-measurable. Suppose that there is {p\colon {\mathbb N} \rightarrow [0,1]} such that

\displaystyle  \mathop{\mathbb P}[t_n=t \mid \mathcal{F}_{n-1}] \leq p(t) \ \ \ \ \ (1)

and moreover,

\displaystyle   T := \sum_{t=1}^\infty t \cdot p(t) < \infty. \ \ \ \ \ (2)
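
For orientation, here is one admissible (and entirely hypothetical) choice of dominating sequence: {p(t) = t^{-3}}, for which

\displaystyle  T = \sum_{t=1}^\infty t\cdot t^{-3} = \sum_{t=1}^\infty \frac 1{t^2} = \frac{\pi^2}{6} < \infty,

so (2) holds, while (1) merely asks that the conditional distribution of each {t_n} be dominated pointwise by {p}.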

The following result is proved by modifying one of the standard proofs of the strong law of large numbers, which can be found, for example, in Billingsley’s book “Probability and Measure”, where it is Theorem 22.1. There is also a discussion on Terry Tao’s blog, which emphasises the role of the main tools in this proof: the moment method and truncation.

Proposition 1 If {t_n} is any sequence of random variables satisfying (1) and (2), then {\varlimsup_{n\rightarrow\infty} \frac 1n \sum_{k=1}^n t_k \leq T} almost surely.

Proof: Consider the truncated random variables {s_n = t_n {\mathbf{1}}_{[t_n\leq n]}}, and note that by (1) and (2) we have

\displaystyle  0 \leq s_n \leq t_n \Rightarrow \mathop{\mathbb E}[s_n \mid \mathcal{F}_{n-1}] \leq T.
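
To see why the implication holds, note that by (1) and (2),

\displaystyle  \mathop{\mathbb E}[t_n \mid \mathcal{F}_{n-1}] = \sum_{t=1}^\infty t\, \mathop{\mathbb P}[t_n = t \mid \mathcal{F}_{n-1}] \leq \sum_{t=1}^\infty t\, p(t) = T,

and monotonicity of conditional expectation gives {\mathop{\mathbb E}[s_n \mid \mathcal{F}_{n-1}] \leq \mathop{\mathbb E}[t_n \mid \mathcal{F}_{n-1}] \leq T}.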

Now consider the random variables {r_n = s_n + T - \mathop{\mathbb E}[s_n \mid \mathcal{F}_{n-1}]}. Note that {r_n} is {\mathcal{F}_n}-measurable and satisfies {r_n \geq s_n \geq 0} (since {\mathop{\mathbb E}[s_n \mid \mathcal{F}_{n-1}] \leq T}), and moreover

\displaystyle  \mathop{\mathbb E}[r_n \mid \mathcal{F}_{n-1}] = T \text{ for all } n. \ \ \ \ \ (3)

The most important consequence of this is that the sequence {r_n} is uncorrelated — that is, {\mathop{\mathbb E}[r_ir_j] = \mathop{\mathbb E}[r_i]\mathop{\mathbb E}[r_j] = T^2} for all {i\neq j}, which in turn implies that {\mathop{\mathbb E}[(r_i - T)(r_j-T)] = 0}. This means that the variance is additive for the sequence {r_n}:

\displaystyle  \begin{aligned} \mathop{\mathbb E}\Big[ \Big(\sum_{k=1}^n (r_k -T)\Big)^2\Big] &= \sum_{k=1}^n \mathop{\mathbb E}[(r_k -T)^2] + 2\sum_{1\leq i<j\leq n} \mathop{\mathbb E}[(r_i-T)(r_j-T)] \\ &= \sum_{k=1}^n \mathop{\mathbb E}[(r_k -T)^2]. \end{aligned}
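
To verify the uncorrelatedness claim, condition on the later {\sigma}-algebra: for {i<j}, the variable {r_i} is {\mathcal{F}_{j-1}}-measurable, so (3) gives

\displaystyle  \mathop{\mathbb E}[r_ir_j] = \mathop{\mathbb E}\big[ r_i\, \mathop{\mathbb E}[r_j \mid \mathcal{F}_{j-1}]\big] = T\, \mathop{\mathbb E}[r_i] = T^2,

where the last equality uses {\mathop{\mathbb E}[r_i] = \mathop{\mathbb E}\big[\mathop{\mathbb E}[r_i \mid \mathcal{F}_{i-1}]\big] = T}, which also follows from (3).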

Let {X_n = \frac 1n (r_1 + \cdots + r_n) - T}, so that {\mathop{\mathbb E}[X_n] = 0}. We can estimate {\mathop{\mathbb E}[X_n^2]} using the previous computation:

\displaystyle  \begin{aligned} \mathop{\mathbb E}[X_n^2] &= \frac 1{n^2} \sum_{k=1}^n \mathop{\mathbb E}[(r_k -T)^2] = \frac 1{n^2} \sum_{k=1}^n (\mathop{\mathbb E}[r_k^2] - 2T\mathop{\mathbb E}[r_k] + T^2) \\ &= \frac 1{n^2} \left( \sum_{k=1}^n \mathop{\mathbb E}[r_k^2] - nT^2\right) \leq \frac 1{n^2} \sum_{k=1}^n \mathop{\mathbb E}[r_k^2]. \end{aligned} \ \ \ \ \ (4)

Recall that {p(t)} is the sequence bounding {\mathop{\mathbb P}[t_n = t \mid \mathop{\mathcal F}_{n-1}]}, and let {\bar t} be such that {\sum_{t\geq \bar t} p(t) \leq 1}. Let {Y} be a random variable taking the value {t} with probability {p(t)} for {t\geq \bar t} (and the value {0} with the remaining probability). Note that by the definition of {s_k} and {r_k}, we have {0 \leq r_k \leq s_k + T \leq k+T}; moreover {s_k} takes values in {\{0,1,\dots,k\}} with {\mathop{\mathbb P}[s_k = t] \leq p(t)} for {1\leq t\leq k}, and {\mathop{\mathbb E}[s_k] \leq T}. Thus

\displaystyle  \mathop{\mathbb E}[r_k^2] \leq \mathop{\mathbb E}[(s_k+T)^2] \leq \mathop{\mathbb E}[s_k^2] + 3T^2 \leq 3T^2 + \sum_{t=1}^{k} t^2 p(t) \leq C + \mathop{\mathbb E}[Y^2 {\mathbf{1}}_{[Y\leq k+T]}]

for some fixed constant {C} (concretely, {C = 3T^2 + \sum_{t<\bar t} t^2 p(t)} works). Note that the final expression is non-decreasing in {k}, and so together with (4) we have

\displaystyle  \mathop{\mathbb E}[X_n^2] \leq \frac 1n \left( C' + \mathop{\mathbb E}[Y^2 {\mathbf{1}}_{[Y\leq n+T]}] \right), \ \ \ \ \ (5)

where again {C'} is a fixed constant. Fixing {\varepsilon>0} and using (5) in Chebyshev’s inequality yields

\displaystyle  \mathop{\mathbb P}[|X_n| \geq \varepsilon] \leq \frac 1{\varepsilon^2 n} \left( C' + \mathop{\mathbb E}[Y^2 {\mathbf{1}}_{[Y\leq n+T]}] \right). \ \ \ \ \ (6)

Now fix {\alpha > 1} and let {n_k} be an increasing sequence of integers with {\lim_k \frac{n_{k+1}}{n_k} = \alpha} (for instance, {n_k = \lceil \alpha^k\rceil}). We see that (6) yields

\displaystyle  \begin{aligned} \sum_{k=1}^\infty \mathop{\mathbb P}[|X_{n_k}| \geq \varepsilon] &\leq \frac 1{\varepsilon^2} \sum_{k=1}^\infty n_k^{-1}\left( C' + \mathop{\mathbb E}[Y^2 {\mathbf{1}}_{[Y\leq n_k+T]}]\right) \\ &\leq C'' + \frac 1{\varepsilon^2} \mathop{\mathbb E} \left[ Y^2 \sum_{k=1}^\infty n_k^{-1} {\mathbf{1}}_{[Y\leq n_k+T]}\right], \end{aligned} \ \ \ \ \ (7)

where {C'' = \frac{C'}{\varepsilon^2}\sum_{k=1}^\infty n_k^{-1} < \infty} depends on {\alpha} and on {\varepsilon} (the sum converges because {n_k} grows geometrically). Observe that

\displaystyle  \sum_{k=1}^\infty n_k^{-1} {\mathbf{1}}_{[Y\leq n_k +T]} \leq \sum_{k=k_0(Y)}^\infty n_k^{-1},

where {k_0(Y)} is the minimal value of {k} for which {n_k\geq Y-T}. Because {n_k} grows like {\alpha^k}, the sum is bounded above by a constant times {n_{k_0(Y)}^{-1}}, so that together with (7) we have

\displaystyle  \sum_{k=1}^\infty \mathop{\mathbb P}[|X_{n_k}| \geq \varepsilon] \leq C'' + C''' \mathop{\mathbb E}[Y] < \infty,

where finiteness of {\mathop{\mathbb E}[Y]} is the crucial place where we use (2).
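
To spell out the last step: since {n_k} is increasing and {n_{k_0(Y)} \geq Y-T} by definition, we have {n_{k_0(Y)} \geq \max(n_1, Y-T)}, and hence

\displaystyle  Y^2 \sum_{k=k_0(Y)}^\infty n_k^{-1} \leq \mathrm{const}\cdot \frac{Y^2}{n_{k_0(Y)}} \leq \mathrm{const}\cdot\left( 2Y + \frac{4T^2}{n_1}\right),

where the two terms in the final bound correspond to the cases {Y\geq 2T} (so that {Y-T \geq Y/2}) and {Y < 2T}. Taking expectations, multiplying by {\varepsilon^{-2}}, and absorbing the constant term into {C''} yields the bound {C'' + C'''\mathop{\mathbb E}[Y]}.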

Using the first Borel-Cantelli lemma and taking an intersection over all rational {\varepsilon>0} gives

\displaystyle  \lim_{k\rightarrow\infty} \frac 1{n_k} \sum_{j=1}^{n_k} r_j = T \text{ a.s.} \ \ \ \ \ (8)

Let {Z_n = \sum_{j=1}^n r_j}. Because {r_j\geq 0} we have {Z_{n_k} \leq Z_n \leq Z_{n_{k+1}}} for all {n_k\leq n\leq n_{k+1}}, and in particular

\displaystyle  \frac {n_k}{n_{k+1}} \frac{Z_{n_k}}{n_k} \leq \frac{Z_n}{n} \leq \frac{n_{k+1}}{n_k} \frac{Z_{n_{k+1}}}{n_{k+1}}.

Letting {n\rightarrow\infty} (so that {k\rightarrow\infty} as well) and using (8) gives

\displaystyle  \frac 1\alpha T \leq \varliminf_{n\rightarrow\infty} \frac 1n Z_n \leq \varlimsup_{n\rightarrow\infty} \frac 1n Z_n \leq \alpha T \text{ a.s.}

Taking an intersection over all rational {\alpha>1} gives

\displaystyle  \lim_{n\rightarrow\infty} \frac 1n \sum_{k=1}^n r_k = T \text{ a.s.} \ \ \ \ \ (9)

Finally, we recall from the definition of {s_n} and {r_n} that {t_n\leq r_n} whenever {t_n\leq n} (indeed, in that case {t_n = s_n \leq r_n}). In particular, we may observe that

\displaystyle  \sum_{n=1}^\infty \mathop{\mathbb P}[t_n > r_n] \leq \sum_{n=1}^\infty \mathop{\mathbb P}[t_n > n] \leq \sum_{n=1}^\infty \sum_{t>n} p(t) = \sum_{t=2}^\infty (t-1)\, p(t) \leq T

and apply Borel-Cantelli again to deduce that with probability one, {t_n > r_n} for at most finitely many values of {n}. Since finitely many exceptional terms do not affect the limiting behaviour of the averages, (9) implies that

\displaystyle  \varlimsup_{n\rightarrow\infty} \frac 1n \sum_{k=1}^n t_k \leq T \text{ a.s.} \ \ \ \ \ (10)

which completes the proof of Proposition 1. \Box
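
As a quick numerical sanity check of Proposition 1 (not part of the argument above), here is a small simulation sketch in Python. The model, the helper name sample_t, and the parameter choices are all hypothetical illustrations, and the snippet assumes NumPy is available; it draws each {t_n} from a geometric distribution whose success probability depends on the previous value but always stays at least {1/T}, so that {\mathop{\mathbb E}[t_n \mid \mathcal{F}_{n-1}] \leq T}.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 3.0  # bound on the conditional means

    def sample_t(history):
        # Hypothetical model: t_n is geometric on {1, 2, 3, ...} with a
        # success probability q_n >= 1/T that depends on the previous value,
        # so that E[t_n | t_1, ..., t_{n-1}] = 1/q_n <= T.
        if history:
            q = min(1.0, 1.0 / T + 0.1 * (history[-1] % 3))
        else:
            q = 1.0 / T
        return rng.geometric(q)

    N = 200_000
    ts = []
    for _ in range(N):
        ts.append(sample_t(ts))

    # By Proposition 1 the running average should not exceed T in the limit.
    print(f"empirical mean after {N} draws: {np.mean(ts):.3f} (T = {T})")

For large {N} the printed average should settle at a value no larger than {T = 3}, in line with (10); with these particular (made-up) parameters it is in fact strictly smaller, since the conditional means are often below {T}.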
