Central Limit Theorem for dynamical systems using martingales

This post is based on notes from Matt Nicol’s talk at the UH summer school in dynamical systems. The goal is to present the ideas behind a proof of the central limit theorem for dynamical systems using martingale approximations.

1. Conditional expectation

Before we can define and use martingales, we must recall the definition of conditional expectation. Let {(\Omega,\mathop{\mathbb P})} be a probability space, with {\mathop{\mathbb P}} defined on a {\sigma}-algebra {\mathop{\mathcal B}}. Let {\mathop{\mathcal F}\subset \mathop{\mathcal B}} be a sub-{\sigma}-algebra of {\mathop{\mathcal B}}.

Example 1 Consider the doubling map {T\colon [0,1]\rightarrow [0,1]} given by {T(x) = 2x\pmod 1}. Let {\mathop{\mathbb P}} be Lebesgue measure, {\mathop{\mathcal B}} the Borel {\sigma}-algebra, and {\mathop{\mathcal F} = T^{-1}\mathop{\mathcal B} = \{T^{-1}(B) \mid B\in \mathop{\mathcal B}\}}. Then {\mathop{\mathcal F}} is a sub-{\sigma}-algebra of {\mathop{\mathcal B}}, consisting of precisely those sets in {\mathop{\mathcal B}} which are unions of preimage sets {T^{-1}(x)} — that is, those sets {F\in\mathop{\mathcal B}} for which a point {y\in [0,\frac 12]} is in {F} if and only if {y+\frac 12\in F}.

This example extends naturally to yield a decreasing sequence of {\sigma}-algebras

\displaystyle  \mathop{\mathcal B} \supset T^{-1}\mathop{\mathcal B} \supset T^{-2}\mathop{\mathcal B} \supset \cdots.

Given a sub-{\sigma}-algebra {\mathop{\mathcal F}\subset \mathop{\mathcal B}} and a random variable {Y\colon \Omega\rightarrow {\mathbb R}} that is measurable with respect to {\mathop{\mathcal B}}, the conditional expectation of {Y} given {\mathop{\mathcal F}} is any random variable {Z} such that

  1. {Z} if {\mathop{\mathcal F}}-measurable (that is, {Z^{-1}(I)\in\mathop{\mathcal F}} for every interval {I\subset {\mathbb R}}), and
  2. {\int_A Z\,d\mathop{\mathbb P} = \int_A Y\,d\mathop{\mathbb P}} for every {A\in \mathop{\mathcal F}}.

It is not hard to show that these conditions characterise {Z} for an almost-sure choice of {\omega\in\Omega}, and so the conditional expectation is uniquely defined as a random variable. We write it as {\mathop{\mathbb E}[Y\mid\mathop{\mathcal F}]}.

A key property is that conditional expectation is linear: for every {\mathop{\mathcal F}\subset \mathop{\mathcal B}}, every {Y_1,Y_2\colon \Omega\rightarrow{\mathbb R}}, and every {a\in {\mathbb R}}, we have

\displaystyle  \mathop{\mathbb E}[aY_1 + Y_2\mid \mathop{\mathcal F}] = a\mathop{\mathbb E}[Y_1\mid\mathop{\mathcal F}] + \mathop{\mathbb E}[Y_2\mid\mathop{\mathcal F}].

Example 2 If {Y} is already {\mathop{\mathcal F}}-measurable, then {\mathop{\mathbb E}[Y\mid\mathop{\mathcal F}]=Y}.

Example 3 At the other extreme, if {Y} and {\mathop{\mathcal F}} are independent — that is, if {\mathop{\mathbb E}[Y{\mathbf{1}}_A]=\mathop{\mathbb E}[Y]\mathop{\mathbb P}[A]} for every {A\in\mathop{\mathcal F}} — then {\mathop{\mathbb E}[Y\mid\mathop{\mathcal F}]} is the constant function {\mathop{\mathbb E}[Y]{\mathbf{1}}}.

Example 4 Suppose {\{\Omega_i\mid i\in{\mathbb N}\}} is a countable partition of {\Omega} such that {\mathop{\mathbb P}[\Omega_i]>0} for every {i}. Let {\mathop{\mathcal F}=\sigma(\Omega_1,\Omega_2,\dots)} be the {\sigma}-algebra generated by the sets {\Omega_i}. Then

\displaystyle  \mathop{\mathbb E}[Y|\mathop{\mathcal F}] = \sum_{i\in{\mathbb N}} \frac{\mathop{\mathbb E}[Y{\mathbf{1}}_{\Omega_i}]}{\mathop{\mathbb P}[\Omega_i]} {\mathbf{1}}_{\Omega_i}.

2. Martingales

Now we can define martingales, which are a particular sort of stochastic process (sequence of random variables) with “enough independence” to generalise results from the IID case.

Definition 1 A sequence {Y_n\colon \Omega\rightarrow {\mathbb R}} of random variables is a martingale if

  1. {\mathop{\mathbb E}[|Y_n|]<\infty} for all {n};
  2. there is an increasing sequence of {\sigma}-algebras (a filtration) {\mathop{\mathcal F}_1\subset\mathop{\mathcal F}_2\subset\cdots \subset \mathop{\mathcal B}} such that {Y_n} is measurable with respect to {\mathop{\mathcal F}_n};
  3. the conditional expectations satisfy {\mathop{\mathbb E}[Y_{n+1}|\mathop{\mathcal F}_n] = Y_n}.

The first condition guarantees that everything is in {L^1}. If {\mathop{\mathcal F}_n} is taken to be the {\sigma}-algebra of events that are determined by the first {n} outcomes of a sequence of experiments, then the second condition states that {Y_n} only depends on those first {n} outcomes, while the third condition requires that if the first {n} outcomes are known, then the expected value of {Y_{n+1} - Y_n} is 0.

Example 5 Let {Y_n} be a sequence of fair coin flips — IID random variables taking the values {\pm1} with equal probability. Let {S_n=Y_1+\cdots+Y_n}. As suggested in the previous paragraph, let {\mathop{\mathcal F}_n=\sigma(Y_1,\dots,Y_n)} be the smallest {\sigma}-algebra with respect to which {Y_1,\dots,Y_n} are all measurable. (The sets in {\mathop{\mathcal F}_n} are precisely those sets in {\mathop{\mathcal B}} which are determined by knowing the values of {Y_1,\dots,Y_n}.)

It is easy to see that {S_n} satisfies the first two properties of a martingale, and for the third, we use linearity of expectation and the definition of {\mathop{\mathcal F}_n} to get

\displaystyle  \mathop{\mathbb E}[S_{n+1}|\mathop{\mathcal F}_n] = \mathop{\mathbb E}[Y_1 + \cdots + Y_n|\mathop{\mathcal F}_n] + \mathop{\mathbb E}[Y_{n+1}|\mathop{\mathcal F}_n] = S_n + 0 = S_n.

When {Y_n} is a sequence of random variables for which {S_n = Y_1+\cdots+Y_n} is a martingale, we say that the sequence {Y_n} is a martingale difference.

In the previous example the martingale property (the third condition) was a direct consequence of the fact that the random variables {Y_n = S_{n+1}-S_n} were IID. However, there are examples where the martingale differences are not IID.

Example 6 Polya’s urn is a stochastic process defined as follows. Consider an urn containing some number of red and blue balls. At each step, a single ball is drawn at random from the urn, and then returned to the urn, along with a new ball that matches the colour of the one drawn. Let {Y_n} be the fraction of the balls that are red after the {n}th iteration of this process.

Clearly the sequence of random variables {Y_n} is neither independent nor identically distributed. However, it is a martingale, as the following computation shows: suppose that at time {n} there are {p} red balls and {q} blue balls in the urn. (This knowledge represents knowing which element of {\mathop{\mathcal F}_n} we are in.) Then at time {n+1}, there will be {p+1} red balls with probability {\frac{p}{p+q}}, and {p} red balls with probability {\frac{q}{p+q}}. Either way, there will be {p+q+1} total balls, and so the expected fraction of red balls is

\displaystyle  \begin{aligned} \mathop{\mathbb E}[Y_{n+1}|\mathop{\mathcal F}_n] &= \frac{p}{p+q}\cdot\frac{p+1}{p+q+1} + \frac{q}{p+q}\cdot \frac{p}{p+q+1} \\ &= \frac{p(p+q+1)}{(p+q)(p+q+1)} = \frac{p}{p+q} = Y_n. \end{aligned}

If we assume that the martingale differences are stationary (that is, identically distributed) and ergodic, then we have the following central limit theorem for martingales, from a 1974 paper of McLeish (we follow some notes by S. Sethuraman for the statement).

Theorem 2 Let {Y_i} be a stationary ergodic sequence such that {\sigma^2 = \mathop{\mathbb E}[Y_i^2]<\infty} and {\mathop{\mathbb E}[Y_{n+1}|\mathop{\mathcal F}_n]=0}, where {\mathop{\mathcal F}_n} is the {\sigma}-algebra generated by {\{Y_i\mid 1\leq i\leq n\}}. Then {S_n = \sum_{i=1}^n Y_i} is a martingale, and {\frac{S_n}{\sigma\sqrt{n}}} converges in distribution to {N(0,1)}.

More sophisticated versions of this result are available, but this simple version will suffice for our needs.

3. Koopman operator and transfer operator

Now we want to apply 2 to a dynamical system {T\colon X\rightarrow X} with an ergodic measure {\mu} by taking {Y_i=\varphi\circ T^i} for some observable {\varphi\colon X\rightarrow{\mathbb R}}.

To carry this out, we consider two operators on {L^2(\mu)}. First we consider the Koopman operator {U\colon \varphi\mapsto \varphi\circ T}. Then we define the transfer operator {\mathop{\mathcal P}} to be its {L^2} adjoint — that is,

\displaystyle  \int(\mathop{\mathcal P}\varphi)\psi\,d\mu = \int\varphi(\psi\circ T)\,d\mu \ \ \ \ \ (1)

for all {\varphi,\psi\in L^2}. The key result for our purposes is that the operators {\mathop{\mathcal P}} and {U} are one-sided inverses of each other.

Proposition 3 Given {\varphi\in L^2}, we have

  1. {\mathop{\mathcal P} U\varphi=\varphi};
  2. {U\mathop{\mathcal P}\varphi=\mathop{\mathbb E}[\varphi|T^{-1}\mathop{\mathcal B}]}, where {\mathop{\mathcal B}} is the {\sigma}-algebra on which {\mu} is defined.

Proof: For the first claim, we see that for all {\psi\in L^2} we have

\displaystyle  \int\psi\cdot (\mathop{\mathcal P} U\varphi)\,d\mu = \int (\psi\circ T)(\varphi\circ T)\,d\mu = \int\psi\varphi\,d\mu,

where the first equality uses the definition of {\mathop{\mathcal P}} and the second uses the fact that {\mu} is invariant. To prove the second claim, we first observe that given an interval {I\subset{\mathbb R}}, we have

\displaystyle  (\mathop{\mathcal P} U\varphi)^{-1}(I) = T^{-1}((\mathop{\mathcal P} \varphi)^{-1}(I)) \in T^{-1}\mathop{\mathcal B},

since {\mathop{\mathcal P}} maps {\mathop{\mathcal B}}-measurable functions to {\mathop{\mathcal B}}-measurable functions. This shows that {\mathop{\mathcal P} U\varphi} is {T^{-1}\mathop{\mathcal B}}-measurable, and it remains to show that

\displaystyle  \int_{T^{-1}B} U\mathop{\mathcal P}\varphi\,d\mu = \int_{T^{-1}B} \varphi\,d\mu \text{ for all }B\in \mathop{\mathcal B}. \ \ \ \ \ (2)

This follows from a similar computation to the one above: given {B\in \mathop{\mathcal B}} we have

\displaystyle  \begin{aligned} \int_{T^{-1}B} U\mathop{\mathcal P}\varphi\,d\mu &= \int ((\mathop{\mathcal P}\varphi)\circ T) \cdot {\mathbf{1}}_{T^{-1}B}\,d\mu = \int((\mathop{\mathcal P}\varphi)\circ T) ({\mathbf{1}}_B \circ T)\,d\mu \\ &= \int(\mathop{\mathcal P}\varphi) {\mathbf{1}}_B\,d\mu = \int\varphi\cdot ({\mathbf{1}}_B\circ T)\,d\mu = \int_{T^{-1}B} \varphi\,d\mu, \end{aligned}

which establishes (2) and completes the proof. \Box

We see from Proposition 3 that a function has zero conditional expectation with respect to {T^{-1}\mathop{\mathcal B}} if and only if it is in the kernel of {\mathop{\mathcal P}}. In particular, if {\mathop{\mathcal P} h = 0} then {\sum_{j=1}^n h\circ T^j} is a martingale; this will be a key tool in the next section.

Example 7 Let {X=[0,1]} and {T} be the doubling map. Let {\mu} be Lebesgue measure. For convenience of notation we consider the {L^2} space of complex-valued functions on {X}; the functions {\varphi_n\colon x\mapsto e^{2\pi inx}} form an orthonormal basis for this space. A simple calculation shows that

\displaystyle  U\varphi_n(x) = \varphi_n(2x) = e^{2\pi in(2x)} = \varphi_{2n}(x),

so {U\colon \varphi_n\rightarrow \varphi_{2n}}. For the transfer operator we obtain {\mathop{\mathcal P}\colon \varphi_{2n}\rightarrow \varphi_n}, while for odd values of {n} we have

\displaystyle  \begin{aligned} \mathop{\mathcal P}\varphi_n(x) &= \frac 12 \left(\varphi_n\left(\frac x2\right) + \varphi_n\left(\frac{1+x}{2}\right)\right) \\ &= \frac 12 \left( e^{\pi i n x} + e^{\pi i n x} e^{\pi i n}\right) = 0. \end{aligned}

4. Martingale approximation and CLT

The machinery of the Koopman and transfer operators from the previous section can be used to apply the martingale central limit theorem to observations of dynamical systems via the technique of martingale approximation, which was introduced by M. Gordin in 1969.

The idea is that if {\mathop{\mathcal P}^n\varphi\rightarrow 0} quickly enough for functions {\varphi} with {\int\varphi\,d\mu=0}, then we can approximate the sequence {\varphi\circ T^n} with a martingale sequence {h\circ T^n}.

More precisely, suppose that the sequence {\|\mathop{\mathcal P}^n\varphi\|_2} is summable; then we can define an {L^2} function {g} by

\displaystyle  g = \sum_{n=1}^\infty \mathop{\mathcal P}^n\varphi. \ \ \ \ \ (3)

Let {h = \varphi + g - g\circ T\in L^2}. We claim that {S_nh = \sum_{j=1}^n h\circ T^j} is a martingale. Indeed,

\displaystyle  \mathop{\mathcal P} h = \mathop{\mathcal P}\varphi + \mathop{\mathcal P} \sum_{n\geq 1} \mathop{\mathcal P}^n\varphi - \mathop{\mathcal P} U \sum_{n\geq 1} \mathop{\mathcal P}^n\varphi,

and since {\mathop{\mathcal P} U} is the identity we see that the last term is just {\sum_n \mathop{\mathcal P}^n \varphi}, so that {\mathop{\mathcal P} h = 0}.

Proposition 3 now implies that {\mathop{\mathbb E}[h|T^{-1}\mathop{\mathcal B}]=0}, and we conclude that {S_nh} is a martingale, so by the martingale CLT {\frac{S_n h}{\sigma\sqrt{n}}} converges in distribution to {N(0,1)}, where {\sigma^2 = \mathop{\mathbb E}[h^2] = \int h^2\,d\mu}.

Now we want to apply this result to obtain information about {\varphi} itself, and in particular about {S_n\varphi = \sum_{j=1}^n \varphi\circ T^j}. We have {\varphi = h + g\circ T - g}, and so

\displaystyle  S_n\varphi = S_n h + \sum_{j=1}^n (g\circ T^{j+1} - g\circ T^j) = S_nh + g\circ T^{n+1} - g.

This yields

\displaystyle  \frac{S_n\varphi}{\sigma\sqrt n} = \frac{S_n h}{\sigma\sqrt n} + \frac{g\circ T^{n+1} - g}{\sigma\sqrt n},

and the last term goes to 0 in probability, which yields the central limit theorem for {S_n\varphi}.

Remark 1 There is a technical problem we have glossed over, which is that the sequence of {\sigma}-algebras {T^{-n}\mathop{\mathcal B}} is decreasing, not increasing as is required by the definition of a martingale. One solution to this is to pass to the natural extension {\hat T} and to consider the functions {h\circ \hat{T}^{-j}} and the {\sigma}-algebras {\hat{\mathop{\mathcal F}}_j = \hat{T}^j\mathop{\mathcal B}}. Another solution is to use reverse martingales, but we do not discuss this here.

Example 8 Let {X=[0,1]} and {T\colon X\rightarrow X} be an intermittent type (Manneville–Pomeau) map given by

\displaystyle  T(x) = \begin{cases} x(1 - (2x)^\alpha) & x\in [0,\frac 12), \\ 2x-1 & x\in [\frac 12,1], \end{cases}

where {0<\alpha<1} is a fixed parameter. It can be shown that {T} has a unique absolutely continuous invariant probability measure {\mu}, and that the transfer operator {\mathop{\mathcal P}} has the following contraction property: for every {\varphi\in L^2} with {\int \varphi\,d\mu=0}, there is {C\in{\mathbb R}} such that {\|\mathop{\mathcal P}^n\varphi\|_2 \leq Cn^{-\gamma}}, where {\gamma = \frac 12(\frac 1\alpha -1)}.

For small values of {\alpha}, this shows that {\|\mathop{\mathcal P}^n\varphi\|_2} is summable, and consequently {\mu} satisfies the CLT by the above discussion.

About Vaughn Climenhaga

I'm an assistant professor of mathematics at the University of Houston. I'm interested in dynamical systems, ergodic theory, thermodynamic formalism, dimension theory, multifractal analysis, non-uniform hyperbolicity, and things along those lines.
This entry was posted in ergodic theory, statistical laws, theorems and tagged , . Bookmark the permalink.

4 Responses to Central Limit Theorem for dynamical systems using martingales

  1. Ad Noctem says:

    There is a proof of Central Limit Theorem in some textbooks using characteristic functions. What is the additional advantage of this proof? Does it(esp. martingales) have significance in dynamical systems or throw further light on ideas?

  2. The proof using characteristic functions is described in this earlier post: https://vaughnclimenhaga.wordpress.com/2013/03/17/spectral-methods-3-central-limit-theorem/

    That proof uses the spectral gap property, which implies that correlations decay exponentially — in the language of this post, this means that if \int \phi\,d\mu=0, then \|\mathcal{P}^n \phi\| \to 0 exponentially quickly. The proof described here using martingales works in the more general setting when this decay is only known to be summable (in particular, this includes polynomial decay of correlations).

    The distinction between exponential and polynomial decay of correlations is usually associated with presence of a phase transition in the system, with exponential decay when there is no phase transition (there is a unique equilibrium state) and polynomial decay when there is a phase transition (multiple equilibrium states). Thus the martingale approach is useful to study what happens at phase transitions.

  3. Joe Mitchell says:

    Hi Vaughan,

    I’m a 4th year undergraduate and currently doing a project on CLTs in dynamical systems.

    In my project, I have used the Transfer operator in the same way as you have mentioned to get the martingale-coboundary decomposition. However, I’m sort of skirting around the topic of martingales, not wanting to go into too much detail on them.

    I was wondering if you knew of any literature that mentions the fact that w \in KerP means that w defines a martingale of sorts. I want to be able to reference this fact, without having the go through defining martingales, martingale differences etc and the required proofs.

    Perhaps it’s just a commonplace fact and I won’t be able to find anything. But if you do know of anything, it would be greatly appreciated! This was the only page I easily find that mentioned this fact!

    Many thanks,

    Joe Mitchell

    • I’m afraid my knowledge of the literature on martingales is not deep enough to give you a quick reference. This post and what I know about the subject mostly comes from some talks Matt Nicol gave at Houston, and from the references in the post. Sorry I can’t be more helpful!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s