This post is based on notes from Matt Nicol’s talk at the UH summer school in dynamical systems. The goal is to present the ideas behind a proof of the central limit theorem for dynamical systems using martingale approximations.

**1. Conditional expectation**

Before we can define and use martingales, we must recall the definition of conditional expectation. Let $(X, \mathcal{B}, \mu)$ be a probability space, with $\mu$ defined on a $\sigma$-algebra $\mathcal{B}$. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{B}$.

Example 1. Consider the doubling map $T \colon [0,1] \to [0,1]$ given by $T(x) = 2x \pmod 1$. Let $\mu$ be Lebesgue measure, $\mathcal{B}$ the Borel $\sigma$-algebra, and $\mathcal{G} = T^{-1}\mathcal{B} = \{T^{-1}(B) : B \in \mathcal{B}\}$. Then $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{B}$, consisting of precisely those sets in $\mathcal{B}$ which are unions of preimage sets — that is, those sets $E$ for which a point $x$ is in $E$ if and only if $x + \frac{1}{2} \pmod 1$ is in $E$.

This example extends naturally to yield a decreasing sequence of $\sigma$-algebras

$$\mathcal{B} \supset T^{-1}\mathcal{B} \supset T^{-2}\mathcal{B} \supset \cdots$$

Given a sub-$\sigma$-algebra $\mathcal{G} \subset \mathcal{B}$ and a random variable $\varphi \colon X \to \mathbb{R}$ that is measurable with respect to $\mathcal{B}$, the *conditional expectation* of $\varphi$ given $\mathcal{G}$ is any random variable $\psi$ such that

- $\psi$ is $\mathcal{G}$-measurable (that is, $\psi^{-1}(I) \in \mathcal{G}$ for every interval $I \subset \mathbb{R}$), and
- $\int_G \psi \, d\mu = \int_G \varphi \, d\mu$ for every $G \in \mathcal{G}$.

It is not hard to show that these conditions determine $\psi$ up to almost-sure equivalence, and so the conditional expectation is uniquely defined as a random variable. We write $\psi = E(\varphi \mid \mathcal{G})$.

A key property is that conditional expectation is linear: for every $\varphi_1, \varphi_2 \in L^1(X, \mathcal{B}, \mu)$, every $a_1, a_2 \in \mathbb{R}$, and every sub-$\sigma$-algebra $\mathcal{G}$, we have

$$E(a_1 \varphi_1 + a_2 \varphi_2 \mid \mathcal{G}) = a_1 E(\varphi_1 \mid \mathcal{G}) + a_2 E(\varphi_2 \mid \mathcal{G}).$$

Example 2. If $\varphi$ is already $\mathcal{G}$-measurable, then $E(\varphi \mid \mathcal{G}) = \varphi$.

Example 3. At the other extreme, if $\varphi$ and $\mathcal{G}$ are independent — that is, if $\mu(\varphi^{-1}(I) \cap G) = \mu(\varphi^{-1}(I)) \, \mu(G)$ for every interval $I \subset \mathbb{R}$ and every $G \in \mathcal{G}$ — then $E(\varphi \mid \mathcal{G})$ is the constant function $\int \varphi \, d\mu$.

Example 4. Suppose $\{X_i\}$ is a countable partition of $X$ such that $\mu(X_i) > 0$ for every $i$. Let $\mathcal{G}$ be the $\sigma$-algebra generated by the sets $X_i$. Then

$$E(\varphi \mid \mathcal{G})(x) = \frac{1}{\mu(X_i)} \int_{X_i} \varphi \, d\mu \quad \text{for } x \in X_i.$$
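The formula in Example 4 is easy to check numerically. The sketch below (my own illustration, with hypothetical helper names) estimates $E(\varphi \mid \mathcal{G})$ on $[0,1]$ with Lebesgue measure, where $\mathcal{G}$ is generated by a finite partition into intervals, by Monte Carlo cell averages.

```python
import numpy as np

def cond_exp_partition(phi, edges, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E(phi | G) on [0,1] with Lebesgue measure,
    where G is generated by the partition cells [edges[i], edges[i+1])."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    cells = np.searchsorted(edges, x, side="right") - 1
    # cell average = (1 / mu(X_i)) * integral of phi over X_i
    means = np.array([phi(x[cells == i]).mean() for i in range(len(edges) - 1)])
    return lambda y: means[np.searchsorted(edges, y, side="right") - 1]

edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
psi = cond_exp_partition(lambda x: x**2, edges)
# On [0, 1/4) the exact cell average of x^2 is 4 * int_0^{1/4} x^2 dx = 1/48
print(psi(np.array([0.1]))[0])
```

The returned function is constant on each partition cell, as a $\mathcal{G}$-measurable function must be.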

**2. Martingales**

Now we can define martingales, which are a particular sort of stochastic process (sequence of random variables) with “enough independence” to generalise results from the IID case.

Definition 1. A sequence of random variables $\{S_n\}$ is a *martingale* if

- $E|S_n| < \infty$ for all $n$;
- there is an increasing sequence of $\sigma$-algebras $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ (a *filtration*) such that $S_n$ is measurable with respect to $\mathcal{F}_n$;
- the conditional expectations satisfy $E(S_{n+1} - S_n \mid \mathcal{F}_n) = 0$.

The first condition guarantees that everything is in $L^1$. If $\mathcal{F}_n$ is taken to be the $\sigma$-algebra of events that are determined by the first $n$ outcomes of a sequence of experiments, then the second condition states that $S_n$ only depends on those first $n$ outcomes, while the third condition requires that if the first $n$ outcomes are known, then the expected value of the increment $S_{n+1} - S_n$ is 0.

Example 5. Let $\{\xi_n\}$ be a sequence of fair coin flips — IID random variables taking the values $\pm 1$ with equal probability. Let $S_n = \xi_1 + \cdots + \xi_n$. As suggested in the previous paragraph, let $\mathcal{F}_n$ be the smallest $\sigma$-algebra with respect to which $\xi_1, \dots, \xi_n$ are all measurable. (The sets in $\mathcal{F}_n$ are precisely those events which are determined by knowing the values of $\xi_1, \dots, \xi_n$.)

It is easy to see that $\{S_n\}$ satisfies the first two properties of a martingale, and for the third, we use linearity of conditional expectation and the definition of $S_n$ to get

$$E(S_{n+1} - S_n \mid \mathcal{F}_n) = E(\xi_{n+1} \mid \mathcal{F}_n) = E(\xi_{n+1}) = 0,$$

where the middle equality holds because $\xi_{n+1}$ is independent of $\mathcal{F}_n$.

When $\{\xi_n\}$ is a sequence of random variables for which $S_n = \xi_1 + \cdots + \xi_n$ is a martingale, we say that the sequence $\{\xi_n\}$ is a *martingale difference sequence*.

In the previous example the martingale property (the third condition) was a direct consequence of the fact that the random variables $\xi_n$ were IID. However, there are examples where the martingale differences are not IID.

Example 6. Polya's urn is a stochastic process defined as follows. Consider an urn containing some number of red and blue balls. At each step, a single ball is drawn at random from the urn, and then returned to the urn, along with a new ball that matches the colour of the one drawn. Let $X_n$ be the fraction of the balls that are red after the $n$th iteration of this process.

Clearly the sequence of random variables $\{X_n\}$ is neither independent nor identically distributed. However, it is a martingale, as the following computation shows: suppose that at time $n$ there are $r$ red balls and $b$ blue balls in the urn. (This knowledge represents knowing which element of $\mathcal{F}_n$ we are in.) Then at time $n+1$, there will be $r+1$ red balls with probability $\frac{r}{r+b}$, and $r$ red balls with probability $\frac{b}{r+b}$. Either way, there will be $r+b+1$ total balls, and so the expected fraction of red balls is

$$E(X_{n+1} \mid \mathcal{F}_n) = \frac{r}{r+b} \cdot \frac{r+1}{r+b+1} + \frac{b}{r+b} \cdot \frac{r}{r+b+1} = \frac{r(r+b+1)}{(r+b)(r+b+1)} = \frac{r}{r+b} = X_n.$$
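A quick simulation (my own sketch, not from the talk) makes the martingale property visible: starting from one red and one blue ball, $E[X_n] = X_0 = \tfrac12$ for every $n$, even though individual trajectories wander all over $[0,1]$.

```python
import random

def polya_urn_finals(n_steps=200, n_trials=5000, seed=1):
    """Simulate Polya's urn starting with 1 red and 1 blue ball;
    return the final red-ball fractions X_n over independent trials."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_trials):
        red, total = 1, 2
        for _ in range(n_steps):
            if rng.random() < red / total:  # draw a red ball
                red += 1                    # return it plus a new red ball
            total += 1                      # one ball is added either way
        finals.append(red / total)
    return finals

finals = polya_urn_finals()
# The martingale property forces E[X_n] = E[X_0] = 1/2 for every n.
print(sum(finals) / len(finals))
```

Individual values of $X_n$ spread out (in fact the limit is uniformly distributed for this starting configuration), but the average stays pinned at $\tfrac12$.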

If we assume that the martingale differences are stationary (that is, the joint distributions are unchanged by shifting the time index; in particular the differences are identically distributed) and ergodic, then we have the following central limit theorem for martingales, from a 1974 paper of McLeish (we follow some notes by S. Sethuraman for the statement).

Theorem 2. Let $\{\xi_n\}$ be a stationary ergodic sequence such that $E(\xi_n^2) = \sigma^2 < \infty$ and $E(\xi_{n+1} \mid \mathcal{F}_n) = 0$, where $\mathcal{F}_n$ is the $\sigma$-algebra generated by $\xi_1, \dots, \xi_n$. Then $S_n = \xi_1 + \cdots + \xi_n$ is a martingale, and $\frac{S_n}{\sqrt{n}}$ converges in distribution to the normal distribution $N(0, \sigma^2)$.

More sophisticated versions of this result are available, but this simple version will suffice for our needs.
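Theorem 2 is easy to test by simulation in its simplest instance, the IID coin flips of Example 5 (a sketch of my own, not from the post): the empirical distribution of $S_n/\sqrt{n}$ should match the $N(0,1)$ law, since $\sigma^2 = E(\xi_n^2) = 1$.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, trials = 1000, 20000
xi = rng.choice([-1.0, 1.0], size=(trials, n))  # fair coin flips, sigma^2 = 1
s = xi.sum(axis=1) / np.sqrt(n)                 # samples of S_n / sqrt(n)

Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))    # N(0,1) CDF
for t in (-1.0, 0.0, 1.0):
    print(t, (s <= t).mean(), Phi(t))           # empirical CDF vs normal CDF
```

The empirical and limiting CDFs agree to a couple of decimal places at this sample size.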

**3. Koopman operator and transfer operator**

Now we want to apply Theorem 2 to a dynamical system $T \colon X \to X$ with an ergodic invariant measure $\mu$ by taking $\xi_n = \varphi \circ T^n$ for some observable $\varphi \colon X \to \mathbb{R}$.

To carry this out, we consider two operators on $L^2(X, \mathcal{B}, \mu)$. First we consider the Koopman operator $U \colon L^2 \to L^2$ given by $Uf = f \circ T$. Then we define the transfer operator $P \colon L^2 \to L^2$ to be its adjoint — that is,

$$\int (Pf) \, g \, d\mu = \int f \, (g \circ T) \, d\mu$$

for all $f, g \in L^2$. The key result for our purposes is the following proposition, which says that the operators $P$ and $U$ are one-sided inverses of each other.

Proposition 3.

- $P(Uf) = f$ for every $f \in L^2$;
- $U(Pf) = E(f \mid T^{-1}\mathcal{B})$, where $\mathcal{B}$ is the $\sigma$-algebra on which $\mu$ is defined.

*Proof:* For the first claim, we see that for all $g \in L^2$ we have

$$\int (P(f \circ T)) \, g \, d\mu = \int (f \circ T)(g \circ T) \, d\mu = \int f g \, d\mu,$$

where the first equality uses the definition of $P$ and the second uses the fact that $\mu$ is $T$-invariant. Since $g$ was arbitrary, this gives $P(Uf) = f$. To prove the second claim, we first observe that given an interval $I \subset \mathbb{R}$, we have

$$(U(Pf))^{-1}(I) = T^{-1}\big((Pf)^{-1}(I)\big) \in T^{-1}\mathcal{B},$$

since $U$ maps $\mathcal{B}$-measurable functions to $T^{-1}\mathcal{B}$-measurable functions. This shows that $U(Pf)$ is $T^{-1}\mathcal{B}$-measurable, and it remains to show that

$$\int_{T^{-1}B} U(Pf) \, d\mu = \int_{T^{-1}B} f \, d\mu \quad \text{for every } B \in \mathcal{B}. \qquad (2)$$

This follows from a similar computation to the one above: given $B \in \mathcal{B}$ we have

$$\int_{T^{-1}B} U(Pf) \, d\mu = \int (\mathbf{1}_B \circ T) \, ((Pf) \circ T) \, d\mu = \int \mathbf{1}_B \, (Pf) \, d\mu = \int f \, (\mathbf{1}_B \circ T) \, d\mu = \int_{T^{-1}B} f \, d\mu,$$

which establishes (2) and completes the proof.

We see from Proposition 3 that a function $f \in L^2$ has zero conditional expectation with respect to $T^{-1}\mathcal{B}$ if and only if it is in the kernel of $P$. In particular, if $Pf = 0$ then $S_n = \sum_{k=0}^{n-1} f \circ T^k$ is a martingale; this will be a key tool in the next section.

Example 7. Let $X = [0,1]$ and let $T$ be the doubling map $T(x) = 2x \pmod 1$. Let $\mu$ be Lebesgue measure. For convenience of notation we consider the space of complex-valued functions on $X$; the functions $e_k(x) = e^{2\pi i k x}$, $k \in \mathbb{Z}$, form an orthonormal basis for this space. A simple calculation shows that

$$(U e_k)(x) = e^{2\pi i k \cdot 2x} = e_{2k}(x),$$

so $U e_k = e_{2k}$. For the transfer operator we obtain $P e_{2k} = e_k$, while for odd values of $k$ we have $P e_k = 0$.
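These formulas can be checked numerically using the explicit form of the transfer operator for the doubling map with Lebesgue measure, $(Pf)(x) = \frac{1}{2}\big(f(x/2) + f((x+1)/2)\big)$, which averages $f$ over the two preimages of $x$. A short sketch (my own illustration):

```python
import numpy as np

def transfer_doubling(f):
    """Transfer operator of T(x) = 2x mod 1 w.r.t. Lebesgue measure:
    (Pf)(x) = (f(x/2) + f((x+1)/2)) / 2, an average over the two preimages."""
    return lambda x: 0.5 * (f(x / 2) + f((x + 1) / 2))

e = lambda k: (lambda x: np.exp(2j * np.pi * k * x))  # Fourier basis e_k
x = np.linspace(0, 1, 101)

# P e_{2k} = e_k (here k = 2) ...
err_even = np.max(np.abs(transfer_doubling(e(4))(x) - e(2)(x)))
# ... while P e_k = 0 for odd k (here k = 3)
err_odd = np.max(np.abs(transfer_doubling(e(3))(x)))
print(err_even, err_odd)
```

Both errors are at the level of floating-point round-off, confirming $P e_4 = e_2$ and $P e_3 = 0$.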

**4. Martingale approximation and CLT**

The machinery of the Koopman and transfer operators from the previous section can be used to apply the martingale central limit theorem to observations of dynamical systems via the technique of martingale approximation, which was introduced by M. Gordin in 1969.

The idea is that if $P^n \varphi \to 0$ in $L^2$ quickly enough for functions $\varphi$ with $\int \varphi \, d\mu = 0$, then we can approximate the sequence of Birkhoff sums $S_n\varphi = \sum_{k=0}^{n-1} \varphi \circ T^k$ with a martingale sequence.

More precisely, suppose that the sequence $\|P^n \varphi\|_{L^2}$ is summable; then we can define an $L^2$ function $g$ by

$$g = \sum_{n=1}^{\infty} P^n \varphi.$$

Let $\hat\varphi = \varphi + g - g \circ T$. We claim that $\hat S_n = \sum_{k=0}^{n-1} \hat\varphi \circ T^k$ is a martingale. Indeed,

$$P\hat\varphi = P\varphi + Pg - P(g \circ T),$$

and since $PU$ is the identity we see that the last term is just $g$, so that $P\hat\varphi = P\varphi + Pg - g = \sum_{n=1}^{\infty} P^n\varphi - g = 0$.

Proposition 3 now implies that $E(\hat\varphi \mid T^{-1}\mathcal{B}) = 0$, and we conclude that $\hat S_n$ is a martingale, so by the martingale CLT $\frac{\hat S_n}{\sqrt{n}}$ converges in distribution to $N(0, \sigma^2)$, where $\sigma^2 = \int \hat\varphi^2 \, d\mu$.
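The construction can be watched in action for the doubling map (a concrete check of my own, not from the post). With $\varphi(x) = \cos 4\pi x$ the series for $g$ terminates: by Example 7, $P\varphi = \cos 2\pi x$ and $P^2\varphi = 0$, so $g = P\varphi$, and one can verify numerically that $P\hat\varphi = 0$.

```python
import numpy as np

def P(f):
    """Transfer operator for the doubling map with Lebesgue measure:
    average of f over the two preimages of x."""
    return lambda x: 0.5 * (f(x / 2) + f((x + 1) / 2))

phi = lambda x: np.cos(4 * np.pi * x)   # observable with integral zero

# g = sum_{n>=1} P^n phi; here the series terminates after one term,
# since P phi = cos(2 pi x) and P^2 phi = 0.
g = P(phi)
phi_hat = lambda x: phi(x) + g(x) - g((2 * x) % 1.0)   # phi + g - g o T

x = np.linspace(0, 1, 201)
err = np.max(np.abs(P(phi_hat)(x)))
print(err)   # P(phi_hat) should vanish identically
```

In this case $\hat\varphi$ works out to be $\cos 2\pi x$, which is indeed in the kernel of $P$.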

Now we want to apply this result to obtain information about $\varphi$ itself, and in particular about $S_n\varphi$. We have $\varphi = \hat\varphi - g + g \circ T$, and so

$$S_n\varphi = \hat S_n + \sum_{k=0}^{n-1} (g \circ T - g) \circ T^k = \hat S_n + g \circ T^n - g.$$

This yields

$$\frac{S_n\varphi}{\sqrt{n}} = \frac{\hat S_n}{\sqrt{n}} + \frac{g \circ T^n - g}{\sqrt{n}},$$

and the last term goes to 0 in probability (its $L^2$ norm is at most $2\|g\|_{L^2}/\sqrt{n}$), which yields the central limit theorem for $\frac{S_n\varphi}{\sqrt{n}}$.

Remark 1. There is a technical problem we have glossed over, which is that the sequence of $\sigma$-algebras $T^{-n}\mathcal{B}$ is decreasing, not increasing as is required by the definition of a martingale. One solution to this is to pass to the natural extension, where $T$ is invertible, and to consider the functions $\hat\varphi \circ T^{-n}$ and the increasing $\sigma$-algebras $T^{n}\mathcal{B}$. Another solution is to use reverse martingales, but we do not discuss this here.

Example 8. Let $X = [0,1]$ and let $T$ be an intermittent-type (Manneville–Pomeau) map given by

$$T(x) = \begin{cases} x(1 + 2^\alpha x^\alpha) & 0 \le x \le \frac12, \\ 2x - 1 & \frac12 < x \le 1, \end{cases}$$

where $\alpha \in (0,1)$ is a fixed parameter. It can be shown that $T$ has a unique absolutely continuous invariant probability measure $\mu$, and that the transfer operator has the following contraction property: for every Hölder continuous $\varphi$ with $\int \varphi \, d\mu = 0$, there is $C > 0$ such that $\|P^n \varphi\|_{L^2} \le C n^{-\gamma}$, where $\gamma = \gamma(\alpha) > 0$.

For small values of $\alpha$ (so that $\gamma > 1$), this shows that $\|P^n \varphi\|_{L^2}$ is summable, and consequently $\varphi$ satisfies the CLT by the above discussion.
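The intermittency responsible for the merely polynomial contraction is easy to see in simulation: orbits linger near the neutral fixed point at 0, and the stickiness grows with $\alpha$. A small sketch (my own; the parameter values and the threshold `eps` are arbitrary choices for illustration):

```python
def mp_map(x, alpha):
    """One step of the Manneville-Pomeau map quoted above."""
    if x <= 0.5:
        return x * (1 + (2 * x) ** alpha)   # neutral fixed point at x = 0
    return 2 * x - 1

def fraction_near_zero(alpha, n=200_000, x0=0.3, eps=0.05):
    """Fraction of the first n iterates landing in [0, eps)."""
    x, count = x0, 0
    for _ in range(n):
        x = mp_map(x, alpha)
        count += x < eps
    return count / n

f_small, f_large = fraction_near_zero(0.1), fraction_near_zero(0.9)
print(f_small, f_large)   # larger alpha -> much more time near 0
```

For small $\alpha$ the orbit behaves much like a doubling-map orbit, while for $\alpha$ close to 1 it spends most of its time escaping slowly from the neutral fixed point.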

There is a proof of the Central Limit Theorem in some textbooks using characteristic functions. What is the additional advantage of this proof? Does it (especially the use of martingales) have significance in dynamical systems, or throw further light on the ideas?

The proof using characteristic functions is described in this earlier post: http://vaughnclimenhaga.wordpress.com/2013/03/17/spectral-methods-3-central-limit-theorem/

That proof uses the spectral gap property, which implies that correlations decay exponentially — in the language of this post, this means that if $\int \varphi \, d\mu = 0$, then $\|P^n \varphi\| \to 0$ exponentially quickly. The proof described here using martingales works in the more general setting where this decay is only known to be summable (in particular, this includes polynomial decay of correlations).

The distinction between exponential and polynomial decay of correlations is usually associated with the presence of a phase transition in the system, with exponential decay when there is no phase transition (there is a unique equilibrium state) and polynomial decay when there is a phase transition (multiple equilibrium states). Thus the martingale approach is useful for studying what happens at phase transitions.

Hi Vaughan,

I’m a 4th year undergraduate and currently doing a project on CLTs in dynamical systems.

In my project, I have used the Transfer operator in the same way as you have mentioned to get the martingale-coboundary decomposition. However, I’m sort of skirting around the topic of martingales, not wanting to go into too much detail on them.

I was wondering if you knew of any literature that mentions the fact that $w \in \ker P$ means that $w$ defines a martingale of sorts. I want to be able to reference this fact without having to go through defining martingales, martingale differences, etc., and the required proofs.

Perhaps it’s just a commonplace fact and I won’t be able to find anything. But if you do know of anything, it would be greatly appreciated! This was the only page I could easily find that mentioned this fact!

Many thanks,

Joe Mitchell

I’m afraid my knowledge of the literature on martingales is not deep enough to give you a quick reference. This post and what I know about the subject mostly come from some talks Matt Nicol gave at Houston, and from the references in the post. Sorry I can’t be more helpful!