With the previous post on convergence of random variables, the law of large numbers, and Birkhoff’s ergodic theorem as background, we return to the spectral methods discussed in the first two posts in this series. This post is based on Andrew Török’s talk from March 4 and gives a proof of the central limit theorem using the spectral gap property.

**1. Central limit theorem for IID**

Now we recall the statement of the central limit theorem (CLT) and give a proof in the case of IID (independent identically distributed) random variables.

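The weak law of large numbers says that if $X_1, X_2, \dots$ is a sequence of IID random variables with $E[X_j] = 0$, then writing $S_n = X_1 + \cdots + X_n$, the time averages $\frac 1n S_n$ converge to $0$ in probability, or equivalently (since the limit is a constant), in distribution. In the case when $0 < \sigma^2 = E[X_j^2] < \infty$, the central limit theorem strengthens this to the result that the sequence $\frac{1}{\sqrt n} S_n$ converges in distribution to $N(0,\sigma^2)$, the normal distribution with mean $0$ and variance $\sigma^2$. That is, we have

$$P\left(\frac{S_n}{\sqrt n} \leq c\right) \to \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^c e^{-x^2/(2\sigma^2)}\,dx \tag{1}$$

for every $c\in\mathbb{R}$. This can be established by the same method as we used last time for the proof of the weak law of large numbers, by studying the characteristic functions of $\frac{1}{\sqrt n}S_n$ and $N(0,\sigma^2)$. The characteristic function of $N(0,\sigma^2)$ is

$$\varphi(t) = e^{-\sigma^2 t^2/2}. \tag{2}$$

Arguing as in the proof of the weak law of large numbers in the previous post, we write $\varphi_n$ for the characteristic function of $\frac{1}{\sqrt n}S_n$ and observe that

$$\varphi_n(t) = E\left[e^{itS_n/\sqrt n}\right] = \prod_{j=1}^n E\left[e^{itX_j/\sqrt n}\right] = \psi\left(\frac{t}{\sqrt n}\right)^n, \tag{3}$$

where $\psi(t) = E[e^{itX_j}]$ is the characteristic function of the $X_j$ (which are identically distributed), and the second equality uses the fact that the $X_j$ are independent.

Now by Taylor’s theorem, we have

$$\psi(t) = 1 - \frac{\sigma^2}{2} t^2 + o(t^2) \quad\text{as } t\to 0,$$

using the fact that the $X_j$ have mean 0 and variance $\sigma^2$. Thus we conclude from (3) that for each fixed $t$,

$$\varphi_n(t) = \left(1 - \frac{\sigma^2 t^2}{2n} + o\left(\frac 1n\right)\right)^n \to e^{-\sigma^2 t^2/2} = \varphi(t),$$

which completes the proof of the CLT in the IID case.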
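As a quick numerical illustration of this argument, here is a minimal sketch for one specific choice that is not in the talk: IID signs $X_j = \pm 1$ with equal probability, so that $\sigma^2 = 1$ and the characteristic function of each $X_j$ is $\cos t$. The function names are made up for the example.

```python
import math

def phi_n(t: float, n: int) -> float:
    """Characteristic function of S_n / sqrt(n) for IID signs X_j = +/-1.

    Assumption for this example: psi(t) = E[exp(i t X_j)] = cos(t),
    so the product formula gives cos(t / sqrt(n)) ** n.
    """
    return math.cos(t / math.sqrt(n)) ** n

def phi_normal(t: float, sigma2: float = 1.0) -> float:
    """Characteristic function of N(0, sigma2)."""
    return math.exp(-sigma2 * t * t / 2.0)

t = 1.5
sizes = (10, 100, 1000, 10000)
errors = [abs(phi_n(t, n) - phi_normal(t)) for n in sizes]
for n, err in zip(sizes, errors):
    print(n, err)  # the gap to the Gaussian limit shrinks as n grows
```

The printed errors should decay roughly like $1/n$, which is what the $o(1/n)$ term in the Taylor expansion suggests for a symmetric distribution.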

**2. CLT with spectral gap **

To translate the CLT into the language of dynamical systems, we consider a space $X$ and a map $T\colon X\to X$ with an invariant measure $\mu$. In general there may be many $T$-invariant measures, and so it is important to choose a suitable measure $\mu$. For example, when $X$ is an interval and $T$ is piecewise expanding, we are most interested in the case when $\mu$ is an acip (absolutely continuous invariant probability measure).

Given a measurable function $f\colon X\to\mathbb{R}$, the sequence of functions $f\circ T^n$, $n\geq 0$, defines a sequence of identically distributed random variables on $(X,\mu)$. However, they are not independent, and so we need some information about the decay of correlations between them. In particular, we can replicate the proof from the previous section as long as the transfer operator has a spectral gap.

Let’s make this precise in the case when $T$ is a piecewise expanding interval map, so the Lasota–Yorke inequality we discussed in an earlier post yields a spectral gap for the transfer operator acting on $BV$, the space of functions of bounded variation, and in particular establishes the existence of an acip $\mu$.

**Theorem 1.** Let $T$ be a piecewise expanding interval map and $\mu$ the acip constructed before. Suppose that $(T,\mu)$ is mixing. Then $(T,\mu)$ satisfies the central limit theorem as follows: given any $f\in BV$ with $\int f\,d\mu = 0$ and writing $S_n f = \sum_{k=0}^{n-1} f\circ T^k$, we have

$$\mu\left\{x : \frac{S_n f(x)}{\sqrt n} \leq c\right\} \to \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^c e^{-x^2/(2\sigma^2)}\,dx \tag{4}$$

for all $c\in\mathbb{R}$, where $\sigma^2$ (assumed to be positive) is given by the Green–Kubo formula

$$\sigma^2 = \int f^2\,d\mu + 2\sum_{n=1}^\infty \int f\cdot (f\circ T^n)\,d\mu. \tag{5}$$

Before proving the theorem, we make some remarks concerning the Green–Kubo formula (5). First, note that the sum converges as soon as we establish exponential decay of correlations for functions in $BV$, since each integral in the sum is just the correlation function of $f$ at time $n$ (recall that $\int f\,d\mu = 0$). Second, note that if we replace the functions $f\circ T^n$ with independent random variables, then all the terms with $n\geq 1$ vanish, and the remaining term $\int f^2\,d\mu$ is just the variance $\sigma^2$, as in the previous section.

Note also that using (5), $\sigma^2$ can be written as

$$\sigma^2 = \lim_{n\to\infty} \frac 1n \int (S_n f)^2\,d\mu.$$
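To get a feel for the Green–Kubo formula, here is a small sketch for a concrete example that is not taken from the talk: the doubling map $T(x) = 2x \bmod 1$ on $[0,1]$, whose acip is Lebesgue measure, with the observable $f(x) = x - \frac12$. A Fourier series computation gives $\int f\cdot(f\circ T^n)\,dx = \frac{1}{12\cdot 2^n}$, so the formula predicts $\sigma^2 = \frac1{12} + 2\sum_{n\ge 1}\frac{1}{12\cdot 2^n} = \frac14$. The code approximates the correlation integrals with a midpoint rule; the grid size and truncation lag are arbitrary choices.

```python
def T(x: float) -> float:
    """Doubling map on [0, 1); Lebesgue measure is its acip."""
    return (2.0 * x) % 1.0

def f(x: float) -> float:
    """Mean-zero observable of bounded variation (an assumption of this example)."""
    return x - 0.5

def correlations(max_lag: int, grid: int = 2**14) -> list:
    """Midpoint-rule approximation of C_n = integral of f * (f o T^n), n = 0..max_lag."""
    sums = [0.0] * (max_lag + 1)
    for j in range(grid):
        x = (j + 0.5) / grid
        y = x
        for n in range(max_lag + 1):
            sums[n] += f(x) * f(y)
            y = T(y)  # exact in floating point for these dyadic grid points
    return [s / grid for s in sums]

C = correlations(10)
sigma2 = C[0] + 2.0 * sum(C[1:])  # Green-Kubo sum truncated at lag 10
print(C[:3], sigma2)
```

The truncated sum should sit slightly below $\frac14$, since the neglected tail of correlations is positive and of size about $\frac16\cdot 2^{-10}$.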

Now we prove the central limit theorem (4). As in the IID case, we use the characteristic functions

$$\varphi(t) = e^{-\sigma^2t^2/2}, \qquad \varphi_n(t) = \int e^{itS_nf/\sqrt n}\,d\mu,$$

where $\varphi$ is the characteristic function of the normal distribution $N(0,\sigma^2)$ and $\varphi_n$ is the characteristic function of $\frac{1}{\sqrt n}S_nf$, so it suffices to show that $\varphi_n(t)\to\varphi(t)$ for all $t\in\mathbb{R}$.

To prove this convergence of the characteristic functions, we use the following procedure.

- Write the characteristic functions in terms of a *twisted transfer operator* $P_t$, where $f$ is the function we are investigating in the CLT, and $t$ is a small real parameter. The operator $P_t$ is a small perturbation of the transfer operator $P$.
- Use perturbation theory of operators to show that $P_t$ has a spectral gap and to derive asymptotics for the leading eigenvalue $\lambda(t)$. In particular, relate $\lambda'(0)$ and $\lambda''(0)$ to the mean and variance of the limiting distribution.

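First we define the transfer operator $P$ itself by the implicit equation

$$\int (Pu)\cdot g\,d\mu = \int u\cdot (g\circ T)\,d\mu \tag{6}$$

for all $u\in L^1(\mu)$ and $g\in L^\infty(\mu)$. Note that this is different from the transfer operator obtained by integrating with respect to Lebesgue measure instead of $\mu$ in (6); it is a worthwhile exercise to determine the precise relationship between the two.

More directly, the transfer operator can be defined by

$$Pu(x) = \sum_{y\in T^{-1}(x)} \frac{h(y)\,u(y)}{h(x)\,|T'(y)|}, \tag{7}$$

where $h$ is the density of $\mu$ with respect to Lebesgue measure.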
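The duality relation between the transfer operator and composition with $T$ can be checked numerically in a simple case. The sketch below assumes the doubling map $T(x) = 2x \bmod 1$ with Lebesgue measure as the acip (so the density is $1$ and $|T'| = 2$), for which the direct formula reduces to averaging over the two preimages. The test functions `u` and `g` are arbitrary choices made for illustration.

```python
import math

def T(x: float) -> float:
    """Doubling map; Lebesgue measure is invariant, so the density is 1."""
    return (2.0 * x) % 1.0

def P(u):
    """Transfer operator of the doubling map: average of u over the two preimages."""
    return lambda x: 0.5 * (u(x / 2.0) + u((x + 1.0) / 2.0))

def integrate(fn, grid: int) -> float:
    """Midpoint rule on [0, 1] with `grid` cells."""
    return sum(fn((j + 0.5) / grid) for j in range(grid)) / grid

u = lambda x: x * x                       # arbitrary integrable function
g = lambda x: math.cos(2.0 * math.pi * x)  # arbitrary bounded test function

Pu = P(u)
lhs = integrate(lambda x: Pu(x) * g(x), 1024)      # integral of (Pu) * g
rhs = integrate(lambda y: u(y) * g(T(y)), 2048)    # integral of u * (g o T)
print(lhs, rhs)
```

The two grids are chosen so that the preimages of the coarse midpoints are exactly the fine midpoints, which means the two sums agree up to rounding rather than just up to quadrature error.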

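Now given $f\in BV$ and $t\in\mathbb{R}$, we define the *twisted* transfer operator $P_t$ by

$$P_t u = P(e^{itf}u). \tag{8}$$

To see the utility of this definition, we first note that

$$P_t^2 u = P\left(e^{itf}P(e^{itf}u)\right) = P^2\left(e^{it f\circ T}e^{itf}u\right) = P^2\left(e^{itS_2f}u\right),$$

where the middle equality uses the identity $g\cdot P(u) = P((g\circ T)\cdot u)$, which follows from (6), and so by induction we have

$$P_t^n u = P^n\left(e^{itS_nf}u\right).$$

In particular, considering the characteristic function $\varphi_n$, and using the fact that $\int Pu\,d\mu = \int u\,d\mu$, we have

$$\varphi_n(t) = \int e^{itS_nf/\sqrt n}\,d\mu = \int P^n\left(e^{i(t/\sqrt n)S_nf}\right)d\mu = \int P_{t/\sqrt n}^n 1\,d\mu, \tag{9}$$

which accomplishes the first stage of the proof: writing the characteristic function in terms of the twisted transfer operator.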
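The identity between iterates of the twisted operator and integrals of $e^{itS_nf}$ can be tested directly in a toy case. This sketch again assumes the doubling map with Lebesgue measure and the observable $f(x) = x - \frac12$ (neither is from the talk), builds $P_t^n 1$ by composing closures, and compares its integral with a direct quadrature of $e^{itS_nf}$.

```python
import cmath

def T(x: float) -> float:
    return (2.0 * x) % 1.0

def f(x: float) -> float:
    return x - 0.5

def twisted(u, t: float):
    """Return P_t u = P(exp(i t f) u) for the doubling map with Lebesgue measure."""
    def Ptu(x):
        y0, y1 = x / 2.0, (x + 1.0) / 2.0  # the two preimages of x
        return 0.5 * (cmath.exp(1j * t * f(y0)) * u(y0)
                      + cmath.exp(1j * t * f(y1)) * u(y1))
    return Ptu

def birkhoff_sum(x: float, n: int) -> float:
    """S_n f(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total

def integrate(fn, grid: int):
    return sum(fn((j + 0.5) / grid) for j in range(grid)) / grid

n, t = 6, 0.7
u = lambda x: 1.0
for _ in range(n):
    u = twisted(u, t)          # after the loop, u represents P_t^n 1

lhs = integrate(u, 64)                                                   # integral of P_t^n 1
rhs = integrate(lambda x: cmath.exp(1j * t * birkhoff_sum(x, n)), 64 * 2**n)  # integral of exp(i t S_n f)
print(lhs, rhs)
```

Evaluating $P_t^n 1$ this way costs about $2^{n+1}$ function calls per point, so it is only practical for small $n$; it is meant as a check of the algebra, not as a way to compute characteristic functions.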

For the second stage of the proof, we consider the twisted transfer operator $P_t$ as a perturbation of $P = P_0$. From the Lasota–Yorke inequality and the fact that $(T,\mu)$ is mixing, we know that the spectrum of $P$ has the form $\{1\}\cup\Sigma_0$, where $1$ is a simple eigenvalue and $\Sigma_0$ is contained in a disc of radius $r<1$ centred at the origin.

By the perturbation theory of linear operators, the spectrum of $P_t$ has the same form for small enough $t$: there is a leading eigenvalue $\lambda(t)$ that is close to $1$, and the rest of the spectrum is contained in the disc of radius $r$. Moreover, the leading eigenvalue satisfies *(Edit: see the end of the post for a proof)*

$$\lambda'(0) = i\int f\,d\mu = 0$$

and

$$\lambda''(0) = -\int f^2\,d\mu - 2\sum_{n=1}^\infty \int f\cdot(f\circ T^n)\,d\mu = -\sigma^2,$$

which is the origin of the expression in the Green–Kubo formula.

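Now we use the Riesz functional calculus, whose general ideas we briefly recall here. Let $\mathcal{B}$ be a Banach space and $L(\mathcal{B})$ the space of bounded linear operators on $\mathcal{B}$. Given $A\in L(\mathcal{B})$, let $\operatorname{spec}(A)$ be the spectrum of $A$. Then there is a unique way to associate to each function $F$ that is analytic on a neighbourhood of $\operatorname{spec}(A)$ an operator $F(A)\in L(\mathcal{B})$ such that the map $F\mapsto F(A)$ is a homomorphism mapping the constant function $1$ to the identity operator $I$ and the identity function $z\mapsto z$ to $A$.

This mapping can be defined by integrating around a curve $\gamma$ surrounding the spectrum (this is similar to the Cauchy formula from complex analysis):

$$F(A) = \frac{1}{2\pi i}\oint_\gamma F(z)\,(zI - A)^{-1}\,dz,$$

where we recall that $zI - A$ is invertible for all $z$ in the resolvent set $\mathbb{C}\setminus\operatorname{spec}(A)$. If we take $F$ to be the characteristic function of part of the spectrum (which is analytic near the spectrum as long as that part is separated from the rest), we obtain a projection to the spectral subspace associated with that part.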
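In finite dimensions the contour integral can be approximated directly. The following is a minimal sketch for a hypothetical $2\times 2$ matrix with eigenvalues $1$ and $\frac12$ (a stand-in for an operator with a spectral gap, not the transfer operator itself): it integrates the resolvent over a small circle around the eigenvalue $1$ using the trapezoid rule, and should recover the projection onto that eigenspace along the other one.

```python
import cmath

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def riesz_projection(A, center: complex, radius: float, steps: int = 64):
    """Trapezoid-rule approximation of (1 / 2 pi i) * integral of (zI - A)^{-1} dz
    over the circle |z - center| = radius."""
    proj = [[0j, 0j], [0j, 0j]]
    for k in range(steps):
        w = radius * cmath.exp(2j * cmath.pi * k / steps)  # w = z - center
        z = center + w
        R = inv2([[z - A[0][0], -A[0][1]], [-A[1][0], z - A[1][1]]])
        for r in range(2):
            for c in range(2):
                # dz / (2 pi i) = w * dtheta / (2 pi), and dtheta / (2 pi) = 1 / steps
                proj[r][c] += w * R[r][c] / steps
    return proj

A = [[1.0, 1.0], [0.0, 0.5]]            # eigenvalues 1 and 1/2
Pi = riesz_projection(A, 1.0, 0.25)     # circle encloses only the eigenvalue 1
print(Pi)                               # expected to be close to [[1, 2], [0, 0]]
```

For this matrix the eigenvectors are $(1,0)$ for the eigenvalue $1$ and $(-2,1)$ for the eigenvalue $\frac12$, so the projection onto the first along the second is $\begin{pmatrix}1 & 2\\ 0 & 0\end{pmatrix}$, which is what the quadrature should reproduce.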

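In particular, considering the operator $P_t$, we may set $F = \mathbf{1}_{\{\lambda(t)\}}$ and obtain a projection $\Pi_t$ onto the eigenspace of $\lambda(t)$. Similarly, setting $F(z) = z\,\mathbf{1}_{\Sigma_t}(z)$, where $\Sigma_t$ is the part of the spectrum of $P_t$ contained in a disc of radius $r$, we get an operator $R_t$ such that

$$P_t = \lambda(t)\,\Pi_t + R_t.$$

Moreover, we have

$$\Pi_t R_t = R_t \Pi_t = 0,$$

which allows us to write the operator in (9) as

$$P_t^n = \lambda(t)^n\,\Pi_t + R_t^n,$$

using the fact that $\Pi_t^2 = \Pi_t$. Now (9) yields

$$\varphi_n(t) = \lambda\left(\tfrac{t}{\sqrt n}\right)^n \int \Pi_{t/\sqrt n} 1\,d\mu + \int R_{t/\sqrt n}^n 1\,d\mu \to e^{-\sigma^2t^2/2}.$$

Here the second term goes to $0$ because the spectral radius of $R_s$ is at most $r<1$, so that $\|R_s^n\|$ decays exponentially in $n$, uniformly over small $s$. In the first term, $\int \Pi_{t/\sqrt n}1\,d\mu \to \int \Pi_0 1\,d\mu = 1$, since $\Pi_0$ is the projection $u\mapsto \int u\,d\mu$ onto the constants. Finally, the expansion $\lambda(s) = 1 - \frac{\sigma^2}{2}s^2 + o(s^2)$ coming from the derivatives of $\lambda$ at $0$ gives $\lambda(t/\sqrt n)^n \to e^{-\sigma^2t^2/2}$ exactly as in the IID case, which completes the proof of the CLT.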
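The conclusion of the theorem can be sanity-checked numerically in a simple case. The sketch below uses an example that is not from the talk: the doubling map with Lebesgue measure and $f(x) = x - \frac12$, for which the Green–Kubo sum evaluates to $\sigma^2 = \frac14$. It approximates $\varphi_n(t) = \int e^{itS_nf/\sqrt n}\,dx$ by a midpoint rule and compares it with the Gaussian characteristic function; the choices $n = 12$, $t = 1$ and the grid size are arbitrary.

```python
import cmath
import math

def T(x: float) -> float:
    return (2.0 * x) % 1.0

def f(x: float) -> float:
    return x - 0.5

def phi_n(t: float, n: int, grid: int = 2**16) -> complex:
    """Midpoint-rule approximation of the integral of exp(i t S_n f / sqrt(n))."""
    total = 0j
    for j in range(grid):
        x = (j + 0.5) / grid
        s = 0.0
        for _ in range(n):   # accumulate the Birkhoff sum S_n f(x)
            s += f(x)
            x = T(x)         # exact for dyadic grid points as long as n < log2(grid)
        total += cmath.exp(1j * t * s / math.sqrt(n))
    return total / grid

sigma2 = 0.25                 # Green-Kubo value for this example
t = 1.0
approx = phi_n(t, 12)
limit = math.exp(-sigma2 * t * t / 2.0)
print(approx, limit)
```

At $n = 12$ the two values should already agree to within a couple of percent; the remaining gap reflects the $O(1/n)$ correction to the variance of $S_nf/\sqrt n$. Note that floating-point iteration of the doubling map loses one bit per step, so $n$ must stay below the number of bits in the grid points.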

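**3. Proof of formulas for derivatives of $\lambda$**

The formulas given above for $\lambda'(0)$ and $\lambda''(0)$ were not explained. Here is a derivation of these formulas.

Let $v_t$ be the eigenfunction of $P_t$ corresponding to the eigenvalue $\lambda(t)$, with $v_0 = 1$. That is, $v_t$ satisfies

$$P_t v_t = \lambda(t)\, v_t.$$

Multiplying by a test function $g$ and integrating against $\mu$ gives

$$\int (P_t v_t)\cdot g\,d\mu = \lambda(t)\int v_t\, g\,d\mu.$$

Recalling the definition of $P_t$ (that is, $P_t u = P(e^{itf}u)$, together with the duality between $P$ and composition with $T$), this gives

$$\int e^{itf}\, v_t\cdot (g\circ T)\,d\mu = \lambda(t)\int v_t\, g\,d\mu. \tag{10}$$

Let $\lambda'(t) = \frac{d}{dt}\lambda(t)$ and $w_t = \frac{d}{dt}v_t$. Then differentiating (10) with respect to $t$ gives

$$\int \left(if e^{itf}\, v_t + e^{itf}\, w_t\right)\cdot (g\circ T)\,d\mu = \lambda'(t)\int v_t\, g\,d\mu + \lambda(t)\int w_t\, g\,d\mu. \tag{11}$$

Setting $t=0$ and using the fact that $\lambda(0)=1$ and $v_0 = 1$, we get

$$\int (if + w)\cdot (g\circ T)\,d\mu = \lambda'(0)\int g\,d\mu + \int w\, g\,d\mu,$$

where we write $w = w_0$.

Putting $g=1$ gives the expression $\lambda'(0) = i\int f\,d\mu$. Before finding $\lambda''(0)$, we observe that the above equation can also be used to find $\int w\,g\,d\mu$, which will be important later on. Indeed, using the assumption that $\int f\,d\mu = 0$, we have $\lambda'(0)=0$, and so the above equation becomes

$$\int w\, g\,d\mu = i\int f\cdot (g\circ T)\,d\mu + \int w\cdot (g\circ T)\,d\mu. \tag{12}$$

Similarly, replacing $g$ with the test functions $g\circ T^k$ for $k\geq 1$, (12) gives

$$\int w\cdot (g\circ T^k)\,d\mu = i\int f\cdot (g\circ T^{k+1})\,d\mu + \int w\cdot (g\circ T^{k+1})\,d\mu,$$

and so on. Observe that $\sum_{k\geq 1}\int w\cdot (g\circ T^k)\,d\mu$ converges because we have exponential decay of correlations (here we take $\int g\,d\mu = 0$, which is the only case needed below). Thus we may add the above equations (infinitely many of them) and subtract this sum from both sides to obtain

$$\int w\, g\,d\mu = i\sum_{k=1}^\infty \int f\cdot (g\circ T^k)\,d\mu. \tag{13}$$

Now we can find the expression for $\lambda''(0)$. Set $g=1$ in (11) and differentiate to get

$$\int \left(-f^2 e^{itf}\, v_t + 2if e^{itf}\, w_t + e^{itf}\, w_t'\right)d\mu = \lambda''(t)\int v_t\,d\mu + 2\lambda'(t)\int w_t\,d\mu + \lambda(t)\int w_t'\,d\mu. \tag{14}$$

At $t=0$ we see that the terms containing $w_t'$ are equal, while $\lambda'(0)=0$ by the assumption that $\int f\,d\mu = 0$, and so (14) gives

$$\lambda''(0) = -\int f^2\,d\mu + 2i\int f\, w\,d\mu. \tag{15}$$

From (13) with $g=f$, we have

$$\int f\, w\,d\mu = i\sum_{k=1}^\infty \int f\cdot (f\circ T^k)\,d\mu,$$

which together with (15) suffices to complete the proof of the expression for $\lambda''(0)$, namely $\lambda''(0) = -\int f^2\,d\mu - 2\sum_{k\geq 1}\int f\cdot(f\circ T^k)\,d\mu = -\sigma^2$.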
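One way to probe the formula for the second derivative numerically, without computing eigenfunctions, is the following heuristic sketch. It rests on an assumption drawn from the spectral decomposition used in the proof: for large $n$, $\int e^{itS_nf}\,d\mu \approx \lambda(t)^n \int \Pi_t 1\,d\mu$ (with $\Pi_t$ the eigenprojection for $\lambda(t)$), so the ratio of consecutive integrals approximates $\lambda(t)$. The example (doubling map, Lebesgue measure, $f(x) = x - \frac12$, for which the Green–Kubo sum is $\frac14$) and all parameter choices are illustrative only.

```python
import cmath
import math

def T(x: float) -> float:
    return (2.0 * x) % 1.0

def f(x: float) -> float:
    return x - 0.5

def char_integrals(t: float, n: int, grid: int = 2**16):
    """Midpoint-rule values of the integrals of exp(i t S_n f) and exp(i t S_{n+1} f)."""
    a = 0j
    b = 0j
    for j in range(grid):
        x = (j + 0.5) / grid
        s = 0.0
        for _ in range(n):
            s += f(x)
            x = T(x)
        a += cmath.exp(1j * t * s)            # uses S_n f
        b += cmath.exp(1j * t * (s + f(x)))   # uses S_{n+1} f = S_n f + f(T^n x)
    return a / grid, b / grid

h = 0.1
a, b = char_integrals(h, 10)
# Ratio of consecutive integrals as a stand-in for lambda(h); for this symmetric
# example the integrals are real, and abs() guards against stray rounding phases.
log_lambda = math.log(abs(b) / abs(a))
# Since lambda(0) = 1 and lambda'(0) = 0, a central difference at 0 reduces to this.
second_derivative = 2.0 * log_lambda / h**2
print(second_derivative)  # expected to be close to -sigma^2 = -0.25
```

This is only a consistency check under the stated assumptions, not a replacement for the derivation above, but it does show the extra correlation terms at work: dropping them would predict $-\int f^2\,dx = -\frac1{12}$ instead.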

I like this post! A few typos: missing an integral in (6); the function in (7); the density of $\mu$ might be $h$. How do you calculate $\lambda''(0)$? I would get $-\int f^2\,d\mu$, which is wrong.

Thanks for pointing those out, I think I’ve fixed the typos now…

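Here is my calculation: suppose $P_t v_t = \lambda(t) v_t$. Take integral with respect to $\mu$: $\int e^{itf} v_t\,d\mu = \lambda(t)\int v_t\,d\mu$. Then take derivative: $\int (if e^{itf} v_t + e^{itf} v_t')\,d\mu = \lambda'(t)\int v_t\,d\mu + \lambda(t)\int v_t'\,d\mu$. Evaluating at $t=0$ and using $\lambda(0)=1$ and $v_0$ is constant, we get $\lambda'(0) = i\int f\,d\mu$. Taking second derivative, I only get $\lambda''(0) = -\int f^2\,d\mu$. I don’t know what goes wrong 😦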

It’s the right idea but you need to be careful: you can’t evaluate at t=0 until you’ve done all your differentiating, including the second time. When you do that there’s an extra integral that you have to compute, which provides the missing terms. I’ve added a section to the post clarifying the matter.

I see where my calculation went wrong. Thank you!

I made even more typos in the previous two comments than your post… Please help me to edit it 🙂