## Spectral methods 3 – central limit theorem

With the previous post on convergence of random variables, the law of large numbers, and Birkhoff’s ergodic theorem as background, we return to the spectral methods discussed in the first two posts in this series. This post is based on Andrew Török’s talk from March 4 and gives a proof of the central limit theorem using the spectral gap property.

1. Central limit theorem for IID

Now we recall the statement of the central limit theorem (CLT) and give a proof in the case of IID (independent identically distributed) random variables.

The weak law of large numbers says that if ${X_n}$ is a sequence of IID random variables with ${\mathop{\mathbb E}[X_n] = 0}$, then writing ${S_n = \sum_{k=1}^{n}X_k}$, the time averages ${\frac 1n S_n}$ converge to ${0}$ in probability, or equivalently (since the limit is a constant), in distribution. In the case when ${\sigma^2 = \mathop{\mathbb E}[X_n^2]<\infty}$, the central limit theorem strengthens this to the result that the sequence ${\frac 1{\sqrt{n}} S_n}$ converges in distribution to ${N(0,\sigma^2)}$, the normal distribution with mean ${0}$ and variance ${\sigma^2}$. That is, we have

$\displaystyle \mathop{\mathbb P}\left( \frac 1{\sqrt n} S_n \leq c\right) \xrightarrow{n\rightarrow\infty} \frac 1{\sigma\sqrt{2\pi}} \int_{-\infty}^c e^{-t^2/2\sigma^2}\,dt \ \ \ \ \ (1)$

for every ${c\in {\mathbb R}}$.
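As a quick sanity check of (1), the following Python sketch (an illustration, not part of the original argument) estimates ${\mathop{\mathbb P}(\frac 1{\sqrt n}S_n\leq c)}$ by Monte Carlo for IID ${X_k}$ uniform on ${[-1,1]}$, which have mean ${0}$ and variance ${\sigma^2=1/3}$, and compares the estimate with the normal distribution function computed from `math.erf`. The choice of distribution, ${n=50}$, the number of trials, and the seed are arbitrary assumptions.

```python
import math
import random

def normal_cdf(c, sigma):
    """Right-hand side of (1): P(N(0, sigma^2) <= c)."""
    return 0.5 * (1.0 + math.erf(c / (sigma * math.sqrt(2.0))))

def empirical_cdf(n, c, trials, rng):
    """Monte Carlo estimate of P(S_n / sqrt(n) <= c) for IID X_k uniform on [-1, 1]."""
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if s_n / math.sqrt(n) <= c:
            hits += 1
    return hits / trials

rng = random.Random(2024)            # fixed seed, arbitrary
sigma = math.sqrt(1.0 / 3.0)         # standard deviation of the uniform law on [-1, 1]
for c in (-1.0, 0.0, 0.5):
    print(c, empirical_cdf(50, c, 10_000, rng), normal_cdf(c, sigma))
```

The two columns should agree up to Monte Carlo error of order ${1/\sqrt{\text{trials}}}$.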

This can be established by the same method as we used last time for the proof of the weak law of large numbers, by studying the characteristic functions of ${\frac 1{\sqrt n}S_n}$ and ${N(0,\sigma^2)}$. The characteristic function of ${N(0,\sigma^2)}$ is

$\displaystyle \psi(t) = e^{-\frac 12 \sigma^2 t^2}. \ \ \ \ \ (2)$

Arguing as in the proof of the weak law of large numbers in the previous post, we write ${\varphi_n}$ for the characteristic function of ${\frac 1{\sqrt n} S_n}$ and observe that

$\displaystyle \varphi_n(t) = \mathop{\mathbb E}[e^{it \frac 1{\sqrt n} (X_1+\cdots + X_n)}] = \prod_{j=1}^n \mathop{\mathbb E}[e^{\frac{it}{\sqrt n} X_j}] = \varphi\left(\frac {t}{\sqrt n}\right)^n, \ \ \ \ \ (3)$

where ${\varphi}$ is the characteristic function of the ${X_j}$ (which are identically distributed), and the second equality uses the fact that the ${X_j}$ are independent.

Now, since ${\mathop{\mathbb E}[X_j^2]<\infty}$, Taylor's theorem gives, for fixed ${t}$ as ${n\rightarrow\infty}$,

$\displaystyle \begin{aligned} \varphi(t/\sqrt{n}) &= \mathop{\mathbb E}[e^{\frac {it}{\sqrt n}X_j}] = 1 + \frac{it}{\sqrt n} \mathop{\mathbb E}[X_j] - \frac{t^2}{2n} \mathop{\mathbb E}[X_j^2] + o\left(\frac {t^2}n\right) \\ &= 1 - \frac {t^2}{2n}\sigma^2 + o\left(\frac {t^2}n\right), \end{aligned}$

using the fact that the ${X_j}$ have mean 0 and variance ${\sigma^2}$. Thus we conclude from (3) that

$\displaystyle \varphi_n(t) = \left(1 - \frac{t^2\sigma^2}{2n} + o\left(\frac {t^2}n\right)\right)^n \xrightarrow{n\rightarrow\infty} e^{-\frac 12 t^2\sigma^2} = \psi(t),$

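which, by Lévy's continuity theorem (pointwise convergence of characteristic functions implies convergence in distribution), completes the proof of the CLT in the IID case.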
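The limit in the last display can also be watched numerically. In the hypothetical case where each ${X_j}$ equals ${\pm 1}$ with probability ${1/2}$ (so ${\sigma^2=1}$ and ${\varphi(t)=\cos t}$), formula (3) reads ${\varphi_n(t)=\cos(t/\sqrt n)^n}$; the sketch below compares it with ${\psi(t)=e^{-t^2/2}}$ for a few values of ${n}$.

```python
import math

def phi_n(t, n):
    """Characteristic function of S_n / sqrt(n) for IID X_k = +1 or -1 with probability 1/2.
    Each X_k has characteristic function cos(t), so (3) gives cos(t / sqrt(n))**n."""
    return math.cos(t / math.sqrt(n)) ** n

def psi(t, sigma=1.0):
    """Characteristic function (2) of N(0, sigma^2)."""
    return math.exp(-0.5 * sigma**2 * t**2)

for n in (10, 100, 10_000):
    print(n, phi_n(1.5, n), psi(1.5))
```

Expanding ${\log\cos s = -s^2/2 - s^4/12 + \cdots}$ shows that the discrepancy here is of order ${t^4/n}$.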

2. CLT with spectral gap

To translate the CLT into the language of dynamical systems, we consider a space ${X}$ and a map ${T\colon X\rightarrow X}$ with an invariant measure ${\mu}$. In general there may be many ${T}$-invariant measures, so it is important to choose a suitable ${\mu}$. For example, when ${X}$ is an interval and ${T}$ is piecewise expanding, we are most interested in the case when ${\mu}$ is an acip (absolutely continuous invariant probability measure).

Given a measurable function ${f\colon X\rightarrow {\mathbb R}}$, the sequence of functions ${f}$, ${f\circ T}$, ${f\circ T^2, \dots}$ defines a sequence of random variables on ${(X,\mu)}$ that are identically distributed, because ${\mu}$ is ${T}$-invariant. However, they are not independent, and so we need some information about the decay of correlations between them. It turns out that the proof from the previous section can be adapted whenever the transfer operator has a spectral gap, which provides exactly this kind of information.

Let’s make this precise in the case when ${T}$ is a piecewise expanding interval map, so the Lasota–Yorke inequality we discussed in an earlier post yields a spectral gap for the transfer operator ${\mathcal{P}_T}$ acting on ${BV}$, the space of functions of bounded variation, and in particular establishes the existence of an acip ${\mu}$.

Theorem 1 Let ${T}$ be a piecewise expanding interval map and ${\mu}$ the acip constructed before. Suppose that ${\mu}$ is mixing. Then ${\mu}$ satisfies the central limit theorem as follows: given any ${f\in BV}$ with ${\int f\,d\mu = 0}$ and writing ${S_nf(x) = \sum_{k=0}^{n-1} f(T^k x)}$, we have

$\displaystyle \mu \left\{ x \mid \frac 1{\sqrt{n}} S_nf(x) \leq c \right\} \xrightarrow{n\rightarrow\infty} \frac 1{\sigma\sqrt{2\pi}} \int_{-\infty}^c e^{-t^2/2\sigma^2}\,dt \ \ \ \ \ (4)$

for all ${c\in {\mathbb R}}$, where ${\sigma^2}$ is given by the Green–Kubo formula

$\displaystyle \sigma^2 = \sum_{n\in{\mathbb Z}} \int_X f \cdot (f\circ T^{|n|}) \,d\mu, \ \ \ \ \ (5)$

and ${\sigma=0}$ if and only if there exist ${g\in BV}$ and ${c\in {\mathbb R}}$ such that ${f=c+g\circ T - g}$ (integrating against ${\mu}$ shows that necessarily ${c=0}$ here, since ${\int f\,d\mu = 0}$).
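To see Theorem 1 in action, here is a Monte Carlo sketch for a hypothetical example that is not from the original post: the doubling map ${T(x)=2x \bmod 1}$, which is piecewise expanding with Lebesgue measure as its mixing acip, and ${f(x)=x-\frac12\in BV}$. For this pair one can check that ${\int f\cdot(f\circ T^n)\,d\mu = 2^{-n}/12}$, so (5) gives ${\sigma^2 = \frac1{12}+\frac2{12}\sum_{n\geq1}2^{-n} = \frac14}$. Iterating ${T}$ in floating point is useless (a double of the form ${k/2^{53}}$, such as the output of `random.random()`, reaches ${0}$ after at most 53 doublings), so the sketch samples the binary digits ${b_0,b_1,\dots}$ of ${x}$ directly and uses that ${T^kx}$ has digits ${b_k,b_{k+1},\dots}$. The values of ${n}$, the number of trials, and the seed are arbitrary.

```python
import math
import random

def birkhoff_sum(n, rng, precision=53):
    """S_n f(x) for T(x) = 2x mod 1 and f(x) = x - 1/2, with x drawn from
    Lebesgue measure (the acip) through its binary digits."""
    length = n + precision                 # number of random binary digits of x
    bits = rng.getrandbits(length)         # b_0 is the most significant bit
    window = (1 << precision) - 1
    total = 0.0
    for k in range(n):
        # T^k x has digits b_k, b_{k+1}, ...; keep 53 of them (error < 2^-53).
        digits = (bits >> (length - k - precision)) & window
        total += digits / 2.0**precision - 0.5
    return total

def normal_cdf(c, sigma):
    """Right-hand side of (4)."""
    return 0.5 * (1.0 + math.erf(c / (sigma * math.sqrt(2.0))))

def empirical_cdf(samples, c):
    return sum(s <= c for s in samples) / len(samples)

rng = random.Random(12345)
n, trials = 100, 4000
samples = [birkhoff_sum(n, rng) / math.sqrt(n) for _ in range(trials)]
sigma = 0.5    # Green-Kubo value for this example
for c in (-0.5, 0.0, 0.5):
    print(c, empirical_cdf(samples, c), normal_cdf(c, sigma))
```

Working with integer digit strings keeps every orbit point exact up to the final ${2^{-53}}$ truncation, which is far below the Monte Carlo error.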

Before proving the theorem, we make some remarks concerning the Green–Kubo formula (5). First, note that the sum converges as soon as we establish exponential decay of correlations for functions in ${BV}$, since each integral in the sum is just the correlation function at time ${n}$. Second, note that if we replace the functions ${f\circ T^n}$ with independent random variables, then all the terms with ${n\neq 0}$ vanish, and the ${n=0}$ term is just the variance ${\mathop{\mathbb E}[X^2]}$, as in the previous section.
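For a concrete (hypothetical) example, take the doubling map ${T(x)=2x\bmod 1}$ with Lebesgue measure as ${\mu}$ and ${f(x)=x-\frac12}$. The sketch below approximates the correlations ${C(n)=\int f\cdot(f\circ T^n)\,d\mu}$ by the midpoint rule and sums the series (5) truncated at ${|n|\leq 10}$, using ${C(-n)=C(n)}$. In this example the correlations decay like ${2^{-n}}$ (in fact ${C(n)=2^{-n}/12}$ exactly), so the truncated sum should be close to ${\sigma^2=1/4}$; the grid size and truncation level are arbitrary choices.

```python
def f(x):
    """Observable f(x) = x - 1/2, which has zero mean under Lebesgue measure."""
    return x - 0.5

def doubling_iterate(x, n):
    """T^n(x) for T(x) = 2x mod 1 (exact for the dyadic grid points used below)."""
    for _ in range(n):
        x = (2.0 * x) % 1.0
    return x

def correlation(n, cells=2**14):
    """Midpoint-rule approximation of C(n) = integral of f * (f o T^n) dmu, mu = Lebesgue."""
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) / cells
        total += f(x) * f(doubling_iterate(x, n))
    return total / cells

def green_kubo(terms):
    """Truncation of (5): C(0) + 2 * (C(1) + ... + C(terms))."""
    return correlation(0) + 2.0 * sum(correlation(n) for n in range(1, terms + 1))

sigma2 = green_kubo(10)
print(sigma2)    # close to 1/4; the neglected tail is 2^{-10}/6
```

Because the grid cells are dyadic, the discontinuities of ${f\circ T^n}$ fall on cell edges, which keeps the midpoint rule accurate.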

Note also that using (5), ${\sigma^2}$ can be written as

$\displaystyle \sigma^2 = \lim_{n\rightarrow\infty} \frac 1n \int (S_n f)^2\,d\mu.$
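The passage from (5) to this limit is a short computation, sketched here assuming exponential decay of correlations. Write ${C(m)=\int f\cdot(f\circ T^m)\,d\mu}$. Invariance of ${\mu}$ gives ${\int (f\circ T^j)(f\circ T^k)\,d\mu = C(|k-j|)}$, and there are ${2(n-m)}$ ordered pairs ${(j,k)}$ in ${\{0,\dots,n-1\}^2}$ with ${|k-j|=m\geq 1}$, so

$\displaystyle \frac 1n\int (S_nf)^2\,d\mu = \frac 1n\sum_{j,k=0}^{n-1} C(|k-j|) = C(0) + 2\sum_{m=1}^{n-1}\left(1-\frac mn\right)C(m).$

If ${|C(m)|\leq K\theta^m}$ for some constants ${K>0}$ and ${\theta\in(0,1)}$ (names introduced only for this computation), dominated convergence lets ${n\rightarrow\infty}$ term by term, and the right-hand side tends to ${C(0)+2\sum_{m\geq 1}C(m)=\sigma^2}$.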

Now we prove the central limit theorem (4). As in the IID case, we use the characteristic functions

$\displaystyle \begin{aligned} \psi(t) &= e^{-\sigma^2 t^2/2}, \\ \varphi_n(t) &= \mathop{\mathbb E}_\mu[e^{it(S_nf)/\sqrt{n}}] = \int e^{\frac{it}{\sqrt n} S_nf} \,d\mu, \end{aligned}$

where ${\psi(t)}$ is the characteristic function of the normal distribution and ${\varphi_n}$ is the characteristic function of ${\frac 1{\sqrt n}S_nf}$, so it suffices to show that ${\varphi_n(t)\rightarrow \psi(t)}$ for all ${t}$.

To prove this convergence of the characteristic functions, we use the following procedure.

1. Write the characteristic functions ${\varphi_n}$ in terms of a twisted transfer operator ${\mathcal{P}_{f,t}}$, where ${f}$ is the function we are investigating in the CLT, and ${t\approx 0}$ is a small real parameter. The operator ${\mathcal{P}_{f,t}}$ is a small perturbation of the transfer operator ${\mathcal{P}_T}$.
2. Use perturbation theory of operators to show that ${\mathcal{P}_{f,t}}$ has a spectral gap and to derive asymptotics for the leading eigenvalue ${\lambda(t)}$. In particular, relate ${\lambda'(0)}$ and ${\lambda''(0)}$ to the mean and variance of the limiting distribution.

First we define the transfer operator itself by the implicit equation

$\displaystyle \int (\mathcal{P}_T g)\cdot h\,d\mu = \int g\cdot (h\circ T)\,d\mu \ \ \ \ \ (6)$

for all ${g\in L^1(\mu)}$ and ${h\in L^\infty(\mu)}$. Note that this differs from the transfer operator defined by integrating with respect to Lebesgue measure instead of ${\mu}$ in (6); it is a worthwhile exercise to determine the precise relationship between the two.

More directly, the transfer operator can be defined by

$\displaystyle \mathcal{P}_T g(x) = \frac{1}{\rho(x)}\sum_{y\in T^{-1}(x)} \frac{g(y)\rho(y)}{|T'(y)|}, \ \ \ \ \ (7)$

where ${\rho}$ is the density of ${\mu}$ with respect to Lebesgue measure, and the formula is understood at points where ${\rho(x)>0}$.
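A quick way to test a formula for ${\mathcal{P}_T}$ is to check the duality (6) numerically. The Python sketch below does this for a hypothetical piecewise linear Markov map (not from the original post): ${T(x)=3x}$ on ${[0,\frac13)}$, ${T(x)=3x-1}$ on ${[\frac13,\frac23)}$, and ${T(x)=2x-\frac43}$ on ${[\frac23,1)}$. Its acip has the piecewise constant density ${\frac98}$ on ${[0,\frac23)}$ and ${\frac34}$ on ${[\frac23,1)}$ (called `rho` in the code), found by solving the fixed-point equation for the Lebesgue transfer operator on the three intervals. The operator is implemented as ${\mathcal{P}_Tg(x)=\frac{1}{\rho(x)}\sum_{y\in T^{-1}(x)}\frac{g(y)\rho(y)}{|T'(y)|}}$, and the test functions ${g(x)=\cos 5x}$ and ${h(x)=x^2}$ are arbitrary choices.

```python
import math

def T(x):
    """Hypothetical piecewise linear Markov map on [0, 1)."""
    if x < 1 / 3:
        return 3 * x
    if x < 2 / 3:
        return 3 * x - 1
    return 2 * x - 4 / 3

def rho(x):
    """Density of the acip of T (fixed point of the Lebesgue transfer operator)."""
    return 9 / 8 if x < 2 / 3 else 3 / 4

def preimages(x):
    """Pairs (y, |T'(y)|) for y in T^{-1}(x)."""
    pre = [(x / 3, 3.0), ((x + 1) / 3, 3.0)]
    if x < 2 / 3:                        # the third branch only covers [0, 2/3)
        pre.append((x / 2 + 2 / 3, 2.0))
    return pre

def transfer(g):
    """P_T with respect to mu = rho dx: (1/rho(x)) * sum of g(y) rho(y) / |T'(y)|."""
    return lambda x: sum(g(y) * rho(y) / d for y, d in preimages(x)) / rho(x)

def integrate_mu(func, cells=3 * 2**10):
    """Midpoint rule for the integral of func dmu; the cell edges include 1/3 and 2/3."""
    total = 0.0
    for i in range(cells):
        x = (i + 0.5) / cells
        total += func(x) * rho(x)
    return total / cells

g = lambda x: math.cos(5 * x)
h = lambda x: x * x
Pg = transfer(g)
lhs = integrate_mu(lambda x: Pg(x) * h(x))      # left-hand side of (6)
rhs = integrate_mu(lambda x: g(x) * h(T(x)))    # right-hand side of (6)
print(lhs, rhs)
```

The factor ${1/\rho(x)}$ is what makes ${\mathcal{P}_T{\mathbf{1}}={\mathbf{1}}}$ and (6) hold when the density is not constant.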

Now given ${f\in BV}$ and ${t\in{\mathbb R}}$, we define the twisted transfer operator by

$\displaystyle \mathcal{P}_{f,t}g = \mathcal{P}_T(e^{itf}g). \ \ \ \ \ (8)$

To see the utility of this definition, we first note that

$\displaystyle \int\mathcal{P}_{f,t}(g)\,d\mu = \int \mathcal{P}_T(e^{itf}g){\mathbf{1}}\,d\mu =\int e^{itf} g\,d\mu,$

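Since (6) implies ${\mathcal{P}_T((h\circ T)\cdot u) = h\cdot \mathcal{P}_T u}$ for bounded ${h}$, induction gives ${\mathcal{P}_{f,t}^n g = \mathcal{P}_T^n(e^{itS_nf}g)}$, and since ${\int \mathcal{P}_T^n u\,d\mu = \int u\,d\mu}$ we have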

$\displaystyle \int \mathcal{P}_{f,t}^n(g) \,d\mu = \int e^{itS_nf} g\,d\mu.$
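The case ${g={\mathbf{1}}}$ of this identity can be checked numerically in a hypothetical example: the doubling map ${T(x)=2x\bmod 1}$, for which ${\mu}$ is Lebesgue measure, so that (7) reduces to ${\mathcal{P}_Tg(x)=\frac12\big(g(\frac x2)+g(\frac{x+1}2)\big)}$, with ${f(x)=x-\frac12}$. The sketch builds ${\mathcal{P}_{f,t}^n({\mathbf{1}})}$ as nested Python closures (each evaluation costs ${2^n}$ calls, so only small ${n}$ is practical) and compares both sides using the midpoint rule; ${n=6}$ and ${t=1.3}$ are arbitrary.

```python
import cmath

def f(x):
    """Observable with zero mean under Lebesgue measure."""
    return x - 0.5

def transfer(g):
    """P_T for T(x) = 2x mod 1 and mu = Lebesgue: average of g over the two preimages."""
    return lambda x: 0.5 * (g(x / 2) + g((x + 1) / 2))

def twisted(g, t):
    """P_{f,t} g = P_T(e^{itf} g), as in (8)."""
    return transfer(lambda y: cmath.exp(1j * t * f(y)) * g(y))

def integrate(h, cells=2048):
    """Midpoint rule on [0, 1] (Lebesgue measure)."""
    return sum(h((i + 0.5) / cells) for i in range(cells)) / cells

def birkhoff_sum(x, n):
    """S_n f(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = (2.0 * x) % 1.0
    return total

n, t = 6, 1.3
g = lambda x: 1.0 + 0j           # the constant function 1
for _ in range(n):
    g = twisted(g, t)            # after k passes, g = P_{f,t}^k(1)

lhs = integrate(g)                                               # integral of P_{f,t}^n(1)
rhs = integrate(lambda x: cmath.exp(1j * t * birkhoff_sum(x, n)))  # integral of e^{it S_n f}
print(lhs, rhs)
```

The closures make the "sum over preimages" in (7) explicit: unwinding ${n}$ levels visits all ${2^n}$ points of ${T^{-n}(x)}$, each weighted by ${2^{-n}e^{itS_nf(y)}}$.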

In particular, considering the characteristic function ${\varphi_n}$, we have

$\displaystyle \varphi_n(t) = \int e^{\frac {it}{\sqrt n} S_n f}\,d\mu = \int \mathcal{P}_{f,\frac{t}{\sqrt n}}^n ({\mathbf{1}})\,d\mu, \ \ \ \ \ (9)$

which accomplishes the first stage of the proof — writing the characteristic function in terms of the twisted transfer operator.

For the second stage of the proof, we consider the twisted transfer operator as a perturbation of ${\mathcal{P}_T}$. From the Lasota–Yorke inequality and the fact that ${\mu}$ is mixing, we know that the spectrum of ${\mathcal{P}_T}$ on ${BV}$ has the form ${\{1\}\cup Z}$, where ${1}$ is a simple eigenvalue (with eigenfunction the constant function ${{\mathbf{1}}}$) and ${Z}$ is contained in a disc of radius ${r<1}$ centred at the origin.

By the perturbation theory of linear operators, the spectrum of ${\mathcal{P}_{f,t}}$ has the same form for small enough ${|t|}$: there is a simple leading eigenvalue ${\lambda(t)}$ close to ${1}$, depending analytically on ${t}$, and the rest of the spectrum is contained in the disc of radius ${r}$ (after increasing ${r<1}$ slightly if necessary). Moreover, the leading eigenvalue satisfies (Edit: see the end of the post for a proof)

$\displaystyle \lambda'(0) = \int (if) \,d\mu = 0$

and

$\displaystyle \lambda''(0) = \lim_{n\rightarrow\infty} \frac 1n \int (S_n(if))^2\,d\mu = -\sigma^2,$

which is the origin of the expression in the Green–Kubo formula.
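The identities ${\lambda'(0)=0}$ and ${\lambda''(0)=-\sigma^2}$ have an exact finite-dimensional analogue that is easy to test. The sketch below replaces ${(T,\mu)}$ by a hypothetical two-state Markov chain (this substitution is ours, not the post's) with transition matrix ${Q=\begin{pmatrix}1-a&a\\ b&1-b\end{pmatrix}}$ and observable ${f=(a,-b)}$, which has mean zero under the stationary law ${\pi=(b,a)/(a+b)}$. The role of ${\mathcal{P}_{f,t}}$ is played by the ${2\times 2}$ matrix ${Q\,\mathrm{diag}(e^{itf})}$, whose leading eigenvalue is computed from the quadratic formula and differentiated by central finite differences. For this chain ${f}$ is an eigenvector of ${Q}$ with eigenvalue ${\kappa=1-a-b}$, so the correlations are ${ab\,\kappa^k}$ and the Green–Kubo sum is ${\sigma^2=ab\frac{1+\kappa}{1-\kappa}}$.

```python
import cmath

# Hypothetical two-state chain (a stand-in for (T, mu)): jump 0 -> 1 with
# probability a and 1 -> 0 with probability b.
a, b = 0.3, 0.2
f = (a, -b)                      # mean zero under pi = (b, a) / (a + b)
kappa = 1 - a - b                # second eigenvalue of Q; Q f = kappa * f
sigma2 = a * b * (1 + kappa) / (1 - kappa)   # Green-Kubo sum, C(k) = a * b * kappa**k

def leading_eigenvalue(t):
    """Eigenvalue near 1 of the twisted matrix Q diag(exp(i t f))."""
    e0, e1 = cmath.exp(1j * t * f[0]), cmath.exp(1j * t * f[1])
    m00, m01 = (1 - a) * e0, a * e1
    m10, m11 = b * e0, (1 - b) * e1
    trace = m00 + m11
    det = m00 * m11 - m01 * m10
    # The '+' root equals 1 at t = 0 and stays the leading root for small |t|.
    return (trace + cmath.sqrt(trace * trace - 4 * det)) / 2

h = 1e-3
d1 = (leading_eigenvalue(h) - leading_eigenvalue(-h)) / (2 * h)
d2 = (leading_eigenvalue(h) - 2 * leading_eigenvalue(0.0) + leading_eigenvalue(-h)) / h**2
print(d1, d2, -sigma2)    # d1 close to 0 and d2 close to -sigma2
```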

Now we use the Riesz functional calculus, whose general ideas we briefly recall here. Let ${X}$ be a Banach space and ${\mathcal{B}(X)}$ the space of bounded linear operators on ${X}$. Given ${S\in \mathcal{B}(X)}$, let ${\Sigma\subset{\mathbb C}}$ denote the spectrum of ${S}$ (we avoid the letter ${\sigma}$ here, since it already denotes the standard deviation). Then there is a unique way to associate to each function ${g}$ analytic on a neighbourhood of ${\Sigma}$ an operator ${g(S)}$ such that the map ${g\mapsto g(S)}$ is a continuous algebra homomorphism sending the constant function ${1}$ to the identity operator and the identity function ${z\mapsto z}$ to ${S}$.

This mapping can be defined by integrating over a positively oriented curve ${\gamma}$ that surrounds ${\Sigma}$ and lies in the domain of ${g}$ (this is analogous to the Cauchy integral formula from complex analysis):

$\displaystyle g(S) = \frac 1{2\pi i} \oint_\gamma g(z)\,(zI-S)^{-1}\,dz,$

where we recall that ${zI-S}$ is invertible for every ${z}$ in the resolvent set ${{\mathbb C}\setminus\Sigma}$. If ${\Sigma}$ splits into two disjoint closed pieces and we take ${g}$ equal to ${1}$ on a neighbourhood of one piece and ${0}$ on a neighbourhood of the other, then ${g(S)}$ is a projection onto the spectral subspace associated with the first piece (here ${\gamma}$ may be taken to be a union of curves, one around each piece).
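To make the contour-integral definition concrete, the Python sketch below (a toy example) computes ${\frac1{2\pi i}\oint_\gamma (zI-S)^{-1}\,dz}$ for the hypothetical matrix ${S=\begin{pmatrix}1&1\\0&\frac12\end{pmatrix}}$, whose spectrum is ${\{1,\frac12\}}$, taking ${\gamma}$ to be the circle of radius ${\frac14}$ about ${1}$; this corresponds to ${g}$ equal to ${1}$ inside the circle and ${0}$ near ${\frac12}$. The trapezoid rule converges very fast here because the integrand is analytic and periodic in the angle. The result should be the projection ${\begin{pmatrix}1&2\\0&0\end{pmatrix}}$ onto the eigenvector ${(1,0)}$ along the eigenvector ${(-2,1)}$ of ${\frac12}$, and ${R=S-\Pi}$ should satisfy ${\Pi R=R\Pi=0}$, mirroring the decomposition ${\mathcal{P}_{f,t}=\lambda(t)\Pi_t+R_t}$.

```python
import cmath

def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def riesz_projection(s, centre, radius, nodes=64):
    """Trapezoid-rule approximation of (1/2 pi i) * contour integral of (zI - S)^{-1} dz
    over the positively oriented circle |z - centre| = radius."""
    total = [[0j, 0j], [0j, 0j]]
    for k in range(nodes):
        w = radius * cmath.exp(2j * cmath.pi * k / nodes)    # z - centre
        z = centre + w
        dz = 1j * w * (2 * cmath.pi / nodes)                 # dz = i w dtheta
        res = inverse([[z - s[0][0], -s[0][1]], [-s[1][0], z - s[1][1]]])
        for i in range(2):
            for j in range(2):
                total[i][j] += res[i][j] * dz
    return [[total[i][j] / (2j * cmath.pi) for j in range(2)] for i in range(2)]

S = [[1.0, 1.0], [0.0, 0.5]]                  # hypothetical example, spectrum {1, 1/2}
Pi = riesz_projection(S, centre=1.0, radius=0.25)
R = [[S[i][j] - Pi[i][j] for j in range(2)] for i in range(2)]   # S = 1 * Pi + R
print(Pi)               # approximately [[1, 2], [0, 0]]
print(matmul(Pi, R))    # approximately zero
```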

In particular, considering the operator ${\mathcal{P}_{f,t}}$, we may take ${g}$ to be the indicator function of a small neighbourhood of ${\lambda(t)}$ and obtain a projection ${\Pi_t}$ onto the one-dimensional eigenspace of ${\lambda(t)}$. Similarly, taking ${g(z)=z{\mathbf{1}}_{U}(z)}$, where ${U}$ is a neighbourhood of the part ${Z(t)}$ of the spectrum contained in the disc of radius ${r<1}$, we get an operator ${R_t}$ such that

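$\displaystyle \Pi_t R_t = R_t\Pi_t = 0, \qquad \|R_t^n\|_{BV} \leq C r^n \quad (n\geq 1),$

where the constant ${C}$ can be chosen independently of ${n}$ and of ${t}$ near ${0}$. Moreover, we have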

$\displaystyle \mathcal{P}_{f,t} = \lambda(t) \Pi_t + R_t,$

which, since ${\Pi_t^2=\Pi_t}$ and ${\Pi_tR_t=R_t\Pi_t=0}$, allows us to write the operator in (9) as

$\displaystyle \begin{aligned} \mathcal{P}_{f,\frac{t}{\sqrt n}}^n &= \lambda \left( \frac t{\sqrt n}\right)^n \Pi_{\frac t{\sqrt n}} + R_{\frac t{\sqrt n}}^n \\ &= \left( 1 + \lambda'(0) \frac t{\sqrt n} + \frac{\lambda''(0)}2 \frac {t^2} n + o\left( \frac {t^2} n\right) \right)^n \Pi_{\frac t{\sqrt n}} + R_{\frac t{\sqrt n}}^n \\ &= \left( 1 - \frac{\sigma^2 t^2}{2n} +o\left( \frac {t^2} n\right) \right)^n\left(\Pi_0 + O\left( \frac t{\sqrt n}\right)\right) + R_{\frac t{\sqrt n}}^n \\ &\xrightarrow{n\rightarrow\infty} e^{-t^2\sigma^2/2}\Pi_0, \end{aligned}$

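where the convergence is in the operator norm on ${BV}$, and we used ${\lambda(0)=1}$, ${\lambda'(0)=0}$, ${\lambda''(0)=-\sigma^2}$, and ${\|R_{t/\sqrt n}^n\|_{BV}\leq Cr^n\rightarrow 0}$. Since ${\Pi_0 g = (\int g\,d\mu)\,{\mathbf{1}}}$ is the projection onto the constant functions, (9) yields

$\displaystyle \varphi_n(t) \rightarrow e^{-t^2\sigma^2/2} \int \Pi_0({\mathbf{1}})\,d\mu = e^{-t^2\sigma^2/2} = \psi(t),$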

which completes the proof of the CLT.
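The whole argument can be mirrored in a finite-state setting, where (9) becomes a matrix power. In the sketch below, a hypothetical stand-in for ${(T,\mu)}$ that is not from the post, the stationary two-state chain with transition matrix ${Q=\begin{pmatrix}1-a&a\\ b&1-b\end{pmatrix}}$ and observable ${f=(a,-b)}$ gives ${\varphi_n(t)=\pi\,M_{t/\sqrt n}^n\,{\mathbf{1}}}$ with ${M_s=Q\,\mathrm{diag}(e^{isf})}$ and ${\pi=(b,a)/(a+b)}$. The matrix power is computed by repeated squaring and compared with ${e^{-\sigma^2t^2/2}}$, where ${\sigma^2=ab\frac{1+\kappa}{1-\kappa}}$ with ${\kappa=1-a-b}$ is the Green–Kubo sum for this chain.

```python
import cmath
import math

# Hypothetical stationary two-state chain standing in for (T, mu).
a, b = 0.3, 0.2
pi = (b / (a + b), a / (a + b))              # stationary law
f = (a, -b)                                  # mean zero under pi
kappa = 1 - a - b
sigma2 = a * b * (1 + kappa) / (1 - kappa)   # Green-Kubo sum for this chain

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matpow(m, n):
    """m**n by repeated squaring."""
    result = [[1 + 0j, 0j], [0j, 1 + 0j]]
    while n:
        if n & 1:
            result = matmul(result, m)
        m = matmul(m, m)
        n >>= 1
    return result

def phi_n(t, n):
    """E_pi[exp(i t (f(X_1) + ... + f(X_n)) / sqrt(n))] = pi M^n 1 with M = Q diag(exp(i s f))."""
    s = t / math.sqrt(n)
    e0, e1 = cmath.exp(1j * s * f[0]), cmath.exp(1j * s * f[1])
    m = [[(1 - a) * e0, a * e1], [b * e0, (1 - b) * e1]]
    mn = matpow(m, n)
    return sum(pi[i] * (mn[i][0] + mn[i][1]) for i in range(2))

t = 1.0
for n in (10, 1_000, 100_000):
    print(n, phi_n(t, n), math.exp(-sigma2 * t * t / 2))
```

Here the twisted matrix plays the role of ${\mathcal{P}_{f,t}}$ and its second eigenvalue plays the role of ${R_t}$, which is why ${\varphi_n(t)}$ settles down to the Gaussian value as ${n}$ grows.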

3. Proof of formulas for derivatives of ${\lambda}$

The formulas given above for ${\lambda'(0)}$ and ${\lambda''(0)}$ were not explained. Here is a derivation of these formulas.

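Let ${g_t}$ be an eigenfunction of ${\mathcal{P}_{f,t}}$ corresponding to the eigenvalue ${\lambda(t)}$, chosen to depend smoothly on ${t}$ with ${g_0\equiv 1}$ (for instance, a suitable scalar multiple of ${\Pi_t{\mathbf{1}}}$). That is, ${g_t}$ satisfies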

$\displaystyle \mathcal{P}_T(e^{itf} g_t) = \lambda(t) g_t.$

Multiplying by a test function ${h}$ and integrating against ${\mu}$ gives

$\displaystyle \int \mathcal{P}_T(e^{itf} g_t) h \,d\mu = \lambda(t) \int g_t h \,d\mu.$

Recalling the definition of ${\mathcal{P}_T}$, this gives

$\displaystyle \int (e^{itf} g_t) (h\circ T) \,d\mu = \lambda(t) \int g_t h \,d\mu. \ \ \ \ \ (10)$

Let ${g'_t = \frac d{dt} g_t}$ and ${g''_t = \frac {d^2}{dt^2} g_t}$. Then differentiating (10) with respect to ${t}$ gives

$\displaystyle \int (if) (e^{itf} g_t) (h\circ T) \,d\mu + \int (e^{itf} g'_t)(h\circ T)\,d\mu = \lambda'(t) \int g_t h\,d\mu + \lambda(t) \int g_t' h\,d\mu. \ \ \ \ \ (11)$

Setting ${t=0}$ and using the fact that ${\lambda(0)=1}$ and ${g_0\equiv 1}$, we get

$\displaystyle \int(if)(h\circ T) \,d\mu + \int (q)(h\circ T) \,d\mu = \lambda'(0) \int h\,d\mu + \int(q) (h)\,d\mu,$

where we write ${q = \frac d{dt}g_t|_{t=0}}$.

Putting ${h\equiv 1}$ gives the expression for ${\lambda'(0)}$. Before finding ${\lambda''(0)}$, we observe that the above equation can also be used to find ${\int qh \,d\mu}$, which will be important later on. Indeed, using the assumption that ${\int f\,d\mu = 0}$, we have ${\lambda'(0)=0}$, and so the above equation becomes

$\displaystyle \int(if)(h\circ T) \,d\mu + \int (q)(h\circ T) \,d\mu = \int(q) (h)\,d\mu. \ \ \ \ \ (12)$

Similarly, replacing ${h}$ with the test functions ${h\circ T^k}$ for ${k\geq 1}$, (12) gives

$\displaystyle \begin{aligned} \int(if)(h\circ T^2) \,d\mu + \int (q)(h\circ T^2) \,d\mu &= \int(q) (h\circ T)\,d\mu, \\ \int(if)(h\circ T^3) \,d\mu + \int (q)(h\circ T^3) \,d\mu &= \int(q) (h\circ T^2)\,d\mu, \end{aligned}$

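and so on. We may rescale ${g_t}$ so that ${\int g_t\,d\mu = 1}$ for all small ${t}$; this keeps ${g_0\equiv 1}$, does not affect (10)–(12), and after differentiating at ${t=0}$ gives ${\int q\,d\mu = 0}$. Adding (12) and the first ${N-1}$ of the equations above, the terms involving ${q}$ telescope, leaving

$\displaystyle \sum_{k=1}^{N}\int (if)(h\circ T^k)\,d\mu = \int qh\,d\mu - \int q\,(h\circ T^N)\,d\mu.$

Since ${\int q\,d\mu = 0}$, exponential decay of correlations gives ${\int q\,(h\circ T^N)\,d\mu\rightarrow 0}$ as ${N\rightarrow\infty}$, and letting ${N\rightarrow\infty}$ we obtain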

$\displaystyle \int qh\,d\mu = \sum_{k\geq 1} \int (if) (h\circ T^k)\,d\mu. \ \ \ \ \ (13)$

Now we can find the expression for ${\lambda''(0)}$. Set ${h\equiv 1}$ in (11) and differentiate to get

$\displaystyle \int \left((if)^2 (e^{itf} g_t) + 2(if) (e^{itf} g'_t) + e^{itf} g''_t\right)\,d\mu = \int (\lambda''(t) g_t + 2\lambda'(t) g'_t + \lambda(t) g''_t)\,d\mu. \ \ \ \ \ (14)$

At ${t=0}$ we see that the terms containing ${g_t''}$ are equal, while ${\lambda'(0)=0}$ by the assumption that ${\int f\,d\mu = 0}$, and so (14) gives

$\displaystyle -\int f^2\,d\mu + 2\int (if)\,q \,d\mu = \lambda''(0). \ \ \ \ \ (15)$

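Applying (13) with ${h = if}$ (which is bounded, since ${f\in BV}$), we have

$\displaystyle \int(if)\,q\,d\mu = \sum_{k\geq 1} \int(if) (if\circ T^k)\,d\mu = -\sum_{k\geq 1}\int f\cdot(f\circ T^k)\,d\mu.$

Substituting this into (15) and comparing with the Green–Kubo formula (5), we conclude that

$\displaystyle \lambda''(0) = -\int f^2\,d\mu - 2\sum_{k\geq 1}\int f\cdot (f\circ T^k)\,d\mu = -\sigma^2.$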

I'm an assistant professor of mathematics at the University of Houston. I'm interested in dynamical systems, ergodic theory, thermodynamic formalism, dimension theory, multifractal analysis, non-uniform hyperbolicity, and things along those lines.

### 6 Responses to Spectral methods 3 – central limit theorem

1. Pengfei says:

I like this post! A few typos: a missing integral in (6); the function $h$ in (7); the density of $\mu$ might be $\phi$. How do you calculate $\lambda''(0)$? I would get $\lambda''(0)=-\langle f^2\rangle$, which is wrong.

• Thanks for pointing those out, I think I’ve fixed the typos now…

2. Pengfei says:

Here is my calculation: suppose $\lambda(t)\cdot g_t=\mathcal{P}_{f,t}(g_t)=\mathcal{P}_T(e^{itf}g_t)$. Take the integral with respect to $\mu$: $\lambda(t)\cdot\int g_t d\mu=\int e^{itf}g_t d\mu$. Then take the derivative $\frac{d}{dt}|_0$: $\lambda'(t)\int g_t d\mu+\lambda(t)\int \frac{d}{dt}|_0 g_t d\mu=\int e^{itf} if\cdot g_t d\mu+\int e^{itf}\cdot\frac{d}{dt}|_0 g_t d\mu$. Evaluating at $t=0$ and using $\lambda(0)=1$ and that $g_0$ is constant, we get $\lambda'(0)=\int ifd\mu=0$. Taking the second derivative, I only get $\lambda''(0)=\int -f^2d\mu$. I don’t know what goes wrong.

• It’s the right idea, but you need to be careful: you can’t evaluate at t=0 until you’ve done all your differentiating, including the second time. When you do that there’s an extra integral that you have to compute, which provides the missing terms. I’ve added a section to the post clarifying the matter.

• Pengfei says:

I see where my calculation went wrong. Thank you!

3. Pengfei says: