In the dynamics seminar here at Houston, we’re beginning a series of expository talks on statistical properties of dynamical systems. This week’s talk was given by Andrew Török and introduces some of the spectral methods for transfer operators that prove useful. This post is basically a set of notes from that talk, aiming to give an informal and accessible introduction to this topic.
1. Observables, invariant measures, and mixing
We consider dynamical systems $T\colon X\to X$ that are expected to exhibit some sort of “chaotic” behaviour. Here $X$ is generally a compact metric space (today we’ll mostly consider $X=[0,1]$) and $T$ is continuous, or at least piecewise continuous.
A measurable function $\varphi\colon X\to\mathbb{R}$ is called an “observable”. The idea is to consider the time series of functions $\varphi,\varphi\circ T,\varphi\circ T^2,\dots$, which for “chaotic” systems resembles a sequence of random variables, with $\varphi\circ T^n$ representing the observation made at time $n$. But just how random is this sequence? Is there a sense in which these random variables can really be treated as random, independent, uncorrelated?
This is the central question in studying statistical properties of dynamical systems. There are various results that hold for sequences of independent identically distributed (IID) random variables, and it turns out that these same results hold for many systems of interest.
Let’s make this a little more precise. If we flip a fair coin repeatedly (a fundamental example of an IID sequence), the strong law of large numbers says that “almost surely” (that is, with probability 1), the fraction of flips that turn up heads will converge to $1/2$. One also has the central limit theorem, which says that after appropriate normalisation, the distribution of the number of heads converges to a normal distribution: if $H_n$ is the number of heads among the first $n$ flips, then $(H_n-n/2)/\sqrt{n/4}$ converges in distribution to the standard normal distribution.
The sequence of observations $\varphi\circ T^n$ is certainly not IID: there are strong correlations between $\varphi$ and $\varphi\circ T^n$ for small values of $n$. However, there are many examples where this correlation decays as $n\to\infty$. For such examples it is reasonable to ask whether probabilistic results such as the strong law of large numbers and the central limit theorem hold.
In order to make probabilistic statements (that is, to treat $X$ as a probability space carrying a sequence of random variables $\varphi\circ T^n$), we must first specify what measure we place on $X$. Let $\mu$ be a Borel probability measure on $X$. The measure $\mu$ is said to be invariant if $\mu(T^{-1}E)=\mu(E)$ for every Borel set $E\subset X$. If we interpret $E$ as the event $x\in E$ (occurring at time 0), then the requirement of invariance amounts to the condition that the probability of the event is the same at time 0 (that is, $\mu(E)$) as it is at any later time $n$ (that is, $\mu(T^{-n}E)$, the probability that $T^nx\in E$). Thus the sequence of random variables $\varphi\circ T^n$ on $(X,\mu)$ is identically distributed, and we only need to worry about the fact that independence fails.
An equivalent definition of invariance is that $\int\varphi\circ T\,d\mu=\int\varphi\,d\mu$ for every $\varphi\in L^1(\mu)$. Informally, this means that the expected values of the observables $\varphi\circ T^n$ are all the same: one obtains the same expected value for an observation whether it is made at time 0 (that is, $\int\varphi\,d\mu$) or at some later time $n$ (that is, $\int\varphi\circ T^n\,d\mu$). This duality between definitions in terms of sets and definitions in terms of functions will continue to appear throughout our discussion.
In fact, not much more is needed to obtain the first probabilistic statement, the strong law of large numbers. An invariant measure $\mu$ is ergodic if any (and hence all) of the following three equivalent conditions hold:
- every invariant set ($T^{-1}E=E$) has measure 0 or 1;
- every invariant function ($\varphi\circ T=\varphi$) is constant $\mu$-a.e.;
- $\mu$ cannot be written as $t\nu_1+(1-t)\nu_2$ with $0<t<1$ and $\nu_1\neq\nu_2$ invariant probability measures.
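The strong law of large numbers holds for any ergodic measure $\mu$ and integrable observable $\varphi\in L^1(\mu)$: for $\mu$-a.e. $x$ we have $\frac1n\sum_{k=0}^{n-1}\varphi(T^kx)\to\int\varphi\,d\mu$ as $n\to\infty$. This is the Birkhoff ergodic theorem.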
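The Birkhoff ergodic theorem can be watched numerically for the doubling map $T(x)=2x\pmod 1$ with Lebesgue measure, which is ergodic for this map. The Python sketch below is an illustration under stated assumptions, not a proof. One practical catch: iterating `2 * x % 1.0` in floating point discards a binary digit at every step, so an orbit started from `random.random()` (which returns a multiple of $2^{-53}$) reaches 0 within 53 steps. Instead, the sketch uses the fact that $T$ shifts binary digits and models a Lebesgue-random point by i.i.d. fair digits, truncating each orbit point to 53 digits. The function name `birkhoff_average` and its parameters are our own.

```python
import random
from typing import Callable, List


def birkhoff_average(phi: Callable[[float], float], n: int,
                     precision: int = 53, seed: int = 0) -> float:
    """Approximate (1/n) * sum_{k<n} phi(T^k x) for the doubling map T (illustrative helper).

    Assumption: a Lebesgue-random x = 0.b_1 b_2 b_3 ... (binary) is modelled by
    i.i.d. fair digits. Since T shifts digits, T^k x = 0.b_{k+1} b_{k+2} ...,
    truncated here to `precision` digits.
    """
    rng = random.Random(seed)
    bits: List[int] = [rng.getrandbits(1) for _ in range(n + precision)]
    mask = (1 << precision) - 1
    scale = 2.0 ** -precision
    window = 0
    for b in bits[:precision]:           # window now encodes b_1 ... b_precision
        window = (window << 1) | b
    total = 0.0
    for k in range(n):
        total += phi(window * scale)     # phi(T^k x), up to truncation
        # Apply T: drop the leading digit and append the next one.
        window = ((window << 1) & mask) | bits[k + precision]
    return total / n


print(birkhoff_average(lambda x: x, 100_000))      # close to 1/2 = integral of x
print(birkhoff_average(lambda x: x * x, 100_000))  # close to 1/3 = integral of x^2
```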
What about the central limit theorem? For this we need something more. Consider two events $A,B\subset X$. If the state of the system at time $n$ were completely independent of the state at time 0, then we would have $\mu(A\cap T^{-n}B)=\mu(A)\mu(B)$. Because we consider deterministic systems in which there are short-term correlations, we do not usually have equality in this relation; nevertheless we can ask for it to hold asymptotically, so that
$$\mu(A\cap T^{-n}B)\to\mu(A)\,\mu(B)\quad\text{as }n\to\infty. \tag{1}$$
A measure satisfying (1) for every pair of Borel sets $A,B$ is called mixing. An important part of establishing the statistical behaviour of a system with respect to this measure is to understand the rate of mixing: how quickly do the correlations decay?
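Equivalently, $\mu$ is mixing if and only if
$$\int(\varphi\circ T^n)\,\psi\,d\mu\to\int\varphi\,d\mu\int\psi\,d\mu\quad\text{as }n\to\infty \tag{2}$$
for every $\varphi,\psi\in L^2(\mu)$; taking $\varphi=\mathbf{1}_B$ and $\psi=\mathbf{1}_A$ recovers (1). It turns out that this is a more useful formulation when we wish to compute rates of mixing. The idea is that the convergence in (2) happens exponentially quickly for observables taken from a “reasonably nice” function space, while for more general classes of observables (such as arbitrary elements of $L^2(\mu)$), and also for the setwise-defined mixing in (1), the convergence can be arbitrarily slow.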
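For the doubling map $T(x)=2x\pmod 1$ and Lebesgue measure $m$, the quantity in (1) can be computed exactly when the sets are intervals. The Python sketch below takes the illustrative choices $A=[0,a)$ and $B=[0,1/2)$. Since $T^{-n}B$ is the union of the intervals $[k2^{-n},(k+\tfrac12)2^{-n})$, a short computation gives $0\le m(A\cap T^{-n}B)-m(A)m(B)\le 2^{-n-2}$, so for these particular sets the convergence in (1) is exponentially fast. The helper name `mixing_measure` is hypothetical, and exact arithmetic comes from Python's standard `fractions` module.

```python
from fractions import Fraction


def mixing_measure(a: Fraction, n: int) -> Fraction:
    """Exact Lebesgue measure of A ∩ T^{-n}B for T(x) = 2x mod 1 (hypothetical helper),
    with the illustrative sets A = [0, a) and B = [0, 1/2), where 0 <= a <= 1."""
    cell = Fraction(1, 2 ** n)
    # T^{-n}B is the union over k of [k*cell, (k + 1/2)*cell).
    full_cells = int(a / cell)            # cells lying entirely inside [0, a)
    remainder = a - full_cells * cell     # leftover length, in [0, cell)
    return full_cells * cell / 2 + min(remainder, cell / 2)


a = Fraction(1, 3)
for n in range(1, 8):
    gap = mixing_measure(a, n) - a * Fraction(1, 2)   # m(A ∩ T^{-n}B) - m(A) m(B)
    print(n, gap, float(gap))
```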
For the time being, then, we can describe our goal as follows. The dynamical system may have a great many invariant measures, and by restricting our attention to the ergodic measures, we guarantee that the strong law of large numbers holds thanks to the Birkhoff ergodic theorem. Within the class of ergodic measures, we want to find measures for which the correlations between the observables decay exponentially, and for which a central limit theorem (and hopefully other statistical laws) can be proved.
So far there has been no mention of anything “spectral”, despite the title at the top. The comment above, that we will ultimately need to restrict to some “reasonably nice” function space, suggests how it will enter. The map $T$ induces an operator on various function spaces. We will study the spectral properties of this operator, and by choosing an appropriate function space, we will be able to use these properties to deduce various statistical properties.
First, though, it is high time for some examples. Both of our chief examples at present are defined on the unit interval $X=[0,1]$. The first example is the doubling map $T(x)=2x\pmod 1$ shown in Figure 1. This map preserves Lebesgue measure $m$ and has $T'(x)=2$ everywhere. (More precisely, almost everywhere, since $T$ is discontinuous at $x=1/2$.)
Fig 1 The doubling map $T(x)=2x\pmod 1$.
The second, more general, example is a piecewise expanding interval map such as the one shown in Figure 2. Here again $X=[0,1]$, but now $T$ comes from a more general class: the interval is partitioned into finitely many intervals, and on each of these the map is smooth (say $C^2$) with $|T'|\geq\lambda>1$ for some constant $\lambda$. (For the reader familiar with such maps, note that there is no Markov assumption.)
Fig 2 A piecewise expanding interval map.
Lebesgue measure is a natural choice for the first example: it is invariant, it can be shown to be ergodic, and when we speak of “choosing a point at random from the interval”, the most natural interpretation is that we mean at random with respect to Lebesgue measure.
What invariant measure should we use for the second example? The map certainly appears to be a plausible candidate for chaotic behaviour (nearby points are driven apart exponentially fast), but in order to describe the statistical properties of the system we need an invariant measure. In particular, we would like an invariant measure $\mu$ that is absolutely continuous with respect to Lebesgue measure $m$, so that a result that holds for $\mu$-a.e. point $x$ also holds on a set of positive Lebesgue measure (and for Lebesgue-a.e. $x$ if the density of $\mu$ is positive almost everywhere). So how do we find an absolutely continuous invariant probability measure? (The abbreviation acim or acip is often used.)
Let $\mathcal{M}$ denote the set of absolutely continuous probability measures on $X$, not necessarily invariant. Elements of $\mathcal{M}$ correspond to functions $h\in L^1(m)$ with $h\geq 0$ and $\int h\,dm=1$ by the Radon–Nikodym theorem: the association is $d\mu=h\,dm$. We want to understand the maps that the dynamics induces on both the space of measures and the space of densities; invariant measures (densities) will correspond to fixed points of these maps.
3. The transfer operator
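To understand the dynamics that $T$ induces on $\mathcal{M}$, first note that the map $T$ induces an action on observables by sending $\varphi$ to $\varphi\circ T$. This is called the Koopman operator or composition operator and is an important tool in ergodic theory (once we have an invariant measure). So far, however, we have no invariant measure. Note that a measure given by $d\mu=h\,dm$ is invariant if and only if $\int(\varphi\circ T)\,h\,dm=\int\varphi\,h\,dm$ for every $\varphi\in L^\infty(m)$. Consequently, $\mu$ is invariant if and only if the corresponding density $h$ is a fixed point of the operator $\mathcal{L}$ defined on $L^1(m)$ by the condition
$$\int(\mathcal{L}h)\,\varphi\,dm=\int h\,(\varphi\circ T)\,dm\quad\text{for every }\varphi\in L^\infty(m). \tag{3}$$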
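The operator $\mathcal{L}$ is called the transfer operator (or Ruelle operator), and will be the central object of the spectral methods we are describing. We can view $\mathcal{L}$ as the action induced by $T$ on the space of absolutely continuous measures. Note that in order for $\mathcal{L}$ to be a bounded linear operator on $L^1(m)$, the map $T$ must be non-singular: that is, it cannot take a set of positive measure to a set of measure 0. For the piecewise expanding maps considered here, a change of variables in (3) gives the explicit formula
$$\mathcal{L}h(x)=\sum_{y\in T^{-1}x}\frac{h(y)}{|T'(y)|}. \tag{4}$$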
The utility of $\mathcal{L}$ for us is two-fold. In the first place, as pointed out above, absolutely continuous invariant measures correspond to fixed points of $\mathcal{L}$, that is, eigenfunctions of $\mathcal{L}$ with eigenvalue 1. Furthermore, iterating (3) gives $\int(\varphi\circ T^n)\,\psi\,dm=\int\varphi\,\mathcal{L}^n\psi\,dm$, so the correlation functions on the left-hand side of (2) can be understood in terms of the iterates $\mathcal{L}^n$, and these in turn can be understood in terms of spectral properties of $\mathcal{L}$ apart from the eigenvalue 1. This is how knowledge of the spectrum of $\mathcal{L}$ will lead to information on decay of correlations and other statistical properties: the eigenfunction corresponding to the largest eigenvalue is the density of the absolutely continuous invariant measure, and the presence of a spectral gap (defined below) between this eigenvalue and the rest of the spectrum leads to exponential decay of $\mathcal{L}^n\psi$ when $\int\psi\,dm=0$ and $\psi$ lies in a suitable function space.
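As an illustration of a piecewise expanding map whose branches are not all onto, the Python sketch below uses $T(y)=\beta y\pmod 1$ with $\beta=2.5$ (our own choice, not the map in Figure 2): it has $|T'|=\beta>1$ on each of its three branches, and the last branch maps $[0.8,1)$ only onto $[0,0.5)$. The transfer operator is evaluated pointwise from the standard formula $\mathcal{L}h(x)=\sum_{y\in T^{-1}x}h(y)/|T'(y)|$ for piecewise monotone maps, and we look at $\mathcal{L}^6\mathbf{1}$. Taking $\varphi=\mathbf{1}$ in (3) shows $\int\mathcal{L}h\,dm=\int h\,dm$, so its integral should stay close to 1 up to quadrature error. One expects $\mathcal{L}^n\mathbf{1}$ to approach the acim density once a spectral gap is available, but the sketch only checks mass and positivity. The helper names are hypothetical, and each evaluation of $\mathcal{L}^nh$ costs up to $3^n$ calls to $h$, so this is practical only for small $n$.

```python
from typing import Callable, List, Tuple

Func = Callable[[float], float]
# A branch of T: (inverse branch g, |T'| on the branch, image interval [lo, hi))
Branch = Tuple[Func, float, Tuple[float, float]]


def beta_map_branches(beta: float) -> List[Branch]:
    """Inverse branches of the illustrative map T(y) = beta * y mod 1 on [0, 1) (hypothetical helper)."""
    branches: List[Branch] = []
    k = 0
    while k < beta:
        # Branch k maps [k/beta, min((k+1)/beta, 1)) onto [0, top).
        top = min(1.0, beta - k)
        branches.append((lambda x, k=k: (x + k) / beta, beta, (0.0, top)))
        k += 1
    return branches


def transfer(h: Func, branches: List[Branch]) -> Func:
    """Pointwise transfer operator: (L h)(x) = sum over preimages y of h(y) / |T'(y)|."""
    def Lh(x: float) -> float:
        return sum(h(g(x)) / slope for g, slope, (lo, hi) in branches if lo <= x < hi)
    return Lh


branches = beta_map_branches(2.5)
density: Func = lambda x: 1.0
for _ in range(6):
    density = transfer(density, branches)   # after the loop: density = L^6 1

N = 400
values = [density((i + 0.5) / N) for i in range(N)]
mass = sum(values) / N                      # midpoint rule; should be close to 1
print(mass, min(values), max(values))
```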
4. Decay of correlations for the doubling map
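To illustrate what happens, let us return to the example of the doubling map $T(x)=2x\pmod 1$. Here we can write (4) explicitly as
$$\mathcal{L}h(x)=\frac12\left(h\left(\frac x2\right)+h\left(\frac{x+1}2\right)\right). \tag{5}$$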
First we observe that the constant function $\mathbf{1}$ satisfies $\mathcal{L}\mathbf{1}=\mathbf{1}$, which corresponds to the fact that Lebesgue measure itself is invariant for $T$. So how do we carry out the rest of the programme and use the iterates of $\mathcal{L}$ to establish a rate of convergence in (2)?
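A minimal Python sketch of (5): `transfer_doubling` (a name we introduce) turns a function $h$ into $\mathcal{L}h$. The sketch checks $\mathcal{L}\mathbf{1}=\mathbf{1}$ at a sample point and compares both sides of the duality (3) for the illustrative choices $h(x)=e^x$ and $\varphi(x)=\cos 3x$, using a midpoint rule with $10^5$ points; the two sides should agree up to quadrature error.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def transfer_doubling(h: Func) -> Func:
    """Transfer operator of T(x) = 2x mod 1: (L h)(x) = (h(x/2) + h((x+1)/2)) / 2."""
    return lambda x: 0.5 * (h(x / 2) + h((x + 1) / 2))


def doubling(x: float) -> float:
    return (2 * x) % 1.0


def midpoint_integral(f: Func, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n


one: Func = lambda x: 1.0
print(transfer_doubling(one)(0.3))   # 1.0, i.e. L1 = 1

# Duality (3) with illustrative choices h = exp and phi = cos(3x):
h: Func = math.exp
phi: Func = lambda x: math.cos(3 * x)
Lh = transfer_doubling(h)
lhs = midpoint_integral(lambda x: Lh(x) * phi(x))          # integral of (L h) * phi
rhs = midpoint_integral(lambda x: h(x) * phi(doubling(x)))  # integral of h * (phi o T)
print(abs(lhs - rhs))                # small: only quadrature error remains
```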
At this point we must confront openly a point that was mentioned in passing earlier. If we consider $\mathcal{L}$ as an operator on $L^1(m)$, then there is not too much we can do. (More on this later, perhaps.) However, if we make a “good” choice of a Banach space $\mathcal{B}$ on which $\mathcal{L}$ acts, then we will be able to get good results. In some sense this is the central challenge of the spectral method for studying statistical properties of dynamical systems: to find a good Banach space on which the transfer operator acts with a spectral gap. While we will see momentarily that this is not too difficult for the interval maps we are considering here, it turns out to be a much more substantial challenge when we study more general classes of systems.
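In the present setting we may consider the subspace $\mathcal{B}=\mathrm{Lip}$ of all Lipschitz continuous functions on $[0,1]$. There is a natural semi-norm
$$|h|_{\mathrm{Lip}}=\sup_{x\neq y}\frac{|h(x)-h(y)|}{|x-y|}, \tag{6}$$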
which fails to be a norm only because it vanishes on all constant functions. We can define a true norm by
$$\|h\|_{\mathrm{Lip}}=\|h\|_\infty+|h|_{\mathrm{Lip}}. \tag{7}$$
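For our purposes, the key property of the semi-norm $|\cdot|_{\mathrm{Lip}}$ is that it is contracted by the operator $\mathcal{L}$.

Proposition 1. For every $h\in\mathrm{Lip}$ we have $|\mathcal{L}h|_{\mathrm{Lip}}\leq\frac12|h|_{\mathrm{Lip}}$.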
Proof: Using (5) we have
$$|\mathcal{L}h(x)-\mathcal{L}h(y)|\leq\frac12\left(|h(x_1)-h(y_1)|+|h(x_2)-h(y_2)|\right),$$
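where we write $x_1=x/2$, $x_2=(x+1)/2$, and similarly for $y_1$, $y_2$. Now $|x_i-y_i|=|x-y|/2$ for $i=1,2$, and so
$$|\mathcal{L}h(x)-\mathcal{L}h(y)|\leq\frac12\left(|h|_{\mathrm{Lip}}\frac{|x-y|}2+|h|_{\mathrm{Lip}}\frac{|x-y|}2\right)=\frac12|h|_{\mathrm{Lip}}\,|x-y|.$$
Dividing by $|x-y|$ and taking the supremum over $x\neq y$ proves the proposition. $\square$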
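Proposition 1 can be illustrated (not proved) numerically. The Python sketch below uses a hypothetical helper `grid_lip` that computes the largest difference quotient of a function over consecutive points of a uniform grid; this never exceeds $|h|_{\mathrm{Lip}}$. By the same argument as in the proof, the grid quotients of $\mathcal{L}h$ on the grid of spacing $1/n$ are at most half those of $h$ on the grid of spacing $1/(2n)$, because the points $x/2$ and $(x+1)/2$ for grid points $x$ make up exactly the finer grid. The test function $h(x)=\sin 7x+x^2$ is an arbitrary choice.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def transfer_doubling(h: Func) -> Func:
    """Transfer operator (5) of the doubling map."""
    return lambda x: 0.5 * (h(x / 2) + h((x + 1) / 2))


def grid_lip(f: Func, n: int) -> float:
    """Largest difference quotient of f over consecutive grid points i/n, 0 <= i <= n
    (hypothetical helper). This is always at most the semi-norm |f|_Lip."""
    values = [f(i / n) for i in range(n + 1)]
    return n * max(abs(values[i + 1] - values[i]) for i in range(n))


h: Func = lambda x: math.sin(7 * x) + x * x   # arbitrary Lipschitz test function
Lh = transfer_doubling(h)
for n in (10, 100, 1000):
    # Compare L h on the grid of spacing 1/n with h on the finer grid of spacing 1/(2n).
    print(n, grid_lip(Lh, n) / grid_lip(h, 2 * n))   # at most 1/2, up to rounding
```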
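Now let $\mathcal{B}_0=\{h\in\mathrm{Lip}:\int h\,dm=0\}$ and decompose
$$\mathrm{Lip}=\mathbb{R}\mathbf{1}\oplus\mathcal{B}_0,\qquad h=\left(\int h\,dm\right)\mathbf{1}+\left(h-\int h\,dm\right). \tag{8}$$
This decomposition is invariant under the action of $\mathcal{L}$: indeed, $\mathbb{R}\mathbf{1}$ is invariant because $\mathcal{L}\mathbf{1}=\mathbf{1}$, and to see that $\mathcal{B}_0$ is invariant we use (3) with $\varphi=\mathbf{1}$ to observe that if $h\in\mathcal{B}_0$, then
$$\int\mathcal{L}h\,dm=\int h\,(\mathbf{1}\circ T)\,dm=\int h\,dm=0.$$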
It is not hard to show that any $h\in\mathcal{B}_0$ satisfies $\|h\|_\infty\leq|h|_{\mathrm{Lip}}$, because its range has diameter at most $|h|_{\mathrm{Lip}}$ and contains 0 in its convex hull.
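Combining this bound with Proposition 1 and the invariance of $\mathcal{B}_0$ gives $\|\mathcal{L}^n\psi_0\|_\infty\leq|\mathcal{L}^n\psi_0|_{\mathrm{Lip}}\leq 2^{-n}|\psi_0|_{\mathrm{Lip}}$ for every $\psi_0\in\mathcal{B}_0$. The Python sketch below checks this for the illustrative choice $\psi(x)=x^2$, for which $\int\psi\,dm=1/3$ and $|\psi|_{\mathrm{Lip}}=2$: it applies (5) $n$ times to $\psi_0=\psi-1/3$ and compares the largest value of $|\mathcal{L}^n\psi_0|$ on a grid (a lower bound for the sup norm) with $2^{-n}|\psi|_{\mathrm{Lip}}$. The helper `iterate` is our own name, and each pointwise evaluation of $\mathcal{L}^n$ costs $2^n$ evaluations of $\psi_0$.

```python
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def transfer_doubling(h: Func) -> Func:
    """Transfer operator (5) of the doubling map."""
    return lambda x: 0.5 * (h(x / 2) + h((x + 1) / 2))


def iterate(h: Func, n: int) -> Func:
    """Return L^n h as a function (hypothetical helper; each evaluation costs 2^n calls to h)."""
    for _ in range(n):
        h = transfer_doubling(h)
    return h


psi: Func = lambda x: x * x             # illustrative observable: |psi|_Lip = 2, integral 1/3
psi0: Func = lambda x: psi(x) - 1 / 3   # mean-zero part, an element of B_0
grid = [i / 400 for i in range(401)]

rows: List[Tuple[int, float, float]] = []
for n in range(1, 9):
    Ln_psi0 = iterate(psi0, n)
    sup_on_grid = max(abs(Ln_psi0(x)) for x in grid)   # lower bound for the sup norm
    bound = 2.0 ** -n * 2                               # 2^{-n} |psi|_Lip
    rows.append((n, sup_on_grid, bound))
    print(n, sup_on_grid, bound)
```

For this particular $\psi$ one can check by hand from (5) that $\mathcal{L}^n\psi_0(x)=2^{-n}\left(x-\frac12\right)+4^{-n}\left(x^2-x+\frac16\right)$, so the decay rate $2^{-n}$ is attained and the bound is sharp up to a constant factor.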
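Now let $\varphi\in L^1(m)$ and $\psi\in\mathrm{Lip}$, and write $\psi_0=\psi-\int\psi\,dm\in\mathcal{B}_0$. Iterating (3) gives
$$\int(\varphi\circ T^n)\,\psi\,dm-\int\varphi\,dm\int\psi\,dm=\int\varphi\,\mathcal{L}^n\psi\,dm-\int\varphi\,dm\int\psi\,dm=\int\varphi\,\mathcal{L}^n\psi_0\,dm, \tag{9}$$
where the second equality uses the fact that $\mathcal{L}\mathbf{1}=\mathbf{1}$. We estimate $\mathcal{L}^n\psi_0$ using Proposition 1, together with the fact that $\mathcal{L}^n\psi_0\in\mathcal{B}_0$:
$$\|\mathcal{L}^n\psi_0\|_\infty\leq|\mathcal{L}^n\psi_0|_{\mathrm{Lip}}\leq 2^{-n}|\psi_0|_{\mathrm{Lip}}=2^{-n}|\psi|_{\mathrm{Lip}},$$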
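and conclude that for any $\varphi\in L^1(m)$ and $\psi\in\mathrm{Lip}$ we have
$$\left|\int(\varphi\circ T^n)\,\psi\,dm-\int\varphi\,dm\int\psi\,dm\right|\leq 2^{-n}\,\|\varphi\|_{L^1}\,|\psi|_{\mathrm{Lip}}.$$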
This means that for the doubling map, the convergence in (2) happens exponentially quickly provided the observables are sufficiently regular (here it suffices that $\psi$ is Lipschitz and $\varphi$ is integrable).
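For the doubling map with $\varphi(x)=\psi(x)=x$ (an illustrative choice, with $\|\varphi\|_{L^1}=1/2$ and $|\psi|_{\mathrm{Lip}}=1$), the correlation can be computed exactly: on each interval $[k2^{-n},(k+1)2^{-n})$ we have $T^nx=2^nx-k$, and summing the resulting polynomial integrals gives $\int x\,T^n(x)\,dx-\frac14=\frac{2^{-n}}{12}$, comfortably below the bound $2^{-n}\cdot\frac12\cdot 1$. The Python sketch below verifies this with exact rational arithmetic from the standard `fractions` module; the function name `doubling_correlation` is ours.

```python
from fractions import Fraction


def doubling_correlation(n: int) -> Fraction:
    """Exact value of  integral of x * T^n(x) dx - (integral of x dx)^2  on [0, 1]
    for T(x) = 2x mod 1 (hypothetical helper name)."""
    N = 2 ** n
    total = Fraction(0)
    for k in range(N):
        a, b = Fraction(k, N), Fraction(k + 1, N)
        # On [a, b) we have T^n(x) = N*x - k, so integrate x*(N*x - k) exactly.
        total += N * (b ** 3 - a ** 3) / 3 - k * (b ** 2 - a ** 2) / 2
    return total - Fraction(1, 4)


for n in range(6):
    c = doubling_correlation(n)
    print(n, c, c == Fraction(1, 12 * 2 ** n))   # last column: True
```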
5. Spectral properties
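The above results for the doubling map are a particular case of the spectral method, although we did not yet describe explicitly the role of the spectrum of $\mathcal{L}$. Recall that the spectrum of the operator $\mathcal{L}\colon\mathcal{B}\to\mathcal{B}$ is the set
$$\sigma(\mathcal{L})=\{\lambda\in\mathbb{C}:\lambda I-\mathcal{L}\text{ does not have a bounded inverse on }\mathcal{B}\},$$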
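which contains (but is not necessarily equal to) the set of eigenvalues of $\mathcal{L}$ (the point spectrum). We emphasise that this is a very general definition, valid for any bounded linear operator on any Banach space, not just $\mathcal{L}$ acting on $\mathrm{Lip}$. A basic fact in functional analysis is that the spectrum of a bounded operator on a nonzero complex Banach space is always compact and non-empty, and that its spectral radius $r(\mathcal{L})=\max\{|\lambda|:\lambda\in\sigma(\mathcal{L})\}$ satisfies $r(\mathcal{L})=\lim_{n\to\infty}\|\mathcal{L}^n\|^{1/n}\leq\|\mathcal{L}\|$.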
In the example above, the constant function $\mathbf{1}$ is an eigenfunction with eigenvalue 1, and using the invariant decomposition from (8), we have $\sigma(\mathcal{L})=\{1\}\cup\sigma(\mathcal{L}|_{\mathcal{B}_0})$. That is, apart from the eigenvalue at 1, the spectrum of $\mathcal{L}$ is determined by its action on the subspace $\mathcal{B}_0$.
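To determine the spectrum of $\mathcal{L}|_{\mathcal{B}_0}$ we can use either the Lipschitz norm $\|\cdot\|_{\mathrm{Lip}}$ or the semi-norm $|\cdot|_{\mathrm{Lip}}$, because on the subspace $\mathcal{B}_0$ the semi-norm becomes a norm and the two are equivalent:
$$|h|_{\mathrm{Lip}}\leq\|h\|_{\mathrm{Lip}}\leq 2|h|_{\mathrm{Lip}}\quad\text{for all }h\in\mathcal{B}_0. \tag{10}$$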
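(This fails outside of $\mathcal{B}_0$: the upper bound in (10) relies on $\|h\|_\infty\leq|h|_{\mathrm{Lip}}$, which requires $\int h\,dm=0$; a nonzero constant function, for instance, has $|h|_{\mathrm{Lip}}=0$.) From Proposition 1 and (10) we see that $\sigma(\mathcal{L}|_{\mathcal{B}_0})\subset\{\lambda\in\mathbb{C}:|\lambda|\leq 1/2\}$. Indeed, by Proposition 1 the operator $\mathcal{L}$ has norm at most $1/2$ on $(\mathcal{B}_0,|\cdot|_{\mathrm{Lip}})$, so its spectral radius there is at most $1/2$, and by (10) passing to the equivalent norm $\|\cdot\|_{\mathrm{Lip}}$ does not change the spectrum. Thus the spectrum of $\mathcal{L}$ has the structure shown in Figure 3: there is a single eigenvalue at 1, and the rest of the spectrum is contained in the disc with centre 0 and radius 1/2.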
Fig 3 A gap in the spectrum of $\mathcal{L}$.
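Some of the eigenvalues inside the disc can be seen explicitly on a finite-dimensional invariant subspace. By (5), $\mathcal{L}(x^j)=2^{-j-1}\left(x^j+(x+1)^j\right)$, so $\mathcal{L}$ maps polynomials of degree at most $d$ to themselves, and its matrix in the monomial basis is upper triangular with diagonal entries $1,\frac12,\dots,2^{-d}$. Each of these is therefore an eigenvalue of $\mathcal{L}$ on $\mathrm{Lip}$ with a polynomial eigenfunction, consistent with Figure 3. This sketch only exhibits some eigenvalues; it does not compute the whole spectrum. The Python below builds the matrix exactly with `fractions.Fraction` and `math.comb`; the function name is our own.

```python
from fractions import Fraction
from math import comb
from typing import List


def transfer_matrix_on_polynomials(d: int) -> List[List[Fraction]]:
    """Matrix of L for T(x) = 2x mod 1 on polynomials of degree <= d (hypothetical helper).

    Column j holds the coefficients, in the basis 1, x, ..., x^d, of
    L(x^j) = 2^{-j-1} * (x^j + (x+1)^j)."""
    M = [[Fraction(0) for _ in range(d + 1)] for _ in range(d + 1)]
    for j in range(d + 1):
        scale = Fraction(1, 2 ** (j + 1))
        for i in range(j + 1):
            # Coefficient of x^i: binomial term from (x+1)^j, plus 1 from x^j when i == j.
            M[i][j] = scale * (comb(j, i) + (1 if i == j else 0))
    return M


M = transfer_matrix_on_polynomials(5)
diagonal = [M[j][j] for j in range(6)]   # eigenvalues, since M is upper triangular
print([str(v) for v in diagonal])        # ['1', '1/2', '1/4', '1/8', '1/16', '1/32']
```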
Going beyond the doubling map to such examples as the piecewise expanding interval maps discussed above, the goal is to carry out a similar procedure by finding a suitable Banach space of functions on which the transfer operator acts with a spectral gap: that is, where there is a single eigenvalue (or at most finitely many) lying on the unit circle, and the rest of $\sigma(\mathcal{L})$ is contained in a disc of radius $r<1$. Then one is able to draw the following conclusions.
- The eigenfunction(s) corresponding to the eigenvalue 1 are the densities for the absolutely continuous invariant measures.
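- Given any $\rho\in(r,1)$, there is a constant $C>0$ such that $\|\mathcal{L}^nh\|\leq C\rho^n\|h\|$ for every $h$ in the complementary $\mathcal{L}$-invariant subspace (the analogue of $\mathcal{B}_0$), and so the correlations decay like $\rho^n$ when the observables $\varphi$ and $\psi$ are chosen from suitable function spaces.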
Eventually it is also interesting to consider a more general class of transfer operators associated to potential functions for which the largest eigenvalue may not be 1, but first we will spend some time (in the next few posts) developing the general theory of how one can prove the existence of a spectral gap for piecewise expanding interval maps.