These are notes for a talk I am giving in Jon Chaika’s online working seminar in ergodic theory. The purpose of the talk is to outline Bowen’s proof of uniqueness of the measure of maximal entropy for shift spaces with the specification property. Bowen’s approach extends much more broadly than this: to non-symbolic systems (assuming expansivity); to equilibrium states for non-zero potential functions (assuming a bounded distortion property); and to non-uniformly hyperbolic systems using the notion of “obstructions to specification and expansivity” developed by Dan Thompson and myself (see some notes here and here, and videos here). In these notes, though, I want to give the bare bones of the argument in the simplest possible setting, to make the essential structure as clear as possible. I gave an alternate argument in a previous post; here I am giving Bowen’s original argument, although I do not necessarily follow his presentation. I should also point out that this argument differs from the construction of the MME, or more generally the equilibrium state, in Bowen’s monograph, which uses the Ruelle operator.
1. Setting and result
Let $A$ be a finite set; the alphabet. Then the full shift is $A^{\mathbb{N}} = \{x = (x_n)_{n\in\mathbb{N}} : x_n \in A\}$, the set of infinite sequences of symbols from $A$; this is a compact metric space with $d(x,y) = 2^{-t(x,y)}$, where $t(x,y) = \min\{n\in\mathbb{N} : x_n \ne y_n\}$. The shift map $\sigma\colon A^{\mathbb{N}}\to A^{\mathbb{N}}$ is defined by $(\sigma x)_n = x_{n+1}$. A shift space is a closed set $X\subset A^{\mathbb{N}}$ with $\sigma(X)\subset X$.

Given a matrix $T\in\{0,1\}^{A\times A}$, define

$$X_T = \{x\in A^{\mathbb{N}} : T_{x_n x_{n+1}} = 1 \text{ for all } n\in\mathbb{N}\}. \qquad (1)$$

This is a topological Markov shift (TMS). It can be viewed in terms of a directed graph with vertex set $A$ and edges given by $T$ (there is an edge from $a$ to $b$ if and only if $T_{ab} = 1$): by (1), $X_T$ consists of all sequences that label infinite paths on the graph.

The TMS $X_T$ is mixing, or primitive, if there is $N\in\mathbb{N}$ such that $(T^N)_{ab} > 0$ for all $a,b\in A$. Equivalently, the graph is strongly connected and the set of loop lengths on the graph has gcd $1$.
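A concrete example to keep in mind (a standard one, used here as a running illustration) is the golden-mean shift:

$$A = \{0,1\}, \qquad T = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad X_T = \{x\in\{0,1\}^{\mathbb{N}} : x_n x_{n+1} \ne 11 \text{ for all } n\}.$$

The graph has a loop of length $1$ at the vertex $0$ and a loop $0\to1\to0$ of length $2$, so it is strongly connected and the loop lengths have gcd $1$; equivalently, $(T^2)_{ab} > 0$ for all $a,b$, so this TMS is mixing.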
Given a shift space $X$, consider

$$M_\sigma(X) := \{\mu : \mu \text{ is a } \sigma\text{-invariant Borel probability measure on } X\}.$$

The set of invariant measures $M_\sigma(X)$ is extremely large for a mixing TMS (and more generally for systems with some sort of hyperbolic behavior), and it is important to identify "distinguished" invariant measures. One way of doing this is via the variational principle

$$\sup\{h(\mu) : \mu\in M_\sigma(X)\} = h(X).$$
The next section recalls the definitions of the topological and measure-theoretic entropies in this setting. A measure achieving the supremum is a measure of maximal entropy (MME).
We will see that every mixing TMS has a unique MME, via a more general result. Given $n\in\mathbb{N}$ and $w\in A^n$, let $[w] := \{x\in X : x_1\cdots x_n = w\}$ be the set of sequences in $X$ that start with the word $w$ (juxtaposition denotes concatenation); call this the cylinder of $w$. Define the language of $X$ by

$$\mathcal{L} = \mathcal{L}(X) := \{w : [w]\ne\emptyset\}, \qquad \mathcal{L}_n := \mathcal{L}\cap A^n.$$

Definition 1 $X$ has specification if there is $\tau\in\mathbb{N}$ such that for all $v,w\in\mathcal{L}$ there is $u\in\mathcal{L}$ with $|u|\le\tau$ such that $vuw\in\mathcal{L}$.

Theorem 2 If a shift space $X$ has specification, then it has a unique measure of maximal entropy.
Exercise 1 Prove that every mixing TMS has specification.
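One solution sketch, with $N$ as in the definition of mixing: given $v,w\in\mathcal{L}$, let $a$ be the final symbol of $v$ and $b$ the initial symbol of $w$. Primitivity gives $(T^N)_{ab} > 0$, i.e., a path $a = a_0\to a_1\to\cdots\to a_N = b$ in the graph; then $u = a_1\cdots a_{N-1}$ satisfies $vuw\in\mathcal{L}$, so specification holds with $\tau = N-1$.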
2. Entropy

2.1. Topological entropy
Every word in $\mathcal{L}_{m+n}$ is of the form $vw$ for some $v\in\mathcal{L}_m$ and $w\in\mathcal{L}_n$; thus $\#\mathcal{L}_{m+n}\le\#\mathcal{L}_m\cdot\#\mathcal{L}_n$, so the sequence $c_n := \log\#\mathcal{L}_n$ is subadditive:

$$c_{m+n} \le c_m + c_n. \qquad (2)$$

Exercise 2 Prove Fekete's lemma: for any sequence $(c_n)_{n\in\mathbb{N}}$ satisfying (2), $\lim_{n\to\infty}\frac{c_n}{n}$ exists and is equal to $\inf_{n\in\mathbb{N}}\frac{c_n}{n}$ (a priori it could be $-\infty$).

We conclude that the topological entropy

$$h = h(X) := \lim_{n\to\infty}\frac{1}{n}\log\#\mathcal{L}_n$$

exists for every shift space. This quantifies the growth rate of the total complexity of the system.
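For the golden-mean shift, this growth rate can be computed explicitly. An admissible word of length $n$ either ends in $0$ (obtained by appending $0$ to an admissible word of length $n-1$) or ends in $01$ (appending $01$ to an admissible word of length $n-2$), so $\#\mathcal{L}_n = \#\mathcal{L}_{n-1} + \#\mathcal{L}_{n-2}$; together with $\#\mathcal{L}_1 = 2$ and $\#\mathcal{L}_2 = 3$ this gives $\#\mathcal{L}_n = F_{n+2}$, the Fibonacci numbers ($F_1 = F_2 = 1$, $F_{n+2} = F_{n+1} + F_n$). Thus

$$h = \lim_{n\to\infty}\frac{1}{n}\log F_{n+2} = \log\frac{1+\sqrt{5}}{2}.$$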
Exercise 3 Prove that $h/\log 2$ is the box dimension of $X$ (easy). Then prove that it is also the Hausdorff dimension of $X$ (harder). (Both of these facts rely very strongly on the fact that $d(\sigma x,\sigma y)/d(x,y)$ is globally constant whenever $x,y$ are close; for general systems where the amount of expansion may vary, the definition of topological entropy is more involved and the relationship to dimension is more subtle, although it is worth noting that a 1973 paper of Bowen in TAMS gives a definition of topological entropy that is analogous to Hausdorff dimension.)
2.2. Measure-theoretic entropy
Recall the motivation from information theory for the definition of entropy: given $p\in(0,1]$, let $I(p) = -\log p$. This can be interpreted as the information associated to an event with probability $p$; note that it is monotonic (the less likely an event is, the more information we gain by learning that it happened) and that $I(pq) = I(p) + I(q)$.

Now define $\varphi$ on $[0,1]$ by $\varphi(p) = p\,I(p) = -p\log p$ (and $\varphi(0) = 0$); this can be interpreted as the expected amount of information associated to an event with probability $p$.

Exercise 4 Show that $\varphi$ is concave.
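One quick way to see this: on $(0,1]$ we have

$$\varphi'(p) = -\log p - 1, \qquad \varphi''(p) = -\frac{1}{p} < 0,$$

so $\varphi$ is strictly concave on $(0,1]$, and continuity at $0$ extends concavity to $[0,1]$.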
Given a probability vector $\bar p = (p_1,\dots,p_k)$, define

$$H(\bar p) := \sum_{i=1}^k \varphi(p_i) = -\sum_{i=1}^k p_i\log p_i;$$

this can be interpreted as the expected information associated to a collection of mutually exclusive events with probabilities $p_1,\dots,p_k$.

Exercise 5 Show that $H(\bar p)\le\log k$, with equality if and only if $p_i = \frac{1}{k}$ for all $i$.

Given $\mu\in M_\sigma(X)$ and $n\in\mathbb{N}$, the cylinders $[w]$ with $w\in\mathcal{L}_n$ partition $X$, so $(\mu[w])_{w\in\mathcal{L}_n}$ is a probability vector; write $H_n(\mu) := \sum_{w\in\mathcal{L}_n}\varphi(\mu[w])$.

Lemma 3 For every $\mu\in M_\sigma(X)$, the sequence $H_n(\mu)$ is subadditive: $H_{m+n}(\mu)\le H_m(\mu) + H_n(\mu)$.

Proof: We have

$$\begin{aligned} H_{m+n}(\mu) &= \sum_{v\in\mathcal{L}_m}\sum_{w\in\mathcal{L}_n}\varphi(\mu[vw]) = \sum_{v,w}\Big(\frac{\mu[vw]}{\mu[v]}\,\varphi(\mu[v]) + \mu[v]\,\varphi\Big(\frac{\mu[vw]}{\mu[v]}\Big)\Big)\\ &= H_m(\mu) + \sum_{w\in\mathcal{L}_n}\sum_{v\in\mathcal{L}_m}\mu[v]\,\varphi\Big(\frac{\mu[vw]}{\mu[v]}\Big)\\ &\le H_m(\mu) + \sum_{w\in\mathcal{L}_n}\varphi\Big(\sum_{v\in\mathcal{L}_m}\mu[vw]\Big)\\ &= H_m(\mu) + \sum_{w\in\mathcal{L}_n}\varphi(\mu[w]) = H_m(\mu) + H_n(\mu), \end{aligned}$$

where the first line is by definition (together with the identity $\varphi(ab) = a\varphi(b) + b\varphi(a)$), the second is since $\sum_{w\in\mathcal{L}_n}\mu[vw] = \mu[v]$, the third uses concavity of $\varphi$, and the fourth uses invariance of $\mu$ to get $\sum_{v\in\mathcal{L}_m}\mu[vw] = \mu(\sigma^{-m}[w]) = \mu[w]$. $\Box$

This lemma has the following intuitive interpretation: the expected information from the first $m+n$ symbols is at most the expected information from the first $m$ symbols plus the expected information from the next $n$ symbols. By Fekete's lemma, the measure-theoretic entropy

$$h(\mu) := \lim_{n\to\infty}\frac{1}{n}H_n(\mu) = \inf_{n\in\mathbb{N}}\frac{1}{n}H_n(\mu)$$

exists for every $\mu\in M_\sigma(X)$.
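For example, if $\mu$ is the Bernoulli measure on the full shift determined by a probability vector $\bar p$, i.e., $\mu[w] = \prod_{i=1}^n p_{w_i}$ for $w\in A^n$, then the identity $\varphi(ab) = a\varphi(b) + b\varphi(a)$ gives $H_{m+n}(\mu) = H_m(\mu) + H_n(\mu)$, so

$$H_n(\mu) = n\,H(\bar p) \quad\text{and}\quad h(\mu) = H(\bar p):$$

for product measures the subadditivity in Lemma 3 is an equality.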
2.3. Variational principle
Using Exercise 5 we see that $H_n(\mu)\le\log\#\mathcal{L}_n$, with equality if and only if $\mu[w] = \frac{1}{\#\mathcal{L}_n}$ for all $w\in\mathcal{L}_n$. This immediately proves that

$$h(\mu)\le h(X) \text{ for every } \mu\in M_\sigma(X). \qquad (6)$$
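This bound is achieved on the full shift: when $X = A^{\mathbb{N}}$ with $\#A = k$ we have $\#\mathcal{L}_n = k^n$, and the uniform Bernoulli measure ($\bar p = (\frac{1}{k},\dots,\frac{1}{k})$) has $\mu[w] = k^{-n}$ for all $w\in\mathcal{L}_n$, so

$$H_n(\mu) = n\log k = \log\#\mathcal{L}_n \quad\text{and}\quad h(\mu) = \log k = h(X);$$

in particular, the uniform Bernoulli measure is an MME for the full shift.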
The following makes this more precise.
Definition 4 $\mu\in M_\sigma(X)$ is a Gibbs measure if there are $h\ge 0$ and $K > 0$ such that for all $n\in\mathbb{N}$ and $w\in\mathcal{L}_n$, we have

$$K^{-1}e^{-nh} \le \mu[w] \le K\,e^{-nh}.$$
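For example, on the full $2$-shift the uniform Bernoulli measure satisfies $\mu[w] = 2^{-n} = e^{-n\log 2}$ exactly, so it is Gibbs with $h = \log 2$ and $K = 1$. On the other hand, the Bernoulli measure with $\bar p = (p, 1-p)$ and $p\ne\frac{1}{2}$ is not Gibbs in the sense of Definition 4: taking $n$th roots in the Gibbs bounds, $\mu[0^n] = p^n$ would force $e^{-h} = p$, while $\mu[1^n] = (1-p)^n$ would force $e^{-h} = 1-p$.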
Now Theorem 2 is a consequence of the following two results, which we prove below.

Theorem 5 If $X$ has an ergodic Gibbs measure $\mu$, then $h(X) = h$ and $\mu$ is the unique MME for $X$.

Theorem 6 If $X$ has specification, then it has an ergodic Gibbs measure.

In fact the construction of $\mu$ below always gives equality in (6), without relying on the specification property (or obtaining uniqueness), but we will not prove this.
2.4. Convex combinations
Lemma 7 For all probability vectors $\bar p, \bar q$ and all $t\in[0,1]$, we have

$$t\,H(\bar p) + (1-t)\,H(\bar q) \le H(t\bar p + (1-t)\bar q) \le t\,H(\bar p) + (1-t)\,H(\bar q) + \log 2.$$

The first inequality is concavity of $\varphi$. For the second, we write $\varphi(tp_i + (1-t)q_i) = tp_i\,I(tp_i + (1-t)q_i) + (1-t)q_i\,I(tp_i + (1-t)q_i)$. Then we use monotonicity of $I$ to get

$$\varphi(tp_i + (1-t)q_i) \le tp_i\,I(tp_i) + (1-t)q_i\,I((1-t)q_i) = t\,\varphi(p_i) + p_i\,\varphi(t) + (1-t)\,\varphi(q_i) + q_i\,\varphi(1-t);$$

summing over $i$ and using $\varphi(t) + \varphi(1-t)\le\log 2$ (Exercise 5) gives the second inequality.

Applying Lemma 7 to the probability vectors $(\mu_1[w])_{w\in\mathcal{L}_n}$ and $(\mu_2[w])_{w\in\mathcal{L}_n}$ associated to two measures $\mu_1,\mu_2\in M_\sigma(X)$, then dividing by $n$ and sending $n\to\infty$, we see that

$$h(t\mu_1 + (1-t)\mu_2) = t\,h(\mu_1) + (1-t)\,h(\mu_2) \text{ for all } t\in[0,1]. \qquad (10)$$
Remark 1 It is worth mentioning the deeper fact that a version of (10) holds for infinite convex combinations (even uncountable ones given by an integral); this is due to Konrad Jacobs, see Section 9.6 of “Foundations of Ergodic Theory” by Viana and Oliveira.
3. Gibbs implies uniqueness
To prove Theorem 5, start by observing that the lower Gibbs bound gives

$$1 = \sum_{w\in\mathcal{L}_n}\mu[w] \ge \#\mathcal{L}_n\cdot K^{-1}e^{-nh},$$

and thus $h(X) = \lim_{n\to\infty}\frac{1}{n}\log\#\mathcal{L}_n \le h$. Meanwhile, the upper Gibbs bound gives

$$1 = \sum_{w\in\mathcal{L}_n}\mu[w] \le \#\mathcal{L}_n\cdot K\,e^{-nh},$$

and thus $h(X)\ge h$, so we conclude that $h(X) = h$. The Gibbs bounds also give $I(\mu[w])\ge nh - \log K$ for all $w\in\mathcal{L}_n$, so $H_n(\mu) = \sum_{w\in\mathcal{L}_n}\mu[w]\,I(\mu[w]) \ge nh - \log K$; together with (6) this shows that $h(\mu) = h$, so $\mu$ is an MME. It remains to show that every $\nu\in M_\sigma(X)$ with $\nu\ne\mu$ has $h(\nu) < h$.
First observe that by (10), we can restrict our attention to the case when $\nu\perp\mu$. Indeed, given any $\nu\in M_\sigma(X)$, the Lebesgue decomposition theorem gives $\nu = t\nu_1 + (1-t)\nu_2$ for some $t\in[0,1]$ and $\nu_1,\nu_2\in M_\sigma(X)$ with $\nu_1\ll\mu$ and $\nu_2\perp\mu$, and (10) gives $h(\nu) = t\,h(\nu_1) + (1-t)\,h(\nu_2)$. By ergodicity we must have $\nu_1 = \mu$ and thus $h(\nu_1) = h$, so if $h(\nu_2) < h$ then the same is true of $h(\nu)$ (note that $\nu\ne\mu$ forces $t < 1$).
So suppose $\nu\perp\mu$, and fix a Borel set $D\subset X$ with $\nu(D) = 1$ and $\mu(D) = 0$. Approximating $D$ by finite unions of cylinders, for each $n$ we can choose $\mathcal{D}_n\subset\mathcal{L}_n$ so that the sets $D_n := \bigcup_{w\in\mathcal{D}_n}[w]$ satisfy

$$\nu(D_n)\to 1 \quad\text{and}\quad \mu(D_n)\to 0 \quad\text{as } n\to\infty. \qquad (12)$$

Let $\bar p^n$ denote the normalization of $(\nu[w])_w$ after restricting to words in $\mathcal{D}_n$, and similarly for $\bar q^n$ (using $\mathcal{L}_n\setminus\mathcal{D}_n$). Recall from Fekete's lemma and subadditivity of $H_n(\nu)$ that $H_n(\nu)\ge n\,h(\nu)$ for all $n$. Then we get

$$n\,h(\nu) \le H_n(\nu) \le \nu(D_n)\,H(\bar p^n) + (1-\nu(D_n))\,H(\bar q^n) + \log 2 \le \nu(D_n)\log\#\mathcal{D}_n + (1-\nu(D_n))\log\#\mathcal{L}_n + \log 2,$$

using Lemma 7 for the middle inequality and Exercise 5 for the last one. The lower Gibbs bound gives $\#\mathcal{D}_n\le K e^{nh}\mu(D_n)$ and $\#\mathcal{L}_n\le K e^{nh}$, so the above becomes

$$n\,(h(\nu) - h) \le \log K + \log 2 + \nu(D_n)\log\mu(D_n),$$

and as $n\to\infty$ the right-hand side goes to $-\infty$ by (12), which implies that $h(\nu) < h$, completing the proof.
4. Specification implies Gibbs
Now we outline the proof of Theorem 6. This comes in three steps: (1) uniform counting bounds; (2) construction of a Gibbs measure; (3) proof of ergodicity.
4.1. Uniform counting bounds
From now on we write $C_n = \#\mathcal{L}_n$ for convenience. Fekete's lemma gives $\frac{1}{n}\log C_n\ge h$ for all $n$, or equivalently $C_n\ge e^{nh}$. This can also be deduced by writing

$$\frac{1}{kn}\log C_{kn} \le \frac{1}{kn}\log(C_n^k) = \frac{1}{n}\log C_n$$

and sending $k\to\infty$ so that the left-hand side goes to $h$ (this is basically part of the proof of Fekete's lemma).

To get an upper bound on $C_n$ we need the specification property, which gives a 1-1 map $\mathcal{L}_m\times\mathcal{L}_n\to\bigcup_{j=0}^{\tau}\mathcal{L}_{m+j+n}$ sending $(v,w)$ to $vuw$, so that $C_m C_n \le \sum_{j=0}^{\tau}C_{m+j+n} \le (\tau+1)\,C_{m+n+\tau}$ (the last step uses the fact that $C_n$ is non-decreasing in $n$). Then one can either apply Fekete's lemma to the subadditive sequence $-\log(C_{n-\tau}/(\tau+1))$, or observe that

$$C_n^k \le (\tau+1)^{k-1}\,C_{kn+(k-1)\tau} \le (\tau+1)^k\,C_{k(n+\tau)}$$

and send $k\to\infty$ to get $\log C_n \le \log(\tau+1) + (n+\tau)h$. Either way, we obtain the uniform counting bounds

$$e^{nh} \le C_n \le C\,e^{nh} \text{ for all } n, \text{ where } C := (\tau+1)e^{\tau h}. \qquad (13)$$
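For the golden-mean shift these bounds can be checked by hand: $C_n = F_{n+2}$ and $e^h = \gamma := \frac{1+\sqrt{5}}{2}$, and induction using $\gamma^2 = \gamma + 1$ gives

$$\gamma^n \le F_{n+2} \le 2\gamma^n \text{ for all } n\ge 1,$$

so (13) holds with the constant $2$ in place of $C$.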
4.2. A Gibbs measure
There is a standard procedure for constructing an MME: let $\nu_n$ be any sequence of (not necessarily invariant) Borel probability measures on $X$ such that $\nu_n[w] = \frac{1}{C_n}$ for all $w\in\mathcal{L}_n$ (for example, $\nu_n$ could give equal weight to one point chosen from each cylinder of length $n$), and then put

$$\mu_n := \frac{1}{n}\sum_{k=0}^{n-1}\sigma_*^k\,\nu_n.$$

Since the space of Borel probability measures on $X$ is weak* compact, there is a weak* convergent subsequence $\mu_{n_j}\to\mu$.

Exercise 6 Show that $\mu$ is $\sigma$-invariant ($\sigma_*\mu = \mu$).
The preceding exercise is basically the proof of the Krylov-Bogolyubov theorem, and does not require any properties of the measures beyond the fact that they are Borel probability measures.
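For the full shift, one valid choice is to take $\nu_n$ to be the uniform Bernoulli measure for every $n$: it gives each of the $C_n = k^n$ cylinders of length $n$ the mass $1/C_n$, and it is already $\sigma$-invariant, so $\mu_n = \nu_n$ for all $n$ and the construction simply returns the uniform Bernoulli measure, which we have already seen is Gibbs (with $K = 1$) and an MME.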
One can prove that $\mu$ is an MME whether or not $X$ has specification, but we will use specification to directly prove the stronger Gibbs property.
Given any $w\in\mathcal{L}_m$ and any $\tau\le k\le n-m-\tau$, we bound $\nu_n(\sigma^{-k}[w])$ by estimating how many words in $\mathcal{L}_n$ are of the form $uwv$ for some $u\in\mathcal{L}_k$ and $v\in\mathcal{L}_{n-k-m}$. Arguments similar to those in the uniform counting bounds show that

$$\frac{C_{k-\tau}\,C_{n-k-m-\tau}}{(\tau+1)^2} \le \#\{y\in\mathcal{L}_n : y_{[k'+1,k'+m]} = w \text{ for some } k-\tau\le k'\le k\} \quad\text{and}\quad \#\{y\in\mathcal{L}_n : y_{[k+1,k+m]} = w\} \le C_k\,C_{n-k-m},$$

where the first inequality requires specification. Dividing by $C_n$ and using the uniform counting bounds (13) gives

$$\sum_{k'=k-\tau}^{k}\nu_n(\sigma^{-k'}[w]) \ge \frac{e^{-(m+2\tau)h}}{(\tau+1)^2\,C} \quad\text{and}\quad \nu_n(\sigma^{-k}[w]) \le C^2\,e^{-mh};$$

averaging over $0\le k < n$ and sending $n = n_j\to\infty$ (cylinders are clopen, so $\mu_{n_j}[w]\to\mu[w]$), we obtain a constant $K > 0$, depending only on $\tau$, $C$, and $h$, such that

$$K^{-1}e^{-mh} \le \mu[w] \le K\,e^{-mh} \text{ for all } m\in\mathbb{N} \text{ and } w\in\mathcal{L}_m, \qquad (15)$$

so $\mu$ is a Gibbs measure.
To prove that $\mu$ is ergodic, start by fixing $v,w\in\mathcal{L}$, with lengths $\ell$ and $m$, respectively. Given $n$ and indices $j,k$ with $\tau\le j$, $j+\ell+2\tau\le k$, and $k+m+\tau\le n$, follow the same procedure as above to estimate the number of words in $\mathcal{L}_n$ with the form $u_1 v u_2 w u_3$, where $u_1\in\mathcal{L}_j$, $u_2\in\mathcal{L}_{k-j-\ell}$, and $u_3\in\mathcal{L}_{n-k-m}$, and obtain the bounds

$$\frac{C_{j-\tau}\,C_{k-j-\ell-2\tau}\,C_{n-k-m-\tau}}{(\tau+1)^4} \le \#\{y\in\mathcal{L}_n : y_{[j'+1,j'+\ell]} = v \text{ and } y_{[k'+1,k'+m]} = w \text{ for some } j-\tau\le j'\le j,\ k-3\tau\le k'\le k\}$$

and

$$\#\{y\in\mathcal{L}_n : y_{[j+1,j+\ell]} = v \text{ and } y_{[k+1,k+m]} = w\} \le C_j\,C_{k-j-\ell}\,C_{n-k-m}.$$

Averaging over $j$ and $k$, sending $n\to\infty$ along the subsequence $(n_j)$, and using the uniform counting bounds (13) gives

$$\frac{e^{-(\ell+m)h}}{C'} \le \liminf_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu([v]\cap\sigma^{-k}[w]) \le \limsup_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu([v]\cap\sigma^{-k}[w]) \le C'\,e^{-(\ell+m)h}$$

for some constant $C' > 0$ depending only on $\tau$, $C$, and $h$. Using the Gibbs bounds (15) gives

$$\frac{\mu[v]\,\mu[w]}{C'K^2} \le \liminf_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu([v]\cap\sigma^{-k}[w]) \le \limsup_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu([v]\cap\sigma^{-k}[w]) \le C'K^2\,\mu[v]\,\mu[w]. \qquad (16)$$

Exercise 7 Given Borel sets $P,Q\subset X$, approximate $P$ and $Q$ with finite unions of cylinders and use (16) to get

$$\frac{\mu(P)\,\mu(Q)}{C'K^2} \le \liminf_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu(P\cap\sigma^{-k}Q) \le \limsup_{N\to\infty}\frac{1}{N}\sum_{k=0}^{N-1}\mu(P\cap\sigma^{-k}Q) \le C'K^2\,\mu(P)\,\mu(Q),$$

and deduce that $\mu$ is ergodic.
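To spell out the final deduction: if $P\subset X$ is Borel with $\sigma^{-1}P = P$, take $Q = X\setminus P$; then $P\cap\sigma^{-k}Q = P\cap Q = \emptyset$ for every $k$, so the lower bound above gives

$$0 \ge \frac{\mu(P)\,(1-\mu(P))}{C'K^2},$$

whence $\mu(P)\in\{0,1\}$; that is, $\mu$ is ergodic.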
Remark 2 In fact, the upper bound in Exercise 7 can be used to show that is mixing; see Proposition 20.3.6 in Katok and Hasselblatt.