These are notes for a talk I am giving in Jon Chaika’s online working seminar in ergodic theory. The purpose of the talk is to outline Bowen’s proof of uniqueness of the measure of maximal entropy for shift spaces with the specification property. Bowen’s approach extends much more broadly than this: to non-symbolic systems (assuming expansivity); to equilibrium states for non-zero potential functions (assuming a bounded distortion property); and to non-uniformly hyperbolic systems using the notion of “obstructions to specification and expansivity” developed by Dan Thompson and myself (see some notes here and here, and videos here). In these notes, though, I want to give the bare bones of the argument in the simplest possible setting, to make the essential structure as clear as possible. I gave an alternate argument in a previous post; here I am giving Bowen’s original argument, although I do not necessarily follow his presentation. I should also point out that this argument differs from the construction of the MME, or more generally the equilibrium state, in Bowen’s monograph, which uses the Ruelle operator.

**1. Setting and result**

Let $A$ be a finite set, the *alphabet*. Then the *full shift* is the set $A^{\mathbb{N}} = \{x = x_1 x_2 x_3 \cdots : x_n \in A\}$ of infinite sequences of symbols from $A$; this is a compact metric space with $d(x,y) = 2^{-t(x,y)}$, where $t(x,y) = \min\{n \in \mathbb{N} : x_n \neq y_n\}$. The *shift map* $\sigma \colon A^{\mathbb{N}} \to A^{\mathbb{N}}$ is defined by $(\sigma x)_n = x_{n+1}$. A *shift space* is a closed set $X \subset A^{\mathbb{N}}$ with $\sigma(X) \subset X$.

Example 1. The best example to keep in mind through this talk is a *topological Markov shift*: fix $d \in \mathbb{N}$ and put $A = \{1, \dots, d\}$; then fix a *transition matrix* $T$ with entries in $\{0,1\}$, and write

$$i \to j \quad \Leftrightarrow \quad T_{ij} = 1. \tag{1}$$

Define $X = X_T := \{x \in A^{\mathbb{N}} : x_n \to x_{n+1} \text{ for all } n\}$. This is a *topological Markov shift* (TMS). It can be viewed in terms of a directed graph with vertex set $A$ and edges given by (1): $X_T$ consists of all sequences that label infinite paths on the graph.

The TMS is *mixing* or *primitive* if there is $N \in \mathbb{N}$ such that $(T^N)_{ij} > 0$ for all $i, j \in A$. Equivalently, the graph is strongly connected and the set of loop lengths on the graph has gcd $1$.
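This condition is easy to test on a computer. Here is a minimal Python sketch (my own illustration, not part of the notes; the function names are mine) that checks primitivity by examining powers of $T$, using Wielandt's bound that a primitive $d \times d$ matrix already has $(T^N)_{ij} > 0$ for some $N \leq (d-1)^2 + 1$.

```python
# Sketch (mine): test whether a 0-1 transition matrix is primitive by checking
# whether some power T^N is entrywise positive. By Wielandt's theorem it
# suffices to check powers up to N = (d-1)^2 + 1.

def is_primitive(T):
    d = len(T)
    def bool_mult(S, U):
        # Boolean matrix product: entry (i, j) records whether a path exists.
        return [[int(any(S[i][k] and U[k][j] for k in range(d)))
                 for j in range(d)] for i in range(d)]
    S = T
    for _ in range((d - 1) ** 2 + 1):
        if all(all(row) for row in S):
            return True
        S = bool_mult(S, T)
    return False

golden = [[1, 1], [1, 0]]   # golden mean shift: the word "11" is forbidden
period = [[0, 1], [1, 0]]   # strongly connected, but every loop has even length
print(is_primitive(golden))  # True
print(is_primitive(period))  # False
```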

Given a shift space $X$, consider the set of shift-invariant Borel probability measures

$$\mathcal{M}_\sigma(X) := \{ \mu : \mu \text{ is a Borel probability measure on } X \text{ with } \mu(\sigma^{-1}E) = \mu(E) \text{ for all Borel } E \subset X \}.$$

The set $\mathcal{M}_\sigma(X)$ of invariant measures is extremely large for a mixing TMS (and more generally for systems with some sort of hyperbolic behavior), and it is important to identify “distinguished” invariant measures. One way of doing this is via the *variational principle*

$$h(X) = \sup\{ h(\mu) : \mu \in \mathcal{M}_\sigma(X) \}.$$

The next section recalls the definitions of the topological and measure-theoretic entropies in this setting. A measure achieving the supremum is a *measure of maximal entropy* (MME).

We will see that every mixing TMS has a unique MME, via a more general result. Given $n \in \mathbb{N}$ and $w \in A^n$, let $[w] := \{x \in X : x_1 \cdots x_n = w\}$ be the set of sequences in $X$ that start with the word $w$ (juxtaposition denotes concatenation); call this the *cylinder* of $w$. Define the *language* of $X$ by

$$\mathcal{L} := \{ w : [w] \neq \emptyset \} = \bigcup_{n \in \mathbb{N}} \mathcal{L}_n, \quad \text{where } \mathcal{L}_n := \{ w \in A^n : [w] \neq \emptyset \}.$$

Definition 1. $X$ has *specification* if there is $\tau \in \mathbb{N}$ such that for all $v, w \in \mathcal{L}$ there is $u \in \mathcal{L}_\tau$ such that $vuw \in \mathcal{L}$.

Exercise 1. Prove that every mixing TMS has specification.
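For a concrete feel for the exercise, here is a small Python sketch (mine, with hypothetical function names) that finds a gluing word $u$ by brute force over paths of length $\tau + 1$ from the last symbol of $v$ to the first symbol of $w$; for the golden mean shift, $\tau = 1$ already works.

```python
# Sketch (mine): find a gluing word u with |u| = tau such that v u w is
# admissible for a TMS, by brute-force search. For the golden mean shift,
# u = (0,) always works, so specification holds with tau = 1.
from itertools import product

T = [[1, 1], [1, 0]]  # golden mean shift over alphabet {0, 1}

def admissible(w):
    return all(T[a][b] for a, b in zip(w, w[1:]))

def glue(v, w, tau):
    for u in product(range(len(T)), repeat=tau):
        if admissible(v + u + w):
            return u
    return None

print(glue((1,), (1,), 1))   # (0,): the word 1 0 1 is admissible, 1 1 is not
```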

Theorem 2 (Bowen). If $X$ has specification, then it has a unique MME.

**2. Entropies**

**2.1. Topological entropy**

Every word in $\mathcal{L}_{m+n}$ is of the form $vw$ for some $v \in \mathcal{L}_m$ and $w \in \mathcal{L}_n$; thus

$$\#\mathcal{L}_{m+n} \leq \#\mathcal{L}_m \cdot \#\mathcal{L}_n.$$

This means that the sequence $a_n := \log \#\mathcal{L}_n$ has the *sub-additivity* property

$$a_{m+n} \leq a_m + a_n. \tag{2}$$

Exercise 2. Prove *Fekete's lemma*: for any sequence $(a_n)$ satisfying (2), the limit $\lim_{n\to\infty} \frac{a_n}{n}$ exists and is equal to $\inf_n \frac{a_n}{n}$ (a priori it could be $-\infty$).

We conclude that the *topological entropy*

$$h = h(X) := \lim_{n\to\infty} \frac{1}{n} \log \#\mathcal{L}_n \tag{3}$$

exists for every shift space. This quantifies the growth rate of the total complexity of the system.
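As a concrete illustration (my addition, not part of the argument): for the golden mean shift one has $\#\mathcal{L}_n = F_{n+2}$ (a Fibonacci number), and $\frac{1}{n}\log\#\mathcal{L}_n$ approaches $h = \log\frac{1+\sqrt{5}}{2}$ from above, as Fekete's lemma predicts.

```python
# Illustration (mine): (1/n) log #L_n for the golden mean shift approaches
# h = log((1 + sqrt 5)/2) from above.
from math import log, sqrt

def count_words(T, n):
    """Number of admissible words of length n for a 0-1 transition matrix T."""
    d = len(T)
    counts = [1] * d  # number of words of length 1, indexed by final symbol
    for _ in range(n - 1):
        counts = [sum(counts[i] for i in range(d) if T[i][j]) for j in range(d)]
    return sum(counts)

T = [[1, 1], [1, 0]]
h = log((1 + sqrt(5)) / 2)
for n in (5, 10, 20, 40):
    print(n, log(count_words(T, n)) / n - h)  # positive, decreasing to 0
```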

Exercise 3. Prove that $h(X)/\log 2$ is the box dimension of $X$ (easy). Then prove that it is also the Hausdorff dimension of $X$ (harder). (Both of these facts rely very strongly on the fact that the ratio $d(\sigma x, \sigma y)/d(x,y)$ is globally constant whenever $x, y$ are close; for general systems where the amount of expansion may vary, the definition of topological entropy is more involved and the relationship to dimension is more subtle, although it is worth noting that a 1973 paper of Bowen in TAMS gives a definition of topological entropy that is analogous to Hausdorff dimension.)

**2.2. Measure-theoretic entropy**

Recall the motivation from information theory for the definition of entropy: given $p \in (0,1]$, let $I(p) := -\log p$. This can be interpreted as the information associated to an event with probability $p$; note that it is monotonic (the less likely an event is, the more information we gain by learning that it happened) and that $I(pq) = I(p) + I(q)$.

Now define $\varphi$ on $[0,1]$ by $\varphi(p) := p\, I(p) = -p \log p$ (and $\varphi(0) := 0$); this can be interpreted as the *expected* amount of information associated to an event with probability $p$.

Exercise 4. Show that $\varphi$ is concave.

Given $k \in \mathbb{N}$, let $\Delta_k := \{ \bar{p} = (p_1, \dots, p_k) : p_i \geq 0, \sum_i p_i = 1 \}$ be the set of probability vectors with $k$ components, and $\Delta_k^{\leq} := \{ \bar{p} : p_i \geq 0, \sum_i p_i \leq 1 \}$ the set of sub-probability vectors with $k$ components. Define

$$H(\bar{p}) := \sum_{i=1}^{k} \varphi(p_i) = -\sum_{i=1}^{k} p_i \log p_i;$$

this can be interpreted as the expected information associated to a collection of mutually exclusive events with probabilities $p_1, \dots, p_k$.

Exercise 5. Show that $H(\bar{p}) \leq \log k$ for all $\bar{p} \in \Delta_k$, with equality if and only if $p_i = \frac{1}{k}$ for all $i$.

Given $\mu \in \mathcal{M}_\sigma(X)$, we have for each $n$ a probability vector $(\mu([w]))_{w \in \mathcal{L}_n}$ with $\#\mathcal{L}_n$ components; writing $\mu(w) := \mu([w])$ for convenience, the entropy (expected information) associated to this vector is

$$H_n(\mu) := \sum_{w \in \mathcal{L}_n} \varphi(\mu(w)). \tag{4}$$

Lemma 3. $H_{m+n}(\mu) \leq H_m(\mu) + H_n(\mu)$ for all $m, n \in \mathbb{N}$.

*Proof:*

$$\begin{aligned}
H_{m+n}(\mu) &= \sum_{v \in \mathcal{L}_m} \sum_{w : vw \in \mathcal{L}_{m+n}} \mu(vw)\, I(\mu(vw)) \\
&= \sum_{v} \varphi(\mu(v)) + \sum_{w} \sum_{v} \mu(v)\, \varphi\Big( \frac{\mu(vw)}{\mu(v)} \Big) \\
&\leq H_m(\mu) + \sum_{w} \varphi\Big( \sum_{v} \mu(vw) \Big) \\
&= H_m(\mu) + H_n(\mu),
\end{aligned}$$

where the first line is by definition, the second is since $I(\mu(vw)) = I(\mu(v)) + I\big(\frac{\mu(vw)}{\mu(v)}\big)$ and $\sum_w \mu(vw) = \mu(v)$, the third uses concavity of $\varphi$, and the fourth uses invariance of $\mu$ to get $\sum_v \mu(vw) = \mu(\sigma^{-m}[w]) = \mu(w)$. $\Box$

This lemma has the following intuitive interpretation: the expected information from the first $m+n$ symbols is at most the expected information from the first $m$ symbols plus the expected information from the next $n$ symbols.

Now Fekete’s lemma implies that the following *measure-theoretic entropy* exists:

$$h(\mu) := \lim_{n\to\infty} \frac{1}{n} H_n(\mu) = \inf_{n} \frac{1}{n} H_n(\mu). \tag{5}$$
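To make this concrete (an illustration of my own, not from the talk): for a Markov measure $\mu$ on the full 2-shift, given by a stochastic matrix $P$ and a stationary vector $\bar{\pi}$ via $\mu(w) = \pi_{w_1} P_{w_1 w_2} \cdots P_{w_{n-1} w_n}$, one can compute $H_n(\mu)$ by brute force and watch $\frac{1}{n} H_n(\mu)$ decrease to $h(\mu) = -\sum_{i,j} \pi_i P_{ij} \log P_{ij}$.

```python
# Illustration (mine): H_n(mu)/n for a Markov measure on the full 2-shift,
# computed by brute force; it decreases to h(mu) = -sum_{ij} pi_i P_ij log P_ij.
from itertools import product
from math import log

P = [[0.9, 0.1],
     [0.5, 0.5]]        # transition probabilities (rows sum to 1)
pi = [5 / 6, 1 / 6]     # stationary distribution: pi P = pi

def mu(w):
    """Markov measure of the cylinder [w]."""
    p = pi[w[0]]
    for a, b in zip(w, w[1:]):
        p *= P[a][b]
    return p

h = -sum(pi[i] * P[i][j] * log(P[i][j]) for i in range(2) for j in range(2))
for n in (1, 2, 4, 8, 12):
    Hn = -sum(mu(w) * log(mu(w)) for w in product((0, 1), repeat=n))
    print(n, Hn / n, h)  # the middle column decreases toward h
```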

**2.3. Variational principle**

Using Exercise 5 we see that $H_n(\mu) \leq \log \#\mathcal{L}_n$, with equality if and only if $\mu(w) = (\#\mathcal{L}_n)^{-1}$ for all $w \in \mathcal{L}_n$. This immediately proves that

$$h(\mu) \leq h(X) \quad \text{for all } \mu \in \mathcal{M}_\sigma(X), \tag{6}$$

and suggests that in order to have equality we should look for a measure with

$$\mu(w) \approx (\#\mathcal{L}_n)^{-1} \approx e^{-n h(X)} \quad \text{for all } w \in \mathcal{L}_n. \tag{7}$$

The following makes this more precise.

Definition 4. $\mu$ is a *Gibbs measure* for $h \in \mathbb{R}$ if there is $K > 0$ such that for all $n \in \mathbb{N}$ and $w \in \mathcal{L}_n$, we have $K^{-1} \leq \mu(w)\, e^{nh} \leq K$.
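For the golden mean shift one can write down a Gibbs measure explicitly: the Parry measure, the Markov measure built from the Perron eigendata of the transition matrix. The following sketch (my illustration, with the eigendata hardcoded for this one example) checks the inequalities of Definition 4 numerically with $h = \log\lambda$, $\lambda = \frac{1+\sqrt{5}}{2}$.

```python
# Illustration (mine): check Definition 4 for the Parry measure of the golden
# mean shift. lam is the Perron eigenvalue of T = [[1,1],[1,0]], h = log(lam),
# and mu(w) e^{nh} = mu(w) lam^n should stay in a fixed interval [1/K, K].
from math import sqrt

lam = (1 + sqrt(5)) / 2
P = [[1 / lam, 1 / lam ** 2],   # Parry transition probabilities
     [1.0, 0.0]]
pi = [lam ** 2 / (1 + lam ** 2), 1 / (1 + lam ** 2)]  # stationary distribution

def words(n):
    """All admissible words of length n (no two consecutive 1s)."""
    ws = [(0,), (1,)]
    for _ in range(n - 1):
        ws = [w + (b,) for w in ws for b in (0, 1) if not (w[-1] == 1 and b == 1)]
    return ws

def mu(w):
    p = pi[w[0]]
    for a, b in zip(w, w[1:]):
        p *= P[a][b]
    return p

for n in (2, 5, 10, 15):
    ratios = [mu(w) * lam ** n for w in words(n)]
    print(n, min(ratios), max(ratios))  # bounded above and below, uniformly in n
```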

Now Theorem 2 is a consequence of the following two results, which we prove below.

Theorem 5. If $\mu$ is an ergodic Gibbs measure for $h \in \mathbb{R}$, then $h = h(X) = h(\mu)$ and $\mu$ is the unique MME.

Theorem 6. If $X$ has specification, then it has an ergodic Gibbs measure.

In fact the construction of $\mu$ below always gives equality in (6), without relying on the specification property (or obtaining uniqueness), but we will not prove this.

**2.4. Convex combinations**

Before embarking on the proof of Theorems 5 and 6, we establish a general property of entropy that will be important.

Lemma 7. Given $\bar{p}, \bar{q} \in \Delta_k$ and $t \in [0,1]$, with $\bar{r} := t\bar{p} + (1-t)\bar{q}$, we have

$$t H(\bar{p}) + (1-t) H(\bar{q}) \leq H(\bar{r}) \leq t H(\bar{p}) + (1-t) H(\bar{q}) + \log 2. \tag{8}$$

*Proof:* The first inequality follows immediately from concavity of $\varphi$. For the second inequality we first observe that

$$\varphi(tp) = tp\, I(tp) = tp\big( I(t) + I(p) \big) = t\varphi(p) + p\varphi(t). \tag{9}$$

Then we use monotonicity of $I$ to get

$$\begin{aligned}
\varphi(tp + (1-t)q) &= tp\, I(tp + (1-t)q) + (1-t)q\, I(tp + (1-t)q) \\
&\leq tp\, I(tp) + (1-t)q\, I((1-t)q) \\
&= \varphi(tp) + \varphi((1-t)q) = t\varphi(p) + (1-t)\varphi(q) + p\varphi(t) + q\varphi(1-t),
\end{aligned}$$

where the last equality uses (9) twice; summing over the components of $\bar{p}, \bar{q}$ and using $\varphi(t) + \varphi(1-t) \leq \log 2$ proves (8). $\Box$

Applying Lemma 7 to the probability vectors associated to two measures $\mu_1, \mu_2 \in \mathcal{M}_\sigma(X)$, we see that

$$t H_n(\mu_1) + (1-t) H_n(\mu_2) \leq H_n(t\mu_1 + (1-t)\mu_2) \leq t H_n(\mu_1) + (1-t) H_n(\mu_2) + \log 2$$

for all $n$: dividing by $n$ and sending $n \to \infty$ gives

$$h(t\mu_1 + (1-t)\mu_2) = t\, h(\mu_1) + (1-t)\, h(\mu_2). \tag{10}$$

Remark 1. It is worth mentioning the deeper fact that a version of (10) holds for infinite convex combinations (even uncountable ones given by an integral); this is due to Konrad Jacobs, see Section 9.6 of “Foundations of Ergodic Theory” by Viana and Oliveira.
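A quick numerical sanity check of (8) on random vectors (mine, not part of the proof):

```python
# Sanity check of the two-sided bound (8) on random probability vectors.
import random
from math import log

def H(p):
    return -sum(x * log(x) for x in p if x > 0)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

random.seed(0)
k = 10
for _ in range(5):
    p = normalize([random.random() for _ in range(k)])
    q = normalize([random.random() for _ in range(k)])
    t = random.random()
    r = [t * a + (1 - t) * b for a, b in zip(p, q)]
    low = t * H(p) + (1 - t) * H(q)
    assert low - 1e-12 <= H(r) <= low + log(2) + 1e-12
print("bounds (8) hold on these samples")
```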

**3. Gibbs implies uniqueness**

To prove Theorem 5, start by observing that the lower Gibbs bound gives

$$1 = \mu(X) = \sum_{w \in \mathcal{L}_n} \mu(w) \geq \#\mathcal{L}_n \, K^{-1} e^{-nh} \tag{11}$$

and thus $h(X) \leq h$. Meanwhile, the upper Gibbs bound gives

$$H_n(\mu) = \sum_{w \in \mathcal{L}_n} \mu(w)\, I(\mu(w)) \geq \sum_{w \in \mathcal{L}_n} \mu(w)\, (nh - \log K) = nh - \log K$$

and thus $h(\mu) \geq h$, so by (6) we conclude that $h \leq h(\mu) \leq h(X) \leq h$; in particular $h(\mu) = h(X) = h$ and $\mu$ is an MME. It remains to show that every $\nu \in \mathcal{M}_\sigma(X)$ with $\nu \neq \mu$ has $h(\nu) < h$.

First observe that by (10), we can restrict our attention to the case when $\nu \perp \mu$. Indeed, given any $\nu \in \mathcal{M}_\sigma(X)$, the Lebesgue decomposition theorem gives $\nu = t\nu_1 + (1-t)\nu_2$ for some $t \in [0,1]$ and $\nu_1, \nu_2 \in \mathcal{M}_\sigma(X)$ with $\nu_1 \ll \mu$ and $\nu_2 \perp \mu$, and (10) gives $h(\nu) = t\, h(\nu_1) + (1-t)\, h(\nu_2)$. By ergodicity we must have $\nu_1 = \mu$ and thus $h(\nu_1) = h$, so if $h(\nu_2) < h$ then the same is true of $h(\nu)$ whenever $t < 1$, that is, whenever $\nu \neq \mu$.

Now consider $\nu \perp \mu$. Then there is a Borel set $D \subset X$ with $\nu(D) = 1$ and $\mu(D) = 0$, and this in turn gives $\mathcal{D}_n \subset \mathcal{L}_n$ such that

$$\nu(\mathcal{D}_n) \to 1 \quad \text{and} \quad \mu(\mathcal{D}_n) \to 0, \tag{12}$$

where we write $\nu(\mathcal{D}_n) := \sum_{w \in \mathcal{D}_n} \nu(w) = \nu\big( \bigcup_{w \in \mathcal{D}_n} [w] \big)$.

Let $\bar{p}_n$ denote the *normalization* of the vector $(\nu(w))_{w \in \mathcal{D}_n}$, obtained after restricting to words in $\mathcal{D}_n$, and similarly $\bar{q}_n$ for words in $\mathcal{L}_n \setminus \mathcal{D}_n$, so that the probability vector of $\nu$ on $\mathcal{L}_n$ is $t_n \bar{p}_n + (1-t_n)\bar{q}_n$ with $t_n := \nu(\mathcal{D}_n)$. Recall from Fekete's lemma and subadditivity of $H_n(\nu)$ that $H_n(\nu) \geq n\, h(\nu)$ for all $n$. Then we get

$$\begin{aligned}
n\, h(\nu) &\leq H_n(\nu) \\
&\leq t_n H(\bar{p}_n) + (1-t_n) H(\bar{q}_n) + \log 2 \\
&\leq t_n \log \#\mathcal{D}_n + (1-t_n) \log \#(\mathcal{L}_n \setminus \mathcal{D}_n) + \log 2 \\
&\leq t_n \log\big( K e^{nh} \mu(\mathcal{D}_n) \big) + (1-t_n) \log\big( K e^{nh} \big) + \log 2,
\end{aligned}$$

where the second line uses Lemma 7, the third line uses Exercise 5, and the fourth line uses the lower Gibbs bound as in (11) to get $\#\mathcal{D}_n \leq K e^{nh} \mu(\mathcal{D}_n)$ and $\#(\mathcal{L}_n \setminus \mathcal{D}_n) \leq K e^{nh}$. We conclude that

$$n\big( h(\nu) - h \big) \leq \log(2K) + t_n \log \mu(\mathcal{D}_n),$$

and as $n \to \infty$ the right-hand side goes to $-\infty$ by (12), which implies that $h(\nu) < h$, completing the proof.

**4. Specification implies Gibbs**

Now we outline the proof of Theorem 6. This comes in three steps: (1) uniform counting bounds; (2) construction of a Gibbs measure; (3) proof of ergodicity.

**4.1. Uniform counting bounds**

From now on we write $h = h(X)$ for convenience. Fekete's lemma gives $\frac{1}{n} \log \#\mathcal{L}_n \geq h$ for all $n$, or equivalently $\#\mathcal{L}_n \geq e^{nh}$. This can also be deduced by writing

$$\#\mathcal{L}_{kn} \leq (\#\mathcal{L}_n)^k \quad \Rightarrow \quad \frac{1}{kn} \log \#\mathcal{L}_{kn} \leq \frac{1}{n} \log \#\mathcal{L}_n$$

and sending $k \to \infty$ so that the left-hand side goes to $h$ (this is basically part of the proof of Fekete's lemma).

To get an upper bound on $\#\mathcal{L}_n$ we need the specification property, which gives a 1-1 map $\mathcal{L}_m \times \mathcal{L}_n \to \mathcal{L}_{m+\tau+n}$ (sending $(v,w) \mapsto vuw$ for an appropriate $u = u(v,w) \in \mathcal{L}_\tau$), so that $\#\mathcal{L}_m \cdot \#\mathcal{L}_n \leq \#\mathcal{L}_{m+\tau+n}$. Then one can either apply Fekete's lemma to $-\log \#\mathcal{L}_{n-\tau}$, or observe that

$$(\#\mathcal{L}_n)^k \leq \#\mathcal{L}_{k(n+\tau)} \quad \Rightarrow \quad \frac{1}{k(n+\tau)} \log \#\mathcal{L}_{k(n+\tau)} \geq \frac{\log \#\mathcal{L}_n}{n+\tau}.$$

Sending $k \to \infty$ the left-hand side goes to $h$, so $\#\mathcal{L}_n \leq e^{(n+\tau)h}$, and so combined with the lower bound above we get the uniform counting bounds

$$e^{nh} \leq \#\mathcal{L}_n \leq C e^{nh}, \quad \text{where } C := e^{\tau h}. \tag{13}$$
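For the golden mean shift, where specification holds with $\tau = 1$ (a single $0$ glues any two admissible words) so that $C = e^{\tau h} = \lambda$, the bounds (13) can be checked directly; a numerical illustration of my own:

```python
# Illustration (mine): the uniform counting bounds (13) for the golden mean
# shift, with h = log(lam) and C = e^{tau h} = lam (specification with tau = 1).
from math import sqrt

lam = (1 + sqrt(5)) / 2

def count_words(n):
    a, b = 1, 1            # words of length 1 ending in 0, ending in 1
    for _ in range(n - 1):
        a, b = a + b, a    # 0 can follow either symbol; 1 can only follow 0
    return a + b

for n in (1, 5, 10, 20, 30):
    N = count_words(n)
    print(n, lam ** n <= N <= lam ** (n + 1))  # e^{nh} <= #L_n <= C e^{nh}
```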

**4.2. A Gibbs measure**

There is a standard procedure for constructing an MME: let $\nu_n$ be any sequence of (not necessarily invariant) Borel probability measures such that $\nu_n(w) = (\#\mathcal{L}_n)^{-1}$ for all $w \in \mathcal{L}_n$, and then put

$$\mu_n := \frac{1}{n} \sum_{k=0}^{n-1} \sigma_*^k \nu_n.$$

Since the space of Borel probability measures on $X$ is weak* compact, there is a weak* convergent subsequence $\mu_{n_j} \to \mu$.

Exercise 6. Show that $\mu$ is $\sigma$-invariant ($\sigma_* \mu = \mu$).

The preceding exercise is basically the proof of the Krylov-Bogolyubov theorem, and does not require any properties of the measures beyond the fact that they are Borel probability measures.

One can prove that $\mu$ is an MME whether or not $X$ has specification, but we will use specification to directly prove the stronger Gibbs property.

Given any $w \in \mathcal{L}_n$ and any $m > n$, we bound $\nu_m(\sigma^{-k}[w])$ for $0 \leq k \leq m-n$ by estimating how many words in $\mathcal{L}_m$ are of the form $uwv$ for some $u \in \mathcal{L}_k$ and $v \in \mathcal{L}_{m-k-n}$. Arguments similar to those in the uniform counting bounds show that

$$\#\mathcal{L}_{k-\tau} \cdot \#\mathcal{L}_{m-k-n-\tau} \leq \#\{ z \in \mathcal{L}_m : z_{k+1} \cdots z_{k+n} = w \} \leq \#\mathcal{L}_k \cdot \#\mathcal{L}_{m-k-n},$$

where the first inequality requires specification. Dividing by $\#\mathcal{L}_m$ and using the uniform counting bounds (13) gives

$$\nu_m(\sigma^{-k}[w]) \leq \frac{\#\mathcal{L}_k \cdot \#\mathcal{L}_{m-k-n}}{\#\mathcal{L}_m} \leq \frac{C e^{kh} \cdot C e^{(m-k-n)h}}{e^{mh}} = C^2 e^{-nh}; \tag{14}$$

using a similar estimate from below we get

$$\nu_m(\sigma^{-k}[w]) \geq \frac{\#\mathcal{L}_{k-\tau} \cdot \#\mathcal{L}_{m-k-n-\tau}}{\#\mathcal{L}_m} \geq \frac{e^{(k-\tau)h}\, e^{(m-k-n-\tau)h}}{C e^{mh}} = C^{-3} e^{-nh} \quad \text{for } \tau \leq k \leq m-n-\tau.$$

Averaging over all $k \in \{0, \dots, m-1\}$ and sending $m \to \infty$ along the subsequence $n_j$ gives

$$K^{-1} e^{-nh} \leq \mu(w) \leq K e^{-nh}, \quad \text{where } K := C^3, \tag{15}$$

so $\mu$ is a Gibbs measure (the boundedly many values of $k$ near $0$ and $m$ where the estimates above do not apply contribute nothing in the limit).
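The construction above can also be carried out numerically for the golden mean shift (again my illustration, not part of the notes): taking $\nu_m$ to give equal weight to each cylinder of length $m$, the quantity $\mu_m(w) = \frac{1}{m}\sum_k \nu_m(\sigma^{-k}[w])$ can be approximated by counting occurrences of $w$ among words of length $m$, and $\mu_m(w)\, e^{nh}$ settles into a bounded window, as (15) predicts.

```python
# Illustration (mine): approximate mu_m(w) for the golden mean shift by
# counting occurrences of w among words of length m. (This finite-word
# computation drops the positions k > m - n, a boundary effect that vanishes
# as m -> infinity.) The products mu_m(w) * lam^n stay in a bounded window.
from math import sqrt

lam = (1 + sqrt(5)) / 2

def words(n):
    ws = [(0,), (1,)]
    for _ in range(n - 1):
        ws = [w + (b,) for w in ws for b in (0, 1) if not (w[-1] == 1 and b == 1)]
    return ws

def mu_m(w, m):
    n = len(w)
    Lm = words(m)
    hits = sum(z[k:k + n] == w for z in Lm for k in range(m - n + 1))
    return hits / (m * len(Lm))

w = (0, 1, 0)
for m in (6, 10, 14, 18):
    print(m, mu_m(w, m) * lam ** len(w))  # roughly constant in m
```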

**4.3. Ergodicity**

To prove that $\mu$ is ergodic, start by fixing $v, w \in \mathcal{L}$, with lengths $p$ and $q$, respectively. Given $j$ and $k$, follow the same procedure as above to estimate the number of words in $\mathcal{L}_m$ with the form $u_1 v u_2 w u_3$, where $u_1 \in \mathcal{L}_j$, $u_2 \in \mathcal{L}_{k-p}$, and $u_3 \in \mathcal{L}_{m-j-k-q}$, and obtain the bounds

$$C^{-4} e^{(m-p-q)h} \leq \#\{ z \in \mathcal{L}_m : z = u_1 v u_2 w u_3 \text{ as above} \} \leq C^{3} e^{(m-p-q)h}.$$

Averaging over $j$, sending $m \to \infty$ along the subsequence, and using the uniform counting bounds (13) gives

$$C^{-5} e^{-(p+q)h} \leq \mu\big( [v] \cap \sigma^{-k}[w] \big) \leq C^{3} e^{-(p+q)h} \quad \text{for all } k \geq p + 2\tau.$$

Using the Gibbs bounds (15) gives

$$K'^{-1} \mu(v)\mu(w) \leq \mu\big( [v] \cap \sigma^{-k}[w] \big) \leq K' \mu(v)\mu(w) \quad \text{for all } k \geq p + 2\tau, \quad \text{where } K' := C^{11}. \tag{16}$$

Exercise 7. Given Borel sets $P, Q \subset X$, approximate $P$ and $Q$ with finite unions of cylinders and use (16) to get

$$K'^{-1} \mu(P)\mu(Q) \leq \liminf_{k\to\infty} \mu(P \cap \sigma^{-k}Q) \leq \limsup_{k\to\infty} \mu(P \cap \sigma^{-k}Q) \leq K' \mu(P)\mu(Q).$$

Now if $P$ is invariant ($\sigma^{-1}P = P$), then taking $Q = X \setminus P$ in Exercise 7, the lower bound gives $K'^{-1} \mu(P)\mu(X \setminus P) \leq \mu(P \cap \sigma^{-k}(X \setminus P)) = \mu(\emptyset) = 0$, so $\mu(P)$ is either $0$ or $1$. This proves ergodicity and completes the proof of Theorem 6.

Remark 2. In fact, the upper bound in Exercise 7 can be used to show that $\mu$ is mixing; see Proposition 20.3.6 in Katok and Hasselblatt.