## Non-uniform specification for piecewise monotonic interval maps

1. Coding spaces for a class of interval maps

The starting point for this post is Franz Hofbauer’s paper “On intrinsic ergodicity of piecewise monotonic transformations with positive entropy”, which is actually two papers; both appeared in Israel J. Math, one in 1979 and one in 1981. He proves existence and uniqueness of the measure of maximal entropy (MME) for a map ${f\colon [0,1]\to [0,1]}$ provided the following are true:

1. there are disjoint intervals ${J_1,\dots, J_d}$ such that ${\bigcup_i J_i = [0,1]}$ and ${f|_{J_i}}$ is continuous and strictly monotonic;
2. the partition into these intervals is generating — that is, ${\bigcup_{m=0}^\infty f^{-m} (\partial J)}$ is dense in ${[0,1]}$, where ${\partial J}$ denotes the set of all endpoints of the intervals ${J_i}$;
3. ${h_{\mathrm{top}}(f)>0}$;
4. ${f}$ is topologically transitive.

Hofbauer’s proof proceeds by modeling ${f}$ using a countable-state Markov shift, and actually gives quite a bit of information even without topological transitivity, including a decomposition of the non-wandering set into transitive pieces, each of which supports at most one MME.

I want to describe an approach to these maps using non-uniform specification. These maps admit a natural symbolic coding on the finite alphabet ${A := \{1,\dots, d\}}$ obtained by tracking which interval ${J_i}$ an orbit lands in at each iterate, and this is the lens through which we will study them.

To make this more precise, suppose that ${f}$ satisfies Condition 1 (piecewise monotonicity), and let ${{\mathcal L}}$ denote the set of all words ${w\in A^* = \bigcup_{n\geq 0} A^n}$ such that the following interval is non-empty:

\displaystyle \begin{aligned} I(w) &:= J_{w_1} \cap f^{-1}(J_{w_2}) \cap \cdots \cap f^{-(|w|-1)}(J_{w_{|w|}}) \\ &= \{ x\in [0,1] : f^{k-1}(x) \in J_{w_k} \text{ for all } 1\leq k\leq |w| \}. \end{aligned} \ \ \ \ \ (1)

The letter “I” can be understood as standing for “interval” or for “initial”, in the sense that an orbit segment ${(x, f(x),\dots, f^{n-1}(x))}$ is coded by the word ${w\in A^n}$ if and only if its initial point lies in ${I(w)}$. In this case we have

$\displaystyle f^n(w) \in F(w) := f^n(I(w)), \ \ \ \ \ (2)$

where the “F” in this notation can be understood as standing for “following” (or “final” if you prefer). Observe that the intervals ${\{ I(w) : w\in {\mathcal L}_n\}}$ are precisely the intervals on which ${f^n}$ is continuous and monotonic, and thus each ${F(w)}$ is an interval as well.

Exercise 1 Suppose that ${f}$ is uniformly expanding on each ${J_i}$ in the sense that there is ${\lambda > 1}$ such that ${|f(x) - f(y)| \geq \lambda |x-y|}$ whenever ${x,y\in J_i}$, and prove that in this case Condition 2 (generating) is satisfied.

Exercise 2 Suppose that ${f}$ satisfies Conditions 1 and 2. Show that ${{\mathcal L}}$ is the language of a shift space ${X \subset A^{\mathbb N}}$, and that there is a semi-conjugacy ${\pi\colon X\to [0,1]}$ defined by

$\displaystyle \pi(z) = \bigcup_{n\geq 1} \overline{I(z_1 \cdots z_n)}.$

(The fact that this intersection is a single point relies on Condition 2.) Show that there is a countable set ${C\subset X}$ such that ${\pi|_{X\setminus C}}$ is 1-1.

Exercise 3 Show that if ${f}$ is uniformly expanding on each ${J_i}$, then Condition 3 (positive entropy) is satisfied, and in particular ${(X,\sigma)}$ has positive topological entropy ${h>0}$.

Under Conditions 1–3, there is a 1-1 correspondence between positive entropy invariant measures for ${f}$ and positive entropy invariant measures for ${(X,\sigma)}$, because the countable set ${C}$ can support only zero entropy measures. Thus to study the MMEs for ${f}$, it suffices to study the MMEs for the shift space ${X}$.

In general one cannot expect this shift space to be Markov. However, it turns out that in many cases it does have the non-uniform specification property described in my last post, which will lead to an alternate proof of intrinsic ergodicity (that is, existence of a unique MME). One of my motivations for writing out the details of this here is the hope that these techniques might eventually be applicable to more general hyperbolic systems with singularities, in particular to billiard systems; piecewise expanding interval maps provide a natural sandbox in which to experiment with the approach before grappling with the difficulties introduced by the presence of a stable direction.

To this end, the goal for this post is to exhibit collections of words ${{\mathcal G},{\mathcal C} \subset {\mathcal L}}$ with the following properties.

• Decomposition: For every ${w\in {\mathcal L}}$, there are ${u\in {\mathcal G}}$ and ${v\in {\mathcal C}}$ such that ${w=uv}$.
• Specification: There is ${\tau\in {\mathbb N}}$ such that for every ${v,w\in {\mathcal G}}$, there is ${u\in {\mathcal L}}$ with ${|u|\leq \tau}$ such that ${vuw\in {\mathcal G}}$.
• Entropy gap: ${\limsup_{n\to\infty} \frac 1n \log \# {\mathcal C}_n < h_{\mathrm{top}}(f)}$.

Once these are proved, the main theorem from the previous post guarantees existence of a unique MME. (The specification property there requires ${|u|=\tau}$, but this was merely to simplify the exposition; with a little more work one can establish the result under the weaker condition here.)

2. Exploring ${\beta}$-transformations

Rather than stating a definitive result immediately, let us go step-by-step — in particular, we will eventually add another condition on ${f}$ (beyond Conditions 1–3) to replace topological transitivity, and the initial definition of ${{\mathcal G}}$ and ${{\mathcal C}}$ will only cover certain examples, with a modification required to handle the general case.

Start by considering the ${\beta}$-transformation ${f(x) = \beta x \pmod 1}$ when ${\beta = \frac 12 (1+\sqrt{5})}$, whose first three iterates are shown in Figure 2. Labeling the branches with the alphabet ${A = \{0,1\}}$, we see that ${I(0) = [0,\frac 1\beta]}$ and ${I(1) = [\frac 1\beta,1]}$; in the figure, these intervals are marked on both the horizontal and vertical axes. Clearly ${F(0) = [0,1] = I(0) \cup I(1)}$, and for this specific choice of ${\beta}$ we have ${f(1) = \beta - 1 = \frac 1\beta}$, so ${F(1) = [0,\frac 1\beta] = I(0)}$.

Looking at the second picture in the figure, which shows the graph of ${f^2}$, we can see that

$\displaystyle F(00) = I(0) \cup I(1), \quad F(01) = I(0), \quad F(10) = I(0) \cup I(1).$

The third picture shows a similar phenomenon for ${f^3}$, with ${F(w) = I(0) \cup I(1)}$ when ${w \in \{000,010,100\}}$, and ${F(w) = I(0)}$ when ${w\in \{001,101\}}$.

Exercise 4 For this particular map ${f}$, show by induction that ${F(w) = I(0)}$ if ${w\in {\mathcal L}}$ ends in a ${1}$, and ${F(w) = I(0) \cup I(1)}$ if ${w\in {\mathcal L}}$ ends in a ${0}$. In particular, we have ${F(w) = F(w_{|w|})}$ for all ${w\in {\mathcal L}}$. Use this to deduce that ${{\mathcal L}}$ consists of precisely those words ${w\in \{0,1\}^*}$ that do not contain ${11}$ as a subword. (Thus ${X}$ is a transitive SFT and has a unique MME.)

The key step in this exercise is the observation that given ${a\in A}$ and ${w\in {\mathcal L}}$ such that ${aw\in {\mathcal L}}$, the definition of the intervals ${I}$ and ${F}$ in (1) and (2) gives

\displaystyle \begin{aligned} I(aw) &= J_a \cap f^{-1} (J_{w_1}) \cap f^{-2}(J_{w_2}) \cap \cdots \cap f^{-|w|}(J_{w_{|w|}}) \\ &= J_a \cap f^{-1}\big(J_{w_1} \cap f^{-1}(J_{w_2} \cap \cdots \cap f^{-(|w|-1)}(J_{w_{|w|}})\big) \\ &= I(a) \cap f^{-1}(I(w)), \end{aligned}

and thus

$\displaystyle F(aw) = f^{|aw|}(I(aw)) = f^{|w|}\big( f(I(a) \cap f^{-1}(I(w)) \big) = f^{|w|}(F(a) \cap I(w)).$

In particular, if ${F(a) \supset I(w)}$, then we have ${F(a) \cap I(w) = I(w)}$, and thus

$\displaystyle F(aw) = f^{|w|}(I(w)) = F(w).$

A similar argument replacing ${a}$ with a word ${v\in {\mathcal L}}$ (and the first iterate ${f}$ with ${f^{|v|}}$) gives the following lemma, whose detailed proof is left as an exercise.

Lemma 1 Given any map ${f}$ satisfying Conditions 1 and 2, and any ${v,w\in {\mathcal L}}$, we have

$\displaystyle I(vw) = I(v) \cap f^{-|v|}(I(w)) \quad\text{and}\quad F(vw) = f^{|w|}(F(v) \cap I(w)). \ \ \ \ \ (3)$

In particular, if ${F(v) \supset I(w)}$, then ${vw\in {\mathcal L}}$ and ${F(vw) = F(w)}$.

Now consider the map ${f(x) = \beta x \pmod 1}$ for ${\frac 12(1+\sqrt{5}) < \beta < 2}$; Figure 2 shows the first three iterates. Observe that we once again have ${F(0) = I(0) \cup I(1)}$ and ${F(1) \supset I(0)}$, so that the same argument as in Exercise 4 shows that if ${w\in \{0,1\}^*}$ is any word that does not contain ${11}$ as a subword, then ${w\in {\mathcal L}}$. However, this is no longer a complete description of the language, since as the second picture in Figure 2 illustrates, we have ${I(11) \neq \emptyset}$ (equivalently, ${F(1) \cap I(1) \neq \emptyset}$, as seen in the first picture) — consequently ${11\in {\mathcal L}}$.

In Exercise 4, every word ${w\in {\mathcal L}}$ had the property that ${F(w) = F(w_{|w|})}$; writing ${w = va}$ for ${v\in {\mathcal L}}$ and ${a\in A}$, we can rewrite this property as ${F(va) = F(a)}$, which by Lemma 1 occurs if and only if ${F(v) \supset I(a)}$. These two properties can be seen on Figure 2 as follows.

• ${F(va) = F(a)}$: the vertical interval spanned by the graph of ${f^{|va|}}$ on ${I(va)}$ is the same as the vertical interval spanned by the graph of ${f}$ on ${I(a)}$.
• ${F(v) \supset I(a)}$:
on the horizontal interval ${I(v)}$, the graph of ${f^{|v|}}$ crosses the vertical interval ${I(a)}$ completely.

Looking at Figure 2, we see that now these properties are satisfied for some words, but not for all. Let us write

\displaystyle \begin{aligned} {\mathcal G} &= \{ va \in {\mathcal L} : v\in {\mathcal L}, a\in A, F(va) = F(a) \} \\ &= \{va \in {\mathcal L} : v\in {\mathcal L}, a\in A, F(v) \supset I(a) \}. \end{aligned} \ \ \ \ \ (4)

Exercise 5 Referring to Figure 2, prove that ${{\mathcal G}_2 = \{00,01,10\}}$ and that ${{\mathcal G}_3 = \{000,001,010,100,101\}}$. In particular, ${11}$, ${011}$, and ${110}$ lie in ${{\mathcal L}}$ but not in ${{\mathcal G}}$. Show that on the other hand, ${1100 \in {\mathcal G}}$ (at least for the value of ${\beta}$ illustrated), so that words in ${{\mathcal G}}$ can have ${11}$ as a subword.

The notation ${{\mathcal G}}$ suggests that we are going to try to prove the specification property for this collection of words. The first step in this direction is to prove the following Markov-type property.

Lemma 2 Let ${f}$ satisfy Conditions 1 and 2, and let ${{\mathcal G}}$ be given by (4). If ${v,w\in {\mathcal G}}$ have the property that ${v_{|v|} = w_1}$, then ${v.w\in {\mathcal G}}$, where ${v.w := v_1 \cdots v_{|v|} w_2 w_3 \cdots w_{|w|}}$ (concatenation with the repeated symbol only appearing once).

Proof: Write ${v = ua}$ where ${u\in {\mathcal L}}$ and ${a\in A}$, so that ${v.w = uw}$. By Lemma 1, we have

$\displaystyle F(v.w) = F(uw) = f^{|w|}(F(u) \cap I(w)).$

Since ${v = ua \in {\mathcal G}}$ and ${w_1 = a}$, we have ${F(u) \supset I(a) \supset I(w)}$, so ${F(u) \cap I(w) = I(w)}$ and we conclude that

$\displaystyle F(v.w) = f^{|w|}(I(w)) = F(w) = F(w_{|w|}),$

where the last equality uses the fact that ${w\in {\mathcal G}}$. This shows that ${v.w\in {\mathcal G}}$. $\Box$

In the specific case of a ${\beta}$-transformation with ${\frac 12(1+\sqrt{5}) < \beta < 2}$, we can refer to Figure 2 and Exercise 5 to see that ${{\mathcal G}}$ contains the words ${00}$, ${01}$, ${10}$, and ${101}$. In particular, for any ${v,w\in {\mathcal G}}$, we can choose ${u}$ to be the one of these four words that begins with the last symbol in ${v}$, and ends with the first symbol in ${w}$. Then ${u\in {\mathcal G}}$ and ${|u|\leq 3}$; applying Lemma 2 twice gives ${v.u.w\in {\mathcal G}}$, which proves the specification property.

3. Obtaining a specification property

In the general case of an interval map satisfying Conditions 1 and 2, we would like to play the same game, proving specification for ${{\mathcal G}}$ by mimicking the proof of specification for a topologically transitive SFT: given ${a,b\in A}$, we would like to let ${u=u(a,b)\in {\mathcal G}}$ be such that ${u_1 = a}$ and ${u_{|u|} = b}$; then given any ${v,w\in {\mathcal G}}$, we can put ${u=u(v_{|v|},w_1)}$ and apply Lemma 2 twice to get ${v.u.w\in {\mathcal G}}$. Since ${A}$ is finite, this would prove that ${{\mathcal G}}$ has specification with ${\tau = \max\{ |u(a,b)| : a,b\in A \}}$.

But how can we guarantee that such a ${u(a,b)}$ exists for every ${a,b\in A}$? This does not follow from Conditions 1–3, and indeed one can easily produce examples satisfying these conditions for which there are multiple MMEs and the collection ${{\mathcal G}}$ defined in (4) does not have specification, because there are some pairs ${a,b}$ for which no such ${u}$ exists.

At this point it is tempting to simply add existence of the required words ${u(a,b)}$ as an extra condition, replacing topological transitivity. But it is worth working a little harder. Consider the map ${f(x) = \alpha + \beta x \pmod 1}$, three iterates of which are illustrated in Figure 3 for ${\alpha = 0.1}$ and ${\beta = 1.4}$.

For this ${f}$, we see from the first picture that neither of ${F(0)}$ or ${F(1)}$ contains ${I(0)}$, and thus there can be no word ${v\in {\mathcal L}}$ such that ${F(v) \supset I(0)}$, since we always have ${F(v) \subset F(v_{|v|})}$. It follows that ${{\mathcal G}}$ does not contain any words ending in ${0}$, which rules out a proof of specification along the lines above.

The story is different if instead of single letters, we look at words of length two. Observe from the second picture that

$\displaystyle F(00) \supset I(01) \cup I(10), \quad F(01)\supset I(00), \quad F(10) \supset F(01). \ \ \ \ \ (5)$

This suggests that we should modify the definition of ${{\mathcal G}}$ in (4) to require that the follower set depends only on the last two symbols (instead of the last one). More generally, we can fix ${k\in {\mathbb N}}$ and define

\displaystyle \begin{aligned} {\mathcal G}^{(k)} &= \{ vq \in {\mathcal L} : v,q\in {\mathcal L}, |q|=k, F(vq) = F(q) \} \\ &= \{vq \in {\mathcal L} : v,q\in {\mathcal L}, |q|=k, F(v) \supset I(q) \}. \end{aligned} \ \ \ \ \ (6)

We have the following generalization of Lemma 2.

Lemma 3 If ${vq,wr\in {\mathcal G}^{(k)}}$ (with ${|q| = |r| = k}$) have the property that ${w_1 \cdots w_k = q}$, then ${vwr\in {\mathcal G}^{(k)}}$.

Proof: Using Lemma 1, we have

$\displaystyle F(vwr) = f^{|wr|}(F(v) \cap I(wr)),$

and since ${vq\in {\mathcal G}^{(k)}}$, we have ${F(v) \supset I(q) \supset I(wr)}$, so ${F(v) \cap I(wr) = I(wr)}$, giving

$\displaystyle F(vwr) = f^{|wr|}(I(wr)) = F(wr) = F(r)$

since ${w\in {\mathcal G}}$. This shows that ${vwr\in {\mathcal G}}$. $\Box$

Now we can prove that ${{\mathcal G}^{(2)}}$ has specification for the map ${f}$ in Figure 3; as described above, it suffices to prove that if ${q,r}$ are any two words from the set ${\{00,01,10\}}$, then there is some ${qur\in {\mathcal G}^{(2)}}$. But this follows quickly from the fact that the transitions ${00\to 10 \to 01 \to 00}$ are all allowed by (5), and that any word obtained by concatenating allowable transitions must lie in ${{\mathcal G}^{(k)}}$ by iterating Lemma 3.

This discussion motivates the following.

Definition 4 Given ${k\in {\mathbb N}}$, define a directed graph whose vertices are the words in ${{\mathcal L}_k}$ and whose edges are given by the condition that ${q\to r}$ exactly when ${F(q) \supset I(r)}$. Say that ${f}$ has Property ${(T_k)}$ if this graph is strongly connected, meaning that given any two ${q,r\in {\mathcal L}_k}$, there is a path in the graph from ${q}$ to ${r}$.

Property ${(T_k)}$ is playing the role of topological transitivity, but the exact relationship is not quite clear yet. I’ll have more to say about this in my next post, and the formulation here is almost certainly not the “optimal” one, but since this is a blog and not a paper, let’s stick with this property for the time being so that we can formulate an actual result and not get lost in the weeds.

Proposition 5 Let ${f}$ satisfy Conditions 1 and 2, and Property ${(T_k)}$ for some ${k\in {\mathbb N}}$. Then ${{\mathcal G}^{(k)}}$ has specification.

Proof: By Property ${(T_k)}$, for every ${q,r\in {\mathcal L}_k}$, there are ${p^1,\dots, p^\ell \in {\mathcal L}_k}$ such that ${q \to p^1 \to p^2 \to \cdots \to p^\ell \to r}$. Thus ${qp^1, p^1p^2, \dots, p^{\ell-1} p^\ell, p^\ell r\in {\mathcal G}^{(k)}}$. Applying Lemma 3 repeatedly shows that ${qp^1\cdots p^\ell r \in {\mathcal G}^{(k)}}$. Write ${u = u(q,r) = p^1\cdots p^\ell}$, so ${qur \in {\mathcal G}^{(k)}}$.

Now let ${\tau = \max \{ |u(q,r)| : q,r\in {\mathcal L}_k \}}$. Given any ${vq,rw\in {\mathcal G}}$ with ${|q|=|r|=k}$, let ${u = u(q,r)}$, so that ${qur \in {\mathcal G}}$. One application of Lemma 3 gives ${vqur \in {\mathcal G}}$, and a second application gives ${vqurw\in {\mathcal G}}$. Since ${|u| \leq \tau}$, this completes the proof. $\Box$

4. Bounding the entropy of suffixes

Let ${f}$ be an interval map satisfying Conditions 1–3 and Property ${(T_k)}$. Proposition 5 shows that ${{\mathcal G}^{(k)}}$ has specification, so it remains to find ${{\mathcal C} \subset {\mathcal L}}$ such that the decomposition and entropy gap conditions are satisfied.

For simplicity, let us first discuss the case ${k=1}$. Given ${w\in {\mathcal L}}$, we want to write ${w = uv}$ where ${u\in {\mathcal G}}$ and ${v}$ lies in some yet-to-be-determined collection ${{\mathcal C} \subset {\mathcal L}}$. Since this collection should be “small”, it makes sense to ask for ${u}$ to be as long as possible, and thus it is reasonable to take ${u = w_1 \cdots w_m}$, where ${m}$ is the largest index in ${\{1,\dots, |w|\}}$ with the property that

$latex \displaystyle w_1\cdots w_m \in {\mathcal G}. \ \ \ \ \ (7)&fg=000000$

In particular, for every ${\ell \in \{m+1,\dots, |w|\}}$, we have

$latex \displaystyle w_1 \cdots w_\ell \notin {\mathcal G}. \ \ \ \ \ (8)&fg=000000$

By Lemma 2, this implies that ${w_m \cdots w_\ell \notin {\mathcal G}}$: indeed, if we had ${w_m \cdots w_\ell \in {\mathcal G}}$, then using (7) and Lemma 2 would give ${w_1 \cdots w_\ell \in {\mathcal G}}$, contradicting (8).

This shows that if we put ${a = w_m}$, then ${w_{m+1} \cdots w_{|w|}}$ lies in the collection

$latex \displaystyle {\mathcal C}(a) := \{ v\in {\mathcal L} : av\in {\mathcal L}, av_1\cdots v_\ell \notin {\mathcal G} \ \forall 1\leq \ell \leq |v| \}. \ \ \ \ \ (9)&fg=000000$

Taking ${{\mathcal C} = \bigcup_{a\in A} {\mathcal C}(a)}$ gives the desired decomposition. But what is the entropy of ${{\mathcal C}}$? First note that if

${vb \in {\mathcal C}_{n+1}(a)}$ for some ${v\in {\mathcal L}}$ and ${a,b \in A}$, then ${v\in {\mathcal C}_n(a)}$, so we can bound ${\#{\mathcal C}_{n+1}(a)}$ in terms of ${\#{\mathcal C}_n(a)}$ by controlling how many choices of ${b}$ there are for each ${v \in {\mathcal C}_n(a)}$.

The key is in the observation that ${F(av)}$ and ${I(b)}$ are intervals that intersect in more than one point (since ${avb \in {\mathcal L}}$), but that ${F(av) \not\supset I(b)}$ (since ${avb \notin {\mathcal G}}$). The only way for this to happen is

if the interior of ${I(b)}$ contains one of the endpoints of ${F(av)}$. Since the interiors of the intervals ${I(b)}$ are disjoint, we see that for any ${v\in {\mathcal C}_n(a)}$ there are at most two choices of ${b}$ for which ${vb \in {\mathcal C}_{n+1}(a)}$. It follows that

$\displaystyle \#{\mathcal C}_{n+1}(a) \leq 2 \#{\mathcal C}_n(a),$

and iterating this gives

$\displaystyle \#{\mathcal C}_n(a) \leq 2^n \quad\Rightarrow\quad \#{\mathcal C}_n \leq 2^n (\# A)$

We conclude that

$\displaystyle \limsup_{n\to\infty} \frac 1n \log \#{\mathcal C}_n \leq \log 2.$

If ${h_{\mathrm{top}}(f) > \log 2}$, then this provides the needed entropy gap, and we are done.

But what if ${h_{\mathrm{top}}(f)}$ lies in ${(0,\log 2]}$? This is where we need to use ${{\mathcal G}^{(k)}}$ instead of ${{\mathcal G}}$. Given any ${k\in {\mathbb N}}$ and ${q\in {\mathcal L}_k}$, let

$latex \displaystyle {\mathcal C}^{(k)}(q) := \{v \in {\mathcal L} : qv\in {\mathcal L}, q v_1 \cdots v_\ell \notin {\mathcal G}^{(k)} \ \forall 1\leq \ell \leq |v| \}. \ \ \ \ \ (10)&fg=000000$

Following the same idea as in the case ${k=1}$, we see that every word in ${{\mathcal C}^{(k)}_{n+k}(q)}$ is of the form ${vp}$, where ${v\in {\mathcal C}^{(k)}_n(q)}$ and ${|p| = k}$ are such that the intervals ${F(qv)}$ and ${I(p)}$ intersect in more than one point (since ${qvp \in {\mathcal L}}$) but ${F(qv) \not\supset I(p)}$ (since ${qvp \notin {\mathcal G}^{(k)}}$). The only way for this to happen is if the interior of ${I(p)}$ contains one of the endpoints of ${F(qv)}$. Since the interiors of the intervals ${I(p)}$ are disjoint (as ${p}$ ranges over ${{\mathcal L}_k}$), we see that for any ${v\in {\mathcal C}^{(k)}_n(q)}$ there are at most two choices of ${p\in {\mathcal L}_k}$ for which ${vp \in {\mathcal C}^{(k)}_{n+1}(q)}$. It follows that

$\displaystyle \#{\mathcal C}^{(k)}_{n+k}(q) \leq 2 \#{\mathcal C}^{(k)}_n(q),$

and iterating this gives

$\displaystyle \#{\mathcal C}^{(k)}_n(a) \leq 2^{\lfloor n/k\rfloor} (\#{\mathcal L}_k) \quad\Rightarrow\quad \#{\mathcal C}^{(k)}_n \leq 2^{n/k} (\#{\mathcal L}_k)^2$

We conclude that

$\displaystyle \limsup_{n\to\infty} \frac 1n \log \#{\mathcal C}^{(k)}_n \leq \frac 1k\log 2 < h_{\mathrm{top}}(f) \ \ \ \ \ (11)$

as long as we choose ${k > \log(2) / h_{\mathrm{top}}(f)}$.

Together with Proposition 5, this proves the following.

Theorem 6 Let ${f}$ be an interval map satisfying Conditions 1–3 and Property ${(T_k)}$ for some ${k > \log(2) / h_{\mathrm{top}}(f)}$. Then the coding space ${X}$ for ${f}$ has a language with a decomposition satisfying the specification and entropy gap conditions; in particular, ${f}$ has a unique MME.

Some natural questions present themselves at this point.

• How does Property ${(T_k)}$ relate to the topological transitivity condition from Hofbauer’s paper?
• Does having Property ${(T_k)}$ for some value of ${k}$ tell us anything about having it for other values? (If so, one might hope to be able to remove the condition on ${k}$ in Theorem 6.)
• Is there a more general version of Property ${(T_k)}$ that we can use in Proposition 5 and Theorem 6?

I’ll address these in the next post. For now I will conclude this post by emphasizing the idea in the proof of (11): “if we go ${k}$ iterates in each step, then since each step only has two ways to be bad, we can bound the entropy of always-bad things by ${\frac 1k \log 2}$”. This same idea plays a key role in the proof of the growth lemma for dispersing billiards (although in that setting one needs to replace “two ways” with “${k}$ ways” and the bound becomes ${\frac 1k \log k}$); see Chapter 5 of the book “Chaotic Billiards” by Chernov and Markarian, or Section 5 of “On the measure of maximal entropy for finite horizon Sinai Billard maps” by Baladi and Demers (JAMS, 2020) (in particular, compare equation (5.4) there with (11) above). The fact that taking ${k}$ large makes this estimate smaller than the entropy is an example of the phenomenon “growth beats fragmentation”, which is important in studying systems with singularities.

## An improved non-uniform specification result

My last post (nearly two years ago!) described Bowen’s proof that a shift space with the specification property has a unique measure of maximal entropy. A little over ten years ago, Dan Thompson and I gave a version of Bowen’s proof that turned out to be well-suited for applications to non-uniformly hyperbolic systems; an overview of the approach and of various extensions and applications since then, especially to equilibrium states for geodesic flows, can be found in our survey article in “Thermodynamic Formalism: CIRM Jean-Morlet Chair, Fall 2019” (Springer LNM, volume 2290). Today I want to return to the original setting of shift spaces, as described in our original paper (Israel J. Math, 2012), which I will refer to as [CT] from now on.

Our main result in [CT] contained an assumption that always bothered me (Condition (III) in Theorem C, see below for a description). I really had the feeling that the result should remain true even without it, but we couldn’t figure out how to remove it. Recently, Maria Jose Pacifico, Fan Yang, and Jiagang Yang have managed to remove this hypothesis in a new paper, which I will refer to as [PYY]. They consider flows rather than shift spaces, and it turns out that for flows with fixed points, including the Lorenz attractors that they are interested in, verifying the “extra assumption” (a version of which does indeed appear in my 2016 paper with Dan Thompson on non-uniform specification for flows) actually presents real problems, so their ability to prove a uniqueness result without requiring this condition has not just aesthetic but also practical value.

The arguments for flows are substantially more technical than those for shift spaces, and the proofs involve a steady stream of Bowen balls, adapted partitions, separated sets, two-scale partition sums, bounded distortion estimates, etc., which somewhat obscure the core innovation in [PYY] that brings the improvement over [CT]. The purpose of this post is to translate the argument from [PYY] back to the symbolic setting in order to clarify exactly where this improvement lies; the results I describe below can all be formally deduced from [PYY], but I will give complete proofs of everything that does not appear already in [CT].

Basic notation for shift spaces, etc., will be as in my previous post. To formulate the (improved) uniqueness result, we need the notion from [CT] of a decomposition of the language ${{\mathcal L}}$ of a shift space ${X}$: this is a choice of three collections of words ${{\mathcal C}^p, {\mathcal G}, {\mathcal C}^s \subset {\mathcal L}}$ such that every ${w\in {\mathcal L}}$ can be decomposed as a prefix’ from ${{\mathcal C}^p}$, followed by a good core’ from ${{\mathcal G}}$, followed by a suffix’ from ${{\mathcal C}^s}$. Formally, for every ${w\in {\mathcal L}}$, there are ${u^p \in {\mathcal C}^p}$, ${v\in {\mathcal G}}$, and ${u^s\in {\mathcal C}^s}$ such that ${w = u^p v u^s}$. See [CT] for examples of shift spaces with decompositions satisfying the conditions in the following theorem. (Spoiler: ${\beta}$-shifts and ${S}$-gap shifts.)

Theorem 1 Suppose ${X}$ is a shift space on a finite alphabet with topological entropy ${h>0}$ whose language ${{\mathcal L}}$ has a decomposition ${{\mathcal C}^p,{\mathcal G},{\mathcal C}^s}$ satisfying the following conditions.

(I) Specification for ${{\mathcal G}}$. There is ${\tau\in{\mathbb N}}$ such that for all ${w^1,w^2,\dots,w^n\in {\mathcal G}}$, there are ${u^1,\dots,u^{n-1}\in {\mathcal L}_\tau}$ such that ${w^1 u^1 w^2 u^2 \cdots u^{n-1} w^n\in{\mathcal L}}$.

(II) Entropy gap for ${{\mathcal C}^{p,s}}$. ${\limsup_{n\rightarrow\infty} \frac 1n \log \#({\mathcal C}^p \cup {\mathcal C}^s)_n < h}$.

Then ${X}$ has a unique MME.

In the proofs (both in [CT] and in [PYY]) one must work not just with ${{\mathcal G}}$ but also with the collections

$\displaystyle {\mathcal G}(M) = \{u^pvu^s : u^{p,s}\in {\mathcal C}^{p,s}, v\in {\mathcal G}, |u^{p,s}| \leq M \}.$

Observe that ${{\mathcal L} = \bigcup_M {\mathcal G}(M)}$; if you are familiar with Pesin theory for non-uniformly hyperbolic diffeomorphisms, it is reasonable to think of this as an analogue of the decomposition of a (non-uniformly) hyperbolic set into regular level sets. (If you are not familiar with Pesin theory, you should probably just ignore that last sentence.) In [CT], we required the following extra condition.

(III) For every ${M\in{\mathbb N}}$, there exists ${t}$ such that given ${w\in {\mathcal G}(M)}$, there exist words ${q,r\in {\mathcal L}}$ with length ${\leq t}$ such that ${qwr \in {\mathcal G}}$.

This condition guarantees that each of the collections ${{\mathcal G}(M)}$ has (a slightly weaker version of) specification as well, with gap size allowed to depend on ${M}$, and it appears in this form in all of our later papers; this is the condition that turns out to present problems for Lorenz attractors, and indeed for general flows with fixed points, and this is what motivated the authors of [PYY].

Let me mention in passing that there is also a symbolic setting where (III) becomes problematic; a few years ago I wrote a paper with Ronnie Pavlov (ETDS, 2019) where we studied shift spaces with “one-sided almost specification with bounded mistake function” and proved uniqueness of the MME. We were able to verify conditions (I) and (II) for these examples, but not condition (III) (see the end of §2.3 in that paper), and ultimately needed to use a slightly different set of conditions I had formulated in a paper constructing countable-state Markov models for shifts with non-uniform specification (Comm. Math. Phys., 2018). These conditions are a little more complicated, and the resulting proofs are a lot more complicated, so it is nice to know now that the uniqueness results in my paper with Ronnie Pavlov do in fact follow from the simpler and shorter proof of Theorem 1.

To see where [PYY] makes the improvement leading to a proof of Theorem 1 without requiring (III), start by recalling the overall structure of Bowen’s proof as I outlined it in my last post: first one uses specification to construct a Gibbs measure ${\mu}$, then one proves that ${\mu}$ is ergodic and that there can be no mutually singular MME ${\nu\perp \mu}$. The proof in [CT] follows similar lines, and roughly speaking, Condition (III) was only required for the second half. More precisely, using only Conditions (I) and (II), [CT] still contains a proof of the following.

Proposition 2 There exists an MME ${\mu}$ that has the following Gibbs-type property: there is ${c>0}$ such that

$\displaystyle \mu[w] \geq ce^{-|w| h} \text{ for all } w\in {\mathcal G}. \ \ \ \ \ (1)$

Moreover, there is ${m\in {\mathbb N}}$ such that

$\displaystyle \mu([u] \cap \sigma^{-(|u| + m)}[v]) \geq c e^{-(|u| + |v|) h} \text{ for all } u,v\in {\mathcal G}. \ \ \ \ \ (2)$

Proof: See Lemmas 5.1–5.10 in [CT] for the proofs of the uniform counting bounds and the construction of a measure satisfying (1); note the sentence after the proof of Lemma 5.10 emphasizing that Condition (III) was not used. The proof that this measure also satisfies the “two-step Gibbs bound” (2) is contained in the proof of [CT, Lemma 5.15], even though the statement of that lemma is a little different. (In fact one can take ${m}$ to be arbitrarily large, going to infinity along a sequence with bounded gaps, but we only need (2) for a single value of ${m}$.) $\Box$

The remainder of the proof in [CT] uses (2) to establish ergodicity and (1) to rule out the existence of a mutually singular MME. More precisely, this is done by using the fact under Condition (III), Proposition 2 still holds with ${{\mathcal G}}$ replaced by ${{\mathcal G}(M)}$ (although the constant ${c}$ is allowed to depend on ${M}$). One also uses the fact that various sets that appear in the arguments can be well-approximated by unions of cylinders corresponding to words in ${{\mathcal G}(M)}$.

Without Condition (III), there is no guarantee that Proposition 2 holds for ${{\mathcal G}(M)}$, and so instead [PYY] must work with the “good cores” of words, and must approximate the relevant sets with unions of cylinders from ${{\mathcal G}}$, not ${{\mathcal G}(M)}$. The two lemmas that make this precise both require a certain truncation operation’ to state properly: given ${w\in {\mathcal L}}$ and ${i,j\in{\mathbb N}}$ with ${i+j \leq |w|}$, let

$\displaystyle \varphi_{i,j}(w) = w_{(i,|w|-j)} = w_{i+1} w_{i+2} \cdots w_{|w| - j - 1}$

denote the word obtained by deleting the first ${i}$ and last ${j}$ symbols from ${w}$. Given a collection of words ${{\mathcal D}_n \subset {\mathcal L}_n}$, we will write ${[{\mathcal D}_n] = \bigcup_{w\in {\mathcal D}_n} [w] \subset X}$ for the union of the corresponding cylinders.

Lemma 3 There exist ${\alpha>0}$ and ${M\in {\mathbb N}}$ such that if ${\nu}$ is any MME and ${{\mathcal D}_n \subset {\mathcal L}_n}$ satisfies ${\nu([{\mathcal D}_n]) \geq \frac 12}$, then there are ${i,j \in \{0,1,\dots M\}}$ (depending on ${n}$ and ${{\mathcal D}}$) such that ${\#(\varphi_{i,j}({\mathcal D}_n) \cap {\mathcal G}) \geq \alpha e^{nh}}$.

(In fact one can replace ${\frac 12}$ with any ${\gamma \in (0,1)}$, at the cost of possibly needing to use a different ${\alpha}$ and ${M}$.)

Proof: This very nearly follows from two lemmas in [CT]:

• Lemma 5.7: For all ${\gamma \in (0,1)}$ there exists ${c_1 > 0}$ such that if ${\nu}$ is any MME and ${\nu({\mathcal D}_n) \geq \gamma}$, then ${\#{\mathcal D}_n \geq c_1 e^{nh}}$.
• Lemma 5.6: For every ${0, there exists ${M\in{\mathbb N}}$ such that if ${\#{\mathcal D}_n \geq c_1 e^{nh}}$, then ${\#({\mathcal D}_n \cap {\mathcal G}(M)) \geq c_2 e^{nh}}$. (The formulation in [CT] is slightly different, but the proof there gives this statement.)

Combining these gives ${M}$ such that

$\displaystyle \#({\mathcal D}_n \cap {\mathcal G}(M)) \geq c_2 e^{nh} \text{ whenever } \nu({\mathcal D}_n) \geq \tfrac12. \ \ \ \ \ (3)$

The remaining step is not done in [CT] but is done in [PYY, Lemma 5.2]. Observe that for every ${w\in {\mathcal D}_n \cap {\mathcal G}(M)_n}$ there are ${i,j\in \{0,1,\dots, M\}}$ such that ${\varphi_{i,j}(w) \in {\mathcal G}}$; moreover, for any ${v\in {\mathcal G}_{n-(i+j)}}$, there are at most ${(\#{\mathcal L}_i)(\#{\mathcal L}_j) \leq (\#{\mathcal L}_M)^2}$ words ${w\in {\mathcal D}_n \cap {\mathcal G}(M)}$ with ${\varphi_{i,j}(w) = v}$. It follows that

\displaystyle \begin{aligned} \#({\mathcal D}_n \cap {\mathcal G}(M)) &\leq \sum_{i,j=0}^M (\#{\mathcal L}_M)^2\#(\varphi_{i,j}({\mathcal D}_n) \cap {\mathcal G}) \\ &\leq (\#{\mathcal L}_M)^2(M+1)^2 \max_{i,j} \#(\varphi_{i,j}({\mathcal D}_n) \cap {\mathcal G}). \end{aligned}

Combining this with (3) gives

$\displaystyle (\#{\mathcal L}_M)^2(M+1)^2 \max_{i,j} \#(\varphi_{i,j}({\mathcal D}_n) \cap {\mathcal G}) \geq c_2 e^{nh},$

which proves Lemma 3 with ${\alpha = (\#{\mathcal L}_M)^{-2}(M+1)^{-2} c_2}$. $\Box$

From now on, we fix ${\alpha}$ and ${M}$ as given by Lemma 3. The key innovation for approximating sets using ${{\mathcal G}}$ (instead of ${{\mathcal G}(M)}$) is in [PYY, Lemma 7.3], which implies the following in our simple setting.

Lemma 4 Given any mutually disjoint invariant Borel probability measures ${\nu_1,\nu_2}$ on ${X}$, and any ${\beta>0}$, there is a collection of words ${{\mathcal U} \subset {\mathcal L}}$ and ${n_0\in {\mathbb N}}$ such that

1. ${\nu_1([{\mathcal U}_n]) \geq 1-\beta}$ for all ${n}$, and
2. ${\nu_2([\varphi_{i,j} {\mathcal U}_n]) \leq \beta}$ for all ${i,j,n}$ satisfying ${\lfloor n/2 \rfloor \geq i}$ and ${\lceil n/2 \rceil - j \geq n_0}$, in particular, for all ${n \geq 2(n_0 + \max(i,j))}$.

Proof: Since ${\nu_1 \perp \nu_2}$ and both are invariant, there are disjoint invariant sets ${P_1, P_2 \subset X}$ such that ${\nu_\ell(P_\ell) = 1}$ for ${\ell=1,2}$. By inner regularity there are compact sets ${K_\ell \subset P_\ell}$ such that ${\nu_\ell(K_\ell) \geq 1-\beta}$.

(If one only wanted the results to hold for ${i=j=0}$, then one could at this point put ${{\mathcal U} = \{w\in {\mathcal L} : [w] \cap K_1 \neq \emptyset\}}$ and be done; this is a fairly standard argument and is what we used in [CT], where in fact we did not even bother to spell it out so explicitly. The extension to general ${i,j}$ — which is needed in order to leverage Lemma 3, as we will see later — requires a little more care in the choice of ${{\mathcal U}}$, and this is where [PYY] is quite clever in strengthening the usual argument.)

Now given ${n\in{\mathbb N}}$, let ${{\mathcal U}_n = \{ w \in {\mathcal L}_n : [w] \cap \sigma^{-\lfloor n/2 \rfloor} K_1 \neq \emptyset \}}$. It follows immediately that ${[{\mathcal U}_n] \supset \sigma^{-\lfloor n/2 \rfloor}K_1}$, and since ${\nu_1}$ is invariant we get

$\displaystyle \nu_1[{\mathcal U}_n] \geq \nu_1(\sigma^{-\lfloor n/2 \rfloor} K_1) = \nu_1(K_1) \geq 1-\beta,$

which proves the first claim. For the second claim, we take ${n_0}$ large enough that no ${n_0}$-cylinder intersects both ${K_1}$ and ${K_2}$ (this is possible since these are compact disjoint sets). Then for every ${i,j,n}$ as in the statement, and every ${w\in {\mathcal U}_n}$, the word ${v := \varphi_{\lfloor n/2 \rfloor, j}(w)}$ has ${[v] \supset \sigma^{\lfloor n/2 \rfloor} [w]}$, which intersects ${K_1}$, so ${[v] \cap K_1 \neq \emptyset}$, and moreover ${|v| = (n-j) - \lfloor n/2 \rfloor \geq n_0}$, so ${[v] \cap K_2 = \emptyset}$; that is, ${[v] \subset K_2^c}$. We conclude that

$\displaystyle \sigma^{\lfloor n/2 \rfloor - i}[\varphi_{i,j}(w)] \subset [v] \subset K_2^c.$

Taking a union over all ${w\in {\mathcal U}_n}$ gives ${\sigma^{\lfloor n/2 \rfloor - i}[\varphi_{i,j}({\mathcal U}_n)] \subset X \setminus K_2}$, and now invariance of ${\nu_2}$ gives

$\displaystyle \nu_2([\varphi_{i,j}{\mathcal U}_n]) \leq \nu_2(\sigma^{-(\lfloor n/2 \rfloor - i)}(X\setminus K_2)) = \nu_2(X\setminus K_2) \leq \beta,$

which completes the proof of Lemma 4. $\Box$

With Lemmas 3 and 4 in hand, we can now follow [PYY] in completing the second part of the uniqueness argument by showing that there is no MME ${\nu\perp \mu}$, and that ${\mu}$ is ergodic. The first of these follows [PYY, Proposition 7.2]. Recall that ${c}$ is the constant in the lower Gibbs bounds (1) and (2), and ${\alpha,M}$ are given by Lemma 3.

Proposition 5 There is no MME ${\nu\perp \mu}$.

Proof: Suppose for a contradiction that there is another MME ${\nu\perp \mu}$. Fix ${\beta < \min(\frac 12, c\alpha)}$, and apply Lemma 4 with ${\nu_1 = \nu}$ and ${\nu_2 = \mu}$ to get ${{\mathcal U}\subset {\mathcal L}}$ and ${n_0\in{\mathbb N}}$ such that ${\nu([{\mathcal U}_n]) \geq 1-\beta}$ for all ${n}$, and

$\displaystyle \mu([\varphi_{i,j} {\mathcal U}_n]) \leq \beta \ \ \ \ \ (4)$

for all ${i,j,n}$ satisfying ${n \geq 2(n_0 + \max(i,j))}$.

Since ${\beta<\frac 12}$, we have ${\nu([{\mathcal U}_n]) > \frac 12}$ for all ${n}$, and since ${\nu}$ is an MME, Lemma 3 gives ${i,j\in \{0,1,\dots, M\}}$ (depending on ${n}$) with

$\displaystyle \#(\varphi_{i,j}({\mathcal U}_n) \cap {\mathcal G}) \geq \alpha e^{nh};$

thus the Gibbs property on ${{\mathcal G}}$ gives

$\displaystyle \mu([\varphi_{i,j} ({\mathcal U}_n) \cap {\mathcal G}]) \geq c e^{-(n-(i+j)) h} \#(\varphi_{i,j}({\mathcal U}_n) \cap {\mathcal G}) \geq c e^{(i+j) h} \alpha \geq c\alpha.$

For all ${n\geq 2(n_0 + M)}$ and this choice of ${i,j \in \{0,1,\dots, M\}}$, we also have (4), which leads to

$\displaystyle \beta \geq \mu([\varphi_{i,j} {\mathcal U}_n]) \geq \mu([\varphi_{i,j} ({\mathcal U}_n) \cap {\mathcal G}]) \geq c\alpha.$

This contradicts our choice of ${\beta}$ and completes the proof of Proposition 5. $\Box$

The proof of ergodicity follows [PYY, Proposition 7.4].

Proposition 6 ${\mu}$ is ergodic.

Proof: Suppose for a contradiction that ${P\subset X}$ is invariant and ${0 < \mu(P) < 1}$. Let ${\nu}$ and ${\nu'}$ be the normalizations of ${\mu}$ restricted to ${P}$ and ${P^c}$, respectively. Fix ${\beta < \min (\frac 12, \frac 12 c\alpha^2)}$, and apply Lemma 4 twice — once with ${\nu_1=\nu}$ and ${\nu_2=\nu'}$, and once with the roles reversed — to obtain ${{\mathcal U},{\mathcal U}' \subset {\mathcal L}}$ and ${n_0 \in {\mathbb N}}$ such that ${\nu([{\mathcal U}_n]) \geq 1-\beta}$ and ${\nu'([{\mathcal U}'_n]) \geq 1-\beta}$ for every ${n}$, and moreover

$\displaystyle \nu([\varphi_{i,j}{\mathcal U}_n']) \leq \beta \text{ and } \nu'([\varphi_{i,j}{\mathcal U}_n]) \leq \beta \ \ \ \ \ (5)$

for all ${i,j,n}$ satisfying ${n \geq 2(n_0 + \max(i,j))}$.

Observe that each of ${\nu}$ and ${\nu'}$ is an MME (since entropy depends affinely on the measure, and the MME ${\mu}$ is a convex combination of ${\nu}$ and ${\nu'}$). Thus we can apply Lemma 3 to both ${{\mathcal U}}$ and ${{\mathcal U}'}$, obtaining ${i,j,i',j' \in \{0,1,\dots, M\}}$ (depending on ${n}$) such that the collections

$\displaystyle {\mathcal V}(n) := \varphi_{i,j}({\mathcal U}_n) \cap {\mathcal G} \subset {\mathcal L}_{n-(i+j)} \text{ and } {\mathcal V}'(n) := \varphi_{i,j}({\mathcal U}_n')\cap {\mathcal G} \subset {\mathcal L}_{n-(i'+j')}$

both have cardinality at least ${\alpha e^{nh}}$.

Let ${k = n-(i+j) + m}$, so that ${|v| + m = k}$ for every ${v\in {\mathcal V}(n)}$. Given any such ${v}$, and any ${v' \in {\mathcal V}'(n)}$, the “two-step” Gibbs property in (2) gives

$\displaystyle \mu([v] \cap \sigma^{-k}[v']) \geq c e^{-(|v| + |v'|)h} \geq c e^{-2nh}.$

Summing over all such ${v}$ and ${v'}$ gives

$\displaystyle \mu([{\mathcal V}(n)] \cap \sigma^{-k} [{\mathcal V}'(n)]) \geq c e^{-2nh} (\#{\mathcal V}(n)) (\#{\mathcal V}'(n)) \geq c \alpha^2. \ \ \ \ \ (6)$

We will get an upper bound on this measure and derive a contradiction. Given any ${x\in [{\mathcal V}(n)] \cap \sigma^{-k}[{\mathcal V}'(n)]}$, we either have ${x\in P^c}$ or ${x\in P = \sigma^{-k} P}$ (since ${P}$ is invariant), and thus

$\displaystyle [{\mathcal V}(n)] \cap \sigma^{-k}[{\mathcal V}'(n)] \subset ([{\mathcal V}(n)] \cap P^c) \cup \sigma^{-k} ([{\mathcal V}'(n)] \cap P).$

Observe that whenever ${n\geq 2(n_0 + M)}$, (5) gives

$\displaystyle \mu([{\mathcal V}(n)] \cap P^c) = \nu'([{\mathcal V}(n)]) \mu(P^c) \leq \nu'([\varphi_{i,j}({\mathcal U}_n)]) \leq \beta$

and similarly

$\displaystyle \mu(\sigma^{-k}([{\mathcal V}'(n)] \cap P)) = \mu([{\mathcal V}'(n)] \cap P) \leq \beta,$

so that

$\displaystyle \mu([{\mathcal V}(n)] \cap \sigma^{-k}[{\mathcal V}'(n)]) \leq 2\beta.$

Comparing this with (6) gives ${2\beta \geq c\alpha^2}$, contradicting our choice of ${\beta}$ and completing the proof of Proposition 6. $\Box$

Posted in ergodic theory, theorems, Uncategorized | 1 Comment

## Specification and the measure of maximal entropy

These are notes for a talk I am giving in Jon Chaika’s online working seminar in ergodic theory. The purpose of the talk is to outline Bowen’s proof of uniqueness of the measure of maximal entropy for shift spaces with the specification property. Bowen’s approach extends much more broadly than this: to non-symbolic systems (assuming expansivity); to equilibrium states for non-zero potential functions (assuming a bounded distortion property); and to non-uniformly hyperbolic systems using the notion of “obstructions to specification and expansivity” developed by Dan Thompson and myself (see some notes here and here, and videos here). In these notes, though, I want to give the bare bones of the argument in the simplest possible setting, to make the essential structure as clear as possible. I gave an alternate argument in a previous post; here I am giving Bowen’s original argument, although I do not necessarily follow his presentation. I should also point out that this argument differs from the construction of the MME, or more generally the equilibrium state, in Bowen’s monograph, which uses the Ruelle operator.

1. Setting and result

Let ${A}$ be a finite set; the alphabet. Then the full shift ${A^{\mathbb N}}$ is the set of infinite sequences of symbols from ${A}$; this is a compact metric space with ${d(x,y) = e^{-n(x,y)}}$, where ${n(x,y) = \min \{ n : x_n \neq y_n\}}$. The shift map ${\sigma\colon A^{\mathbb N}\rightarrow A^{\mathbb N}}$ is defined by ${\sigma(x)_n = x_{n+1}}$. A shift space is a closed set ${X\subset A^{\mathbb N}}$ with ${\sigma(X) = X}$.

Example 1 The best example to keep in mind through this talk is a topological Markov shift: fix ${d\in{\mathbb N}}$ and put ${A = \{1,\dots, d\}}$; then fix a ${d\times d}$ transition matrix ${T}$ with entries in ${\{0,1\}}$, and write

$\displaystyle i\rightarrow j \text{ if } T_{ij} = 1, \quad i\not\rightarrow j \text{ if } T_{ij} = 0. \ \ \ \ \ (1)$

Define ${X = \{ x\in A^{\mathbb N} : x_n \rightarrow x_{n+1}}$ for all ${n\}}$. This is a topological Markov shift (TMS). It can be viewed in terms of a directed graph with vertex set ${A}$ and edges given by (1): ${X}$ consists of all sequences that label infinite paths on the graph.

The TMS is mixing or primitive if there is ${N\in{\mathbb N}}$ such that ${(T^N)_{ij} > 0}$ for all ${i,j}$. Equivalently, the graph is strongly connected and the set of loop lengths on the graph has gcd ${1}$.

Given a shift space ${X}$, consider

\displaystyle \begin{aligned} \mathcal{M} &= \{ \text{Borel probability measures on }X\}, \\ \mathcal{M}_\sigma &= \{ \mu \in \mathcal{M} : \sigma_* \mu := \mu\circ \sigma^{-1} = \mu \}, \\ \mathcal{M}_\sigma^e &= \{ \mu \in \mathcal{M}_\sigma : \mu \text{ is ergodic} \}. \end{aligned}

The set ${\mathcal{M}_\sigma}$ of invariant measures is extremely large for a mixing TMS (and more generally for systems with some sort of hyperbolic behavior), and it is important to identify “distinguished” invariant measures. One way of doing this is via the variational principle

$\displaystyle h_{\mathrm{top}}(X) = \sup \{ h(\mu) : \mu \in \mathcal{M}_\sigma \}.$

The next section recalls the definitions of the topological and measure-theoretic entropies in this setting. A measure achieving the supremum is a measure of maximal entropy (MME).

We will see that every mixing TMS has a unique MME, via a more general result. Given ${n\in{\mathbb N}_0}$ and ${w\in A^n}$, let ${[w] = wX \cap X}$ be the set of sequences in ${X}$ that start with the word ${w}$ (juxtaposition denotes concatenation); call this the cylinder of ${w}$. Define the language of ${X}$ by

$\displaystyle \mathcal{L}_n = \{ w\in A^n : [w] \neq \emptyset \}, \qquad \mathcal{L} = \bigcup_{n\in {\mathbb N}_0} \mathcal{L}_n.$

Definition 1 ${X}$ has specification if there is ${\tau\in \mathbb{N}_0}$ such that for all ${v,w\in \mathcal{L}}$ there is ${u\in \mathcal{L}_\tau}$ such that ${vuw\in \mathcal{L}}$.

Exercise 1 Prove that every mixing TMS has specification.

Theorem 2 (Bowen) If ${X}$ has specification, then it has a unique MME.

2. Entropies

2.1. Topological entropy

Every word in ${\mathcal{L}_{m+n}}$ is of the form ${vw}$ for some ${v\in \mathcal{L}_m}$ and ${w\in \mathcal{L}_n}$; thus

$\displaystyle \mathcal{L}_{m+n} \subset \mathcal{L}_m \mathcal{L}_n \quad\Rightarrow\quad \#\mathcal{L}_{m+n} \leq \#\mathcal{L}_m \#\mathcal{L}_n.$

This means that the sequence ${c_n := \log \#\mathcal{L}_n}$ has the sub-additivity property

$\displaystyle c_{m+n} \leq c_m + c_n. \ \ \ \ \ (2)$

Exercise 2 Prove Fekete’s lemma: for any sequence satisfying (2), ${\lim_{n\rightarrow\infty} \frac 1n c_n}$ exists and is equal to ${\inf_n \frac 1n c_n}$ (a priori it could be ${-\infty}$).

We conclude that the topological entropy ${h_{\mathrm{top}}(X) := \lim_{n\rightarrow\infty} \frac 1n \log \#\mathcal{L}_n}$ exists for every shift space. This quantifies the growth rate of the total complexity of the system.

Exercise 3 Prove that ${h_{\mathrm{top}}(X)}$ is the box dimension of ${X}$ (easy). Then prove that it is also the Hausdorff dimension of ${X}$ (harder). (Both of these facts rely very strongly on the fact that ${d(\sigma x,\sigma y)/d(x,y)}$ is globally constant whenever ${x,y}$ are close; for general systems where the amount of expansion may vary, the definition of topological entropy is more involved and the relationship to dimension is more subtle, although it is worth noting that a 1973 paper of Bowen in TAMS gives a definition of topological entropy that is analogous to Hausdorff dimension.)

2.2. Measure-theoretic entropy

Recall the motivation from information theory for the definition of entropy: given ${p\in (0,1]}$, let ${I(p) = -\log p}$. This can be interpreted as the information associated to an event with probability ${p}$; note that it is monotonic (the less likely an event is, the more information we gain by learning that it happened) and that ${I(pq) = I(p) + I(q)}$.

Now define ${\phi}$ on ${[0,1]}$ by ${\phi(p) = pI(p) = -p\log p}$ (and ${\phi(0) = 0}$); this can be interpreted as the expected amount of information associated to an event with probability ${p}$.

Exercise 4 Show that ${\phi}$ is concave.

Given ${N\in{\mathbb N}}$, let ${\Delta = \Delta_N = \{ \bar{p} = (p_1,\dots, p_N) : p_i \geq 0 }$ and ${\sum p_i \leq 1\}}$ be the set of sub-probability vectors with ${N}$ components. Define

$\displaystyle H(\bar{p}) = \sum_{i=1}^N \phi(p_i); \ \ \ \ \ (3)$

this can be interpreted as the expected information associated to a collection of mutually exclusive events with probabilities ${p_1,\dots, p_N}$.

Exercise 5 Show that ${H(\bar{p}) \leq \log N}$ for all ${\bar{p} \in \Delta_N}$, with equality if and only if ${p_i = \frac 1N}$ for all ${i}$.

Given ${\mu\in \mathcal{M}_\sigma}$, we have for each ${n}$ a probability vector with ${\#\mathcal{L}_n}$ components; writing ${\mu(w) = \mu([w])}$ for convenience, the entropy (expected information) associated to this vector is

$\displaystyle H_n(\mu) := \sum_{w\in \mathcal{L}_n} \phi(\mu(w)). \ \ \ \ \ (4)$

Lemma 3 ${H_{m+n}(\mu) \leq H_m(\mu) + H_n(\mu)}$

Proof:

\displaystyle \begin{aligned} H_{m+n}(\mu) &\leq \sum_{v\in \mathcal{L}_m} \sum_{w\in \mathcal{L}_n} \mu(vw) I \Big( \frac{\mu(vw)}{\mu(v)} \cdot \mu(v) \Big) \\ &= \sum_{w\in \mathcal{L}_n} \sum_{v\in \mathcal{L}_m} \mu(v) \phi \Big( \frac{\mu(vw)}{\mu(v)} \Big) + \sum_{v\in \mathcal{L}_m} \sum_{w\in \mathcal{L}_n} \mu(vw) I(\mu(v)) \\ &\leq \sum_{w\in \mathcal{L}_n} \phi \Big( \sum_{v\in \mathcal{L}_m} \mu(vw) \Big) + \sum_{v\in \mathcal{L}_m} \phi(\mu(v))\\ &= H_n(\mu) + H_m(\mu), \end{aligned}

where the first line is by definition, the second is since ${I(pq) = I(p) + I(q)}$, the third uses concavity of ${\phi}$, and the fourth uses invariance of ${\mu}$ to get ${\sum_{v\in \mathcal{L}_m} \mu(vw) = \mu(w)}$. $\Box$

This lemma has the following intuitive interpretation: the expected information from the first ${m+n}$ symbols is at most the expected information from the first ${m}$ symbols plus the expected information from the next ${n}$ symbols.

Now Fekete’s lemma implies that the following measure-theoretic entropy exists:

$\displaystyle h(\mu) := \lim_{n\rightarrow\infty} \frac 1n H_n(\mu). \ \ \ \ \ (5)$

2.3. Variational principle

Using Exercise 5 we see that ${H_n(\mu) \leq c_n = \log \#\mathop{\mathcal L}_n}$, with equality if and only if ${\mu(w) = 1/\#\mathop{\mathcal L}_n}$ for all ${w\in \mathop{\mathcal L}_n}$. This immediately proves that

$\displaystyle h(\mu) \leq h_{\mathrm{top}}(X) \text{ for all } \mu \in \mathcal{M}_\sigma, \ \ \ \ \ (6)$

and suggests that in order to have equality we should look for a measure with

$\displaystyle \mu(w) \approx \frac 1{\#\mathop{\mathcal L}_n} \approx e^{-n h_{\mathrm{top}}(X)} \text{ for all } w\in \mathcal{L}_n. \ \ \ \ \ (7)$

The following makes this more precise.

Definition 4 ${\mu\in \mathcal{M}_\sigma}$ is a Gibbs measure if there are ${h,c,C>0}$ such that for all ${n\in {\mathbb N}}$ and ${w\in \mathcal{L}_n}$, we have ${ce^{-nh} \leq \mu(w) \leq C e^{-nh}}$.

Now Theorem 2 is a consequence of the following two results, which we prove below.

Theorem 5 If ${\mu}$ is an ergodic Gibbs measure for ${X}$, then ${h=h_{\mathrm{top}}(X) = h(\mu)}$ and ${\mu}$ is the unique MME.

Theorem 6 If ${X}$ has specification, then it has an ergodic Gibbs measure.

In fact the construction of ${\mu}$ below always gives equality in (6), without relying on the specification property (or obtaining uniqueness), but we will not prove this.

2.4. Convex combinations

Before embarking on the proof of Theorems 5 and 6, we establish a general property of entropy that will be important.

Lemma 7 Given ${\bar{p},\bar{q} \in \Delta_N}$ and ${s,t \in [0,1]}$ with ${s+t = 1}$, we have

$\displaystyle sH(\bar{p}) + tH(\bar{q}) \leq H(s\bar{p} + t\bar{q}) \leq sH(\bar{p}) + tH(\bar{q}) + \log 2. \ \ \ \ \ (8)$

Proof: The first inequality follows immediately from concavity of ${\phi}$. For the second inequality we first observe that

$\displaystyle H(s\bar{p}) = \sum_{i=1}^N sp_i I(sp_i) = \sum_{i=1}^N \big(sp_i I(p_i) + sp_i I(s)\big) = sH(\bar{p}) + \phi(s) \sum_{i=1}^N p_i. \ \ \ \ \ (9)$

Then we use monotonicity of ${I}$ to get

\displaystyle \begin{aligned} H(s\bar{p} + t\bar{q}) &= \sum s p_i I(sp_i + tq_i) + tq_i I(sp_i + tq_i) \\ &\leq \sum s p_i I(sp_i) + \sum tq_i I(tq_i) \\ & = sH(\bar{p}) + tH(\bar{q}) + \phi(s) \sum p_i + \phi(t) \sum q_i, \end{aligned}

where the last equality uses (9) twice; this proves (8). $\Box$

Applying Lemma 7 to the probability vectors associated to two measures ${\nu,\mu \in \mathcal{M}_\sigma}$, we see that

$\displaystyle s H_n(\nu) + t H_n(\mu) \leq H_n(s\nu + t\mu) \leq s H_n(\nu) + t H_n(\mu) + \log 2$

for all ${n}$: dividing by ${n}$ and sending ${n\rightarrow\infty}$ gives

$\displaystyle h(s\nu + t\mu) = s h(\nu) + t h(\mu). \ \ \ \ \ (10)$

Remark 1 It is worth mentioning the deeper fact that a version of (10) holds for infinite convex combinations (even uncountable ones given by an integral); this is due to Konrad Jacobs, see Section 9.6 of “Foundations of Ergodic Theory” by Viana and Oliveira.

3. Gibbs implies uniqueness

To prove Theorem 5, start by observing that the lower Gibbs bound gives

$\displaystyle 1 = \mu(X) = \sum_{w\in \mathcal{L}_n} \mu(w) \geq ce^{-nh} \#\mathcal{L}_n \quad\Rightarrow\quad \#\mathcal{L}_n \leq c^{-1} e^{nh}, \ \ \ \ \ (11)$

and thus ${h_{\mathrm{top}}(X) \leq h}$. Meanwhile, the upper Gibbs bound gives

$\displaystyle H_n(\mu) = \sum_{w\in \mathcal{L}_n} \mu(w) I(\mu(w)) \geq \sum_{w\in \mathcal{L}_n} \mu(w) I(C e^{-nh}) = I(C e^{-nh}) = nh - \log C,$

and thus ${h(\mu) \geq h}$, so we conclude that ${h_{\mathrm{top}}(X) = h = h(\mu)}$. It remains to show that every ${\nu \in \mathcal{M}_\sigma}$ with ${\nu \neq \mu}$ has ${h(\nu) < h}$.

First observe that by (10), we can restrict our attention to the case when ${\nu\perp \mu}$. Indeed, given any ${\nu \in \mathcal{M}_\sigma}$, the Lebesgue decomposition theorem gives ${\nu = s\nu_1 + t\nu_2}$ for some ${\nu_1 \perp \mu}$ and ${\nu_2 \ll \mu}$, and (10) gives ${h(\nu) = sh(\nu_1) + th(\nu_2)}$. By ergodicity we must have ${\nu_2 = \mu}$ and thus ${s>0}$, so if ${h(\nu_1) < h}$ then the same is true of ${h(\nu)}$.

Now consider ${\nu \perp \mu}$. Then there is a Borel set ${D\subset X}$ with ${\nu(D) = 0}$ and ${\mu(D) = 1}$, and this in turn gives ${\mathcal{D}_n \subset \mathcal{L}_n}$ such that

$\displaystyle \nu(\mathcal{D}_n) \rightarrow 0 \text{ and } \mu(\mathcal{D}_n) \rightarrow 1 \text{ as } n\rightarrow\infty. \ \ \ \ \ (12)$

Let ${\nu|_{\mathcal{D}_n}}$ denote the normalization of ${\nu}$ after restricting to words in ${\mathcal{D}_n}$, and similarly for ${\nu|_{\mathcal{D}_n^c}}$. Recall from Fekete’s lemma and subadditivity of ${H_n(\nu)}$ that ${\frac 1n H_n(\nu) \geq h(\nu)}$ for all ${n}$. Then we get

\displaystyle \begin{aligned} nh(\nu) &\leq H_n(\nu) = H_n \big(\nu(\mathcal{D}_n) \nu|_{\mathcal{D}_n} + \nu(\mathcal{D}_n^c) \nu|_{\mathcal{D}_n^c} \big) \\ &\leq \nu(\mathcal{D}_n) H_n(\nu|_{\mathcal{D}_n}) + \nu(\mathcal{D}_n^c) H_n(\nu|_{\mathcal{D}_n^c}) + \log 2 \\ &\leq \nu(\mathcal{D}_n) \log \#\mathcal{D}_n + \nu(\mathcal{D}_n^c) \log \#\mathcal{D}_n^c + \log 2 \\ &\leq \nu(\mathcal{D}_n) \log(c^{-1} e^{nh} \mu(\mathcal{D}_n)) + \nu(\mathcal{D}_n^c) \log(c^{-1} e^{nh} \mathcal{D}_n^c) + \log 2 \\ &= nh - \log c + \log 2 + \nu(\mathcal{D}_n) \log \mu(\mathcal{D}_n) + \nu(\mathcal{D}_n^c) \log \mu(\mathcal{D}_n^c), \end{aligned}

where the second line uses Lemma 7, the third line uses Exercise 5, and the fourth line uses the lower Gibbs bound as in (11). We conclude that

$\displaystyle n(h(\nu) - h) \leq \log (2/c) + \nu(\mathcal{D}_n) \log \mu(\mathcal{D}_n) + \nu(\mathcal{D}_n^c) \log \mu(\mathcal{D}_n^c),$

and as ${n\rightarrow\infty}$ the right-hand side goes to ${-\infty}$ by (12), which implies that ${h(\nu) < h}$, completing the proof.

4. Specification implies Gibbs

Now we outline the proof of Theorem 6. This comes in three steps: (1) uniform counting bounds; (2) construction of a Gibbs measure; (3) proof of ergodicity.

4.1. Uniform counting bounds

From now on we write ${h = h_{\mathrm{top}}(X)}$ for convenience. Fekete’s lemma gives ${\frac 1n \log \#\mathcal{L}_n \geq h}$ for all ${n}$, or equivalently ${\#\mathcal{L}_n \geq e^{nh}}$. This can also be deduced by writing

$\displaystyle \#\mathcal{L}_{kn} \leq (\#\mathcal{L}_n)^k \quad\Rightarrow\quad \frac 1{kn} \log \#\mathcal{L}_{kn} \leq \frac 1n \log\# \mathcal{L}_n$

and sending ${k\rightarrow\infty}$ so that the left-hand side goes to ${h}$ (this is basically part of the proof of Fekete’s lemma).

To get an upper bound on ${\#\mathcal{L}_n}$ we need the specification property, which gives a 1-1 map ${\mathcal{L}_m\times \mathcal{L}_n \rightarrow \mathcal{L}_{m+n+\tau}}$, so that ${\#\mathcal{L}_{m+n+\tau} \geq \#\mathcal{L}_m \#\mathcal{L}_n}$. Then one can either apply Fekete’s lemma to ${b_n := -\log \# \mathcal{L}_{n+\tau}}$, or observe that

$\displaystyle \#\mathcal{L}_{k(n+\tau)} \geq (\#\mathcal{L}_n)^k \quad\Rightarrow\quad \frac 1{k(n+\tau)} \log \#\mathcal{L}_{k(n+\tau)} \geq \frac 1{n+\tau} \log\# \mathcal{L}_n.$

Sending ${k\rightarrow\infty}$ the left-hand side goes to ${h}$, and so combined with the lower bound above we get the uniform counting bounds

$\displaystyle e^{nh} \leq \#\mathcal{L}_n \leq e^{\tau h} e^{nh}. \ \ \ \ \ (13)$

4.2. A Gibbs measure

There is a standard procedure for constructing an MME: let ${\nu_n \in \mathcal{M}}$ be any sequence of (not necessarily invariant) Borel probability measures such that ${\nu_n(w) = 1/\#\mathcal{L}_n}$ for all ${w\in \mathcal{L}_n}$, and then put

$\displaystyle \mu_n = \frac 1n \sum_{k=0}^{n-1} \sigma_*^k \nu_n = \frac 1n \sum_{k=0}^{n-1} \nu_n \circ \sigma^{-k}.$

Since ${\mathcal{M}}$ is weak* compact, there is a weak* convergent subsequence ${\mu_{n_j}}$.

Exercise 6 Show that ${\mu := \lim_{j\rightarrow\infty} \mu_{n_j}}$ is ${\sigma}$-invariant (${\mu\circ \sigma^{-1} = \mu}$).

The preceding exercise is basically the proof of the Krylov-Bogolyubov theorem, and does not require any properties of the measures ${\nu_n}$ beyond the fact that they are Borel probability measures.

One can prove that ${\mu}$ is an MME whether or not ${X}$ has specification, but we will use specification to directly prove the stronger Gibbs property.

Given any ${w\in \mathcal{L}_m}$ and any ${\tau \leq k \leq n-\tau}$, we bound ${\nu_n(\sigma^{-k}[w])}$ by estimating how many words in ${\#\mathcal{L}_n}$ are of the form ${uwv}$ for some ${u \in \mathcal{L}_k}$ and ${v\in \mathcal{L}_{n-m-k}}$. Arguments similar to those in the uniform counting bounds show that

$\displaystyle \#\mathcal{L}_{k-\tau} \#\mathcal{L}_{n-k-m-\tau} \leq (\text{\# of such words}) \leq \#\mathcal{L}_k \#\mathcal{L}_{n-k-m},$

where the first inequality requires specification. Dividing by ${\#\mathcal{L}_n}$ and using the uniform counting bounds (13) gives

$\displaystyle \nu_n(\sigma^{-k}[w]) \leq \frac{\#\mathcal{L}_k \#\mathcal{L}_{n-k-m}}{\#\mathcal{L}_n} \leq \frac{e^{\tau h} e^{k h} e^{\tau h} e^{(n-k-m)h}}{e^{nh}} = e^{2\tau h} e^{-mh};$

using a similar estimate from below we get

$\displaystyle e^{-3\tau h} e^{-mh} \leq \nu_n(\sigma^{-k}[w]) \leq e^{2\tau h} e^{-mh}. \ \ \ \ \ (14)$

Averaging over all ${k}$ and sending ${n\rightarrow\infty}$ along the subsequence ${n_j}$ gives

$\displaystyle e^{-3\tau h} e^{-mh} \leq \mu(w) \leq e^{2\tau h} e^{-mh}, \ \ \ \ \ (15)$

so ${\mu}$ is a Gibbs measure.

4.3. Ergodicity

To prove that ${\mu}$ is ergodic, start by fixing ${v,w\in \mathcal{L}}$, with lengths ${|v|}$ and ${|w|}$, respectively. Given ${j\gg |v|}$ and ${k\ll n}$, follow the same procedure as above to estimate the number of words in ${\mathcal{L}_n}$ with the form ${xvywz}$, where ${x\in \mathcal{L}_k}$, ${y\in \mathcal{L}_{j - |v|}}$, and ${z\in \mathcal{L}_{n - k - j - |w|}}$, and obtain the bounds

$\displaystyle \frac{\#\mathcal{L}_{k-\tau} \#\mathcal{L}_{j-|v|-\tau} \#\mathcal{L}_{n - k - j - |w| - \tau}}{\#\mathcal{L}_n} \leq \sigma_*^k \nu_n([v] \cap \sigma^{-j} [w]) \leq \frac{\#\mathcal{L}_k \#\mathcal{L}_{j-|v|} \#\mathcal{L}_{n - k - j - |w|}}{ \#\mathcal{L}_n}.$

Averaging over ${k}$, sending ${n\rightarrow\infty}$, and using the uniform counting bounds (13) gives

$\displaystyle e^{-4\tau h} e^{-|v|h} e^{-|w|h} \leq \mu([v] \cap \sigma^{-j}[w]) \leq e^{3\tau h} e^{-|v|h} e^{-|w|h}.$

Using the Gibbs bounds (15) gives

$\displaystyle e^{-8\tau h} \mu(v) \mu(w) \leq \mu([v] \cap \sigma^{-j}[w]) \leq e^{9\tau h} \mu(v) \mu(w). \ \ \ \ \ (16)$

Exercise 7 Given Borel sets ${V,W \subset X}$, approximate ${V}$ and ${W}$ with cylinders and use (16) to get

$\displaystyle e^{-8\tau h} \mu(V) \mu(W) \leq \varliminf_{j\rightarrow\infty} \mu(V \cap \sigma^{-j}W) \leq \varlimsup_{j\rightarrow\infty} \mu(V \cap \sigma^{-j}W) \leq e^{9\tau h} \mu(V) \mu(W).$

Now if ${E \subset X}$ is invariant, then taking ${V=E}$ and ${W = E^c}$ in Exercise 7, the lower bound gives ${e^{-8\tau h} \mu(E)(1-\mu(E)) =0}$, so ${\mu(E)}$ is either ${0}$ or ${1}$. This proves ergodicity and completes the proof of Theorem 6.

Remark 2 In fact, the upper bound in Exercise 7 can be used to show that ${\mu}$ is mixing; see Proposition 20.3.6 in Katok and Hasselblatt.

Posted in ergodic theory, theorems | 1 Comment

## Cohomologous functions and the Livsic theorem

Given a dynamical system ${f\colon X\rightarrow X}$ (typically ${X}$ is a compact metric space and ${f}$ is continuous), we commonly encounter real-valued functions ${\varphi\colon X\rightarrow {\mathbb R}}$ in one of the following two roles.

1. An observable function represents a measurement of the system, so that the sequence of functions ${\{\varphi\circ f^n\}}$ represents measurements made at different times, about which we want to make predictions. In the cases we are interested in, these predictions will be probabilistic and will be in terms of the Birkhoff averages ${\frac 1n S_n\varphi(x)}$, where ${S_n\varphi(x) = \sum_{k=0}^{n-1} \varphi(f^k x)}$ is the ${n}$th Birkhoff sum.
2. A potential function is used to assign weights to different trajectories of the system for purposes of selecting an invariant measure with specific dynamical or geometric properties. Generally the orbit segment ${x, f(x), \dots, f^{n-1}(x)}$ is assigned a weight given by ${e^{S_n\varphi(x)}}$. A potential function also assigns weights to invariant measures by integration; in particular, we can study equilibrium measures for ${(X,f,\varphi)}$, which are invariant probability measures maximizing ${h_\mu(f) + \int\varphi\,d\mu}$. (Here ${h_\mu}$ is Kolmogorov–Sinai entropy.)

Before moving on we remark that ${\varphi\colon X\rightarrow[0,\infty)}$ can also play the role of a density function, especially if we are looking for an invariant measure that is absolutely continuous with respect to some reference measure, but for today we will focus on the two roles described above.

Let ${\mathcal{M}_f(X)}$ denote the set of ${f}$-invariant Borel probability measures on ${X}$. Then each ${\varphi \in C(X)}$ induces a map ${\hat\varphi\colon \mathcal{M}_f(X) \rightarrow {\mathbb R}}$ by ${\hat\varphi(\mu) = \int\varphi\,d\mu}$ as in the second item above. It is natural to ask when two functions ${\varphi,\psi \in C(X)}$ have ${\hat\varphi=\hat\psi}$. One immediate observation to make is that the defintion of ${f}$-invariance implies that ${\int\varphi \,d\mu = \int\varphi\circ f \,d\mu}$ for all ${\varphi\in C(X)}$ and ${\mu\in \mathcal{M}_f(X)}$, so that in particular we have ${\hat\varphi = \widehat{\varphi\circ f}}$. Thus the function ${\psi = \varphi - \varphi\circ f}$ has ${\hat\psi = \hat \varphi - \widehat{\varphi\circ f} = 0}$. We call such a function ${\psi}$ a coboundary; we have just shown that

Proposition 1 if ${\psi}$ is a coboundary, then it integrates to ${0}$ with respect to any invariant measure.

Moreover, since integration is linear in ${\varphi}$, we have

Proposition 2 if two functions ${\varphi}$ and ${\psi}$ differ by a coboundary, then ${\hat\varphi=\hat\psi}$, ie., ${\int\varphi\,d\mu = \int\psi\,d\mu}$ for every ${\mu\in \mathcal{M}_f(X)}$.

If ${\varphi}$ and ${\psi}$ differ by a coboundary, we say that they are cohomologous; in this case we can write ${\varphi = \psi + h - h\circ f}$ for some ${h\in C(X)}$, which we call the transfer function. Note that ${\varphi}$ is a coboundary if and only if it is cohomologous to the zero function.

Remark 1 For a discussion of how this is connected to the notion of cohomology in algebraic topology, see Terry Tao’s blog post from December 2008.

It is natural to ask whether cohomology is the only mechanism by which two functions ${\varphi,\psi\in C(X)}$ can have ${\hat\varphi=\hat\psi}$; in other words, do the converses of Propositions 1 and 2 hold?

In general the answer is no, as one can quickly see by letting ${f}$ be an irrational circle rotation, so that the only invariant measure is Lebesgue, and there are many continuous functions on the circle that have Lebesgue integral ${0}$ but are not coboundaries. When ${f}$ has hyperbolic behavior, however, the story is quite different, and this is what we will discuss in this post.

If ${f}$ is a diffeomorphism and ${X}$ is a topologically transitive locally maximal hyperbolic set, then ${f\colon X\rightarrow X}$ satisfies the following result.

Theorem 3 (Closing Lemma) For every ${\epsilon>0}$ there exists ${\delta>0}$ such that if ${x\in X}$ and ${n\geq 0}$ are such that ${d(f^n(x),x) < \delta}$, then there exists ${y\in X}$ such that ${f^n(y) = y}$ and ${d(f^k(y), f^k(x)) < \epsilon}$ for all ${0\leq k < n}$.

Moreover, in this case every Hölder continuous ${\varphi\colon X\rightarrow {\mathbb R}}$ has the following property, introduced by Walters in 1978.

Theorem 4 (Walters Property) For every ${\zeta>0}$ there exist ${\epsilon>0}$ such that if ${x,y\in X}$ and ${n\geq 0}$ are such that ${d(f^k(x),f^k(y)) < \epsilon}$ for all ${0\leq k < n}$, then ${|S_n\varphi(x) - S_n\varphi(y)| < \zeta}$.

Both the Closing Lemma and the Walters Property also hold when ${X}$ is a subshift of finite type and ${\varphi}$ is Hölder continuous.

Using these properties we can formulate and prove an important result.

Theorem 5 (Livsic Theorem) Let ${X}$ be a compact metric space, ${f\colon X\rightarrow X}$ a continuous map satisfying the Closing Lemma and possessing a point whose orbit is dense, and ${\varphi\colon X\rightarrow{\mathbb R}}$ a continuous function satisfying the Walters Property. Then ${\varphi}$ is a coboundary if and only if for every periodic point ${x = f^p(x) \in X}$, we have ${S_p \varphi(x) = 0}$.

The forward implication is immediate: if ${\varphi}$ is a coboundary, then ${\int\varphi\,d\mu = 0}$ for every invariant ${\mu}$, in particular for ${\mu = \frac 1p \sum_{k=0}^{p-1} \delta_{f^k x}}$. Equivalently one can observe that Birkhoff sums of coboundaries have the following behavior: if ${\varphi(x) = h(x) - h(f x)}$, then

$\displaystyle S_n\varphi(x) = \sum_{k=0}^{n-1} \big(h(f^k x) - h(f^{k+1} x)\big) = h(x) - h(f^n x), \ \ \ \ \ (1)$

and consequently ${S_p \varphi(x) = h(x) - h(f^p x) = 0}$ whenever ${x=f^p x}$. We must prove the reverse implication, that if ${S_p\varphi(x)=0}$ whenever ${x=f^p x}$, then ${\varphi}$ is a coboundary. To this end note that if ${h}$ is a continuous (hence uniformly continuous) transfer function for ${\varphi}$, then (1) immediately determines ${h}$ along the entire forward orbit of a point ${y}$ by

$\displaystyle h(f^n y) = h(y) + S_n\varphi(y). \ \ \ \ \ (2)$

(A similar procedure defines ${h}$ along the backward orbit if ${f}$ is invertible.) If ${y}$ is a point whose orbit is dense in ${X}$, then this determines ${h}$ on all of ${X}$ by continuity. Thus it suffices to prove that (2) defines a uniformly continuous function on the orbit of ${y}$. To this end, given ${\zeta>0}$, let ${\epsilon = \epsilon(\zeta)>0}$ be given by the Walters Property, and then let ${\delta = \delta(\epsilon)>0}$ be given by the Closing Lemma. Suppose that ${w,z}$ are two points on the orbit of ${y}$ such that ${d(w,z) < \delta}$. Since ${w,z}$ lie on the same orbit, there is ${n\in {\mathbb N}}$ such that ${f^n(z)=w}$ or ${f^n(w)=z}$. Without loss of generality we assume the first case, so that ${d(z,f^n z) = d(z,w) < \delta}$. By the Closing Lemma there is a periodic point ${y=f^n y}$ such that ${d(f^k y, f^k z) < \epsilon}$ for all ${0\leq k < n}$. By (2) and the Walters Property, this implies that

$\displaystyle |h(w) - h(z)| = |h(f^n z) - h(z)| = |S_n\varphi(z)| \leq |S_n\varphi(x)| + \zeta = \zeta,$

where the last equality uses the fact that Birkhoff sums of ${\varphi}$ vanish around periodic orbits. This proves uniform continuity of ${h}$ on the orbit of ${y}$, so ${h}$ extends uniformly continuously to ${X}$, and it is an easy exercise using continuity of both sides of the equation ${\varphi = h - h\circ f}$ to verify that this relationship continues to hold on all of ${X}$, so that ${\varphi}$ is indeed a coboundary. This completes the proof of the Livsic Theorem.

Let ${X,f}$ be as in the Livsic Theorem; then for any ${\varphi,\psi}$ satisfying the Walters Property, the following are equivalent:

1. ${\varphi = \psi + h - h\circ f}$ for some ${h\in C(X)}$;
2. ${\int\varphi\,d\mu = \int\psi\,d\mu}$ for all ${\mu\in\mathcal{M}_f(X)}$;
3. ${\int\varphi\,d\mu = \int\psi\,d\mu}$ for all periodic orbit measures.

Here is one final remark on a specific example of cohomologous potentials: one immediately sees that ${\int \frac 1nS_n\varphi \,d\mu = \int \varphi\,d\mu}$ for every ${\mu\in\mathcal{M}_f(X)}$, and so under the conditions of the Livsic Theorem one can conclude that each Birkhoff average ${\frac 1nS_n\varphi}$ is cohomologous to ${\varphi}$. In fact this can be shown directly, without using the Livsic Theorem, by putting

\displaystyle \begin{aligned} h(x) &= \frac 1n\big((n-1) \varphi(x) + (n-2)\varphi(f x) + (n-3)\varphi(f^2 x) + \cdots + \varphi(f^{n-2} x)\big) \\ &= \frac 1n\sum_{k=0}^{n-1} (n-1-k) \varphi(f^{k-1} x). \end{aligned}

Then we have

\displaystyle \begin{aligned} h(x) - h(fx) &= \frac 1n\big((n-1) \varphi(x) + (n-2)\varphi(f x) + \cdots + \varphi(f^{n-2} x)\big) \\ &\quad\quad - \frac 1n\big((n-1) \varphi(f x) + (n-2)\varphi(f^2 x) + \cdots + \varphi(f^{n-1} x)\big) \\ &= \frac{n-1}n \varphi(x) - \frac 1n\big( \varphi(fx) + \varphi(f^2x) + \cdots + \varphi(f^{n-1} x) \big) \\ &= \varphi(x) - \frac 1n S_n\varphi(x), \end{aligned}

which demonstrates that ${\varphi}$ and ${\frac 1nS_n\varphi}$ are cohomologous.

## Gibbs measures have local product structure

Let ${M}$ be a compact smooth manifold and ${f\colon M\rightarrow M}$ a transitive Anosov ${C^{1+\alpha}}$ diffeomorphism. If ${\mu}$ is an ${f}$-invariant Borel probability measure on ${M}$ that is absolutely continuous with respect to volume, then the Hopf argument can be used to show that ${\mu}$ is ergodic. In fact, recently it has been shown that even stronger ergodic properties such as multiple mixing can be deduced; for example, see Coudène, Hasselblatt, and Troubetzkoy [Stoch. Dyn. 16 (2016), no. 2].

A more general class of measures with strong ergodic properties is given by the theory of thermodynamic formalism developed in the 1970s: given any Hölder continuous potential ${\varphi\colon M\rightarrow{\mathbb R}}$, the quantity ${h_\mu(f) + \int\varphi\,d\mu}$ is maximized by a unique invariant Borel probability measure ${\mu = \mu_\varphi}$, which is called the equilibrium state of ${\varphi}$; the maximum value of the given quantity is the topological pressure ${P = P(\varphi)}$. The unique equilibrium state has the Gibbs property: for every ${\varepsilon>0}$ there is a constant ${K>0}$ such that

$\displaystyle K^{-1} \leq \frac{\mu(B_n(x,\varepsilon))}{e^{S_n\varphi(x) - nP}} \leq K, \ \ \ \ \ (1)$

where ${B_n(x,\varepsilon) = \{y\in M : d(f^kx,f^ky) < \varepsilon\ \forall 0\leq k < n\}}$ is the Bowen ball around ${x}$ of order ${n}$ and radius ${\varepsilon}$, and we write ${S_n\varphi(x) = \sum_{k=0}^{n-1} \varphi(f^kx)}$ for the ${n}$th ergodic sum along the orbit of ${x}$.

Historically, strong ergodic properties (mixing, K, Bernoulli) for equilibrium states have been established using methods such as Markov partitions rather than via the Hopf argument. However, in the more general non-uniformly hyperbolic setting, it can be difficult to extend these symbolic arguments, and so it is interesting to ask whether the Hopf argument can be applied instead, even if it only recovers some of the strong ergodic properties. The key property of absolutely continuous measures that is needed for the Hopf argument is the fact that they have a local product structure, which we define below. It was shown by Haydn [Random Comput. Dynam. 2 (1994), no. 1, 79–96] and by Leplaideur [Trans. Amer. Math. Soc. 352 (2000), no. 4, 1889–1912] that in the uniformly hyperbolic setting, the equilibrium states ${\mu_\varphi}$ have local product structure when ${\varphi}$ is Hölder continuous; thus one could apply the Hopf argument to them.

This post contains a direct proof that any measure ${\mu}$ with the Gibbs property (1) has local product structure; see Theorem 3 below. (This will be a bit of a longer post, since we need to recall several different concepts and then do some non-trivial technical work.) Since Bowen’s proof of uniqueness of equilibrium states using specification [Math. Systems Theory 8 (1974/1975), no. 3, 193–202] establishes the Gibbs property, this means that the equilibrium states produced this way could be addressed with the Hopf argument (I haven’t carried out the details yet, so I claim no formal results here). I should point out, though, that even without the use of Markov partitions, Ledrappier showed that these measures have the K property, which in particular implies multiple mixing. Since multiple mixing is the strongest thing we might hope to get from the Hopf argument, my primary motivation for the present approach is that Dan Thompson and I recently generalized Bowen’s result to systems satisfying a certain non-uniform specification property [Adv. Math. 303 (2016), 745–799], and the unique equilibrium states we obtain satisfy a non-uniform version of the Gibbs property (1), so it is reasonable to hope that they also have local product structure and can be studied using the Hopf argument; but this is beyond the scope of this post and will be addressed in a later paper.

1. Local product structure

Before defining local product structure for ${\mu}$, we recall some definitions. Since ${f}$ is Anosov, every ${x\in M}$ has local stable and unstable manifolds ${W^{s,u}_x}$, which have the following properties.

1. There are ${C \geq 1}$ and ${\lambda < 1 }$ such that for all ${x\in M}$, ${y\in W^s_x}$, and ${n\geq 0}$, we have ${d(f^n x, f^n y) \leq C\lambda^n d(x,y)}$; a similar contraction bound holds going backwards in time when ${y\in W^u_x}$.
2. There is ${\delta>0}$ such that if ${d(f^n x, f^ny) \leq \delta}$ for all ${n\geq 0}$, then ${y\in W^s_x}$; similarly for ${W^u_x}$ with ${n\leq 0}$.
3. There is ${\varepsilon>0}$ such that if ${d(x,y) \leq \varepsilon}$, then ${W_x^s \cap W_y^u}$ is a single point, which we denote ${[x,y]}$. Moreover, there is a constant ${Q}$ such that ${d([x,y],x) \leq Q d(x,y)}$, and similarly for ${d([x,y],y)}$.

A set ${R\subset M}$ is a rectangle if it has diameter ${\leq\varepsilon}$ and is closed under the bracket operation: in other words, for every ${x,y\in R}$, the intersection point ${[x,y] = W_x^s \cap W_y^u}$ exists and is contained in ${R}$.

Lemma 1 For every ${x\in M}$, there is a rectangle ${R}$ containing ${x}$.

Proof: Write ${V_x^s = \{y\in W_x^s : d(x,y) \leq \frac{\varepsilon}{2(Q+1)}\}}$ and similarly for ${V_x^u}$. Consider the set ${R = \{[y,z] : y\in V_x^u, z\in V_x^s\}}$ and observe that for every ${[y,z]\in R}$ we have

$\displaystyle d([y,z],x) \leq d([y,z],y) + d(y,x) \leq (Q+1)d(y,x) \leq \tfrac\varepsilon2.$

Thus ${R}$ has diameter ${\leq \varepsilon}$, and for every ${[y,z], [y',z']\in R}$ we have

$\displaystyle [[y,z],[y',z']] = W_{[y,z]}^s \cap W_{[y',z']}^u = W^s_y \cap W^u_{z'} = [y,z'] \in R,$

so ${R}$ is indeed a rectangle. $\Box$

Given ${x\in M}$ and a rectangle ${R\ni x}$, let ${V^{s,u}_x = W^{s,u}_x \cap R}$. Then ${R}$ can be recovered from ${V^{s,u}_x}$ via the method in the proof above, as the image of the map ${[\cdot,\cdot] \colon V_x^u \times V_x^s \rightarrow M}$. Given measures ${\nu^{s,u}}$ on ${V^{s,u}_x}$, let ${\nu^u\otimes\nu^s}$ be the pushforward of ${\nu^s\times \nu^u}$ under this map; that is, for every pair of Borel sets ${A\subset V_x^u}$ and ${B\subset V_x^s}$, we put ${(\nu^u\otimes \nu^s)(\{[y,z] : y\in A, z\in B\}) = \nu^u(A)\nu^s(B)}$.

Definition 2 A measure ${\mu}$ has local product structure with respect to ${W^{s,u}}$ if for every ${x\in M}$ and rectangle ${R\ni x}$ there are measures ${\nu^{s,u}}$ on ${V^{s,u}_x}$ such that ${\mu|_R \ll \nu^u \otimes \nu^s}$.

Theorem 3 Let ${f\colon M\rightarrow M}$ be a ${C^1}$ Anosov diffeomorphism, ${\varphi\colon M\rightarrow {\mathbb R}}$ a Hölder continuous function, and ${\mu}$ an ${f}$-invariant Borel probability measure on ${M}$ satisfying the Gibbs property (1) for some ${P\in {\mathbb R}}$. Then ${\mu}$ has local product structure in the sense of Definition 2. Moreover, there is ${\bar K \geq 1}$ such that for all ${R}$, the measures ${\nu^{s,u}}$ can be chosen so that the Radon–Nikodym derivative ${\psi = \frac{d\mu}{d(\nu^u\otimes \nu^s)}}$ satisfies ${\bar K^{-1} \leq \psi \leq \bar K}$ at ${\mu}$-a.e. point.

A quick side remark: here the diffeomorphism is only required to be ${C^1}$. The reason for the ${C^{1+\alpha}}$ hypothesis at the beginning of this post was so that the geometric potential ${\varphi(x) = -\log |\det Df|_{E^u_x}|}$ is Hölder continuous and its unique equilibrium state (the absolutely continuous invariant measure if it exists, or more generally the SRB measure) has the Gibbs property; this may not be true if ${f}$ is only ${C^1}$.

2. Conditional measures

In order to prove Theorem 3, we must start by recalling the notion of conditional measures; see Coudène’s book (especially Chapters 14 and 15) or Viana’s notes for more details than what is provided here.

Let ${(X,\mathop{\mathcal A},\mu)}$ be a Lebesgue space. A partition of ${X}$ is a map ${\xi \colon X\rightarrow \mathop{\mathcal A}}$ such that for ${\mu}$-a.e. ${x,y\in X}$, the sets ${\xi(x)}$ and ${\xi(y)}$ either coincide or are disjoint. Write ${\Xi = \{\xi(x) : x\in X\}}$ for the set of partition elements, and say that the partition ${\xi}$ is finite if ${\Xi}$ is finite.

Given a finite partition ${\xi}$, it is easy to define conditional measures ${\mu_{\xi(x)}}$ on the set ${\xi(x)}$ for ${\mu}$-a.e. ${x}$ by writing

$\displaystyle \mu_{\xi(x)}(A) = \frac{\mu(A \cap \xi(x))}{\mu(\xi(x))} \ \ \ \ \ (2)$

when ${\mu(\xi(x))>0}$, and ignoring those partition elements with zero measure. One can recover the measure ${\mu}$ from its conditional measures by the formula

$\displaystyle \mu(A) = \sum_{C\in \Xi} \mu_C(A) \mu(C). \ \ \ \ \ (3)$

If we write ${\hat\mu}$ for the measure on ${\Xi}$ defined by putting ${\hat\mu(\{C\}) = \mu(C)}$ for all ${C\in \Xi}$, then (3) can be written as

$\displaystyle \mu(A) = \int_\Xi \mu_C(A) \,d\hat\mu(C). \ \ \ \ \ (4)$

Even when the partition ${\xi}$ is infinite, one may still hope to obtain a formula along the lines of (4).

Example 1 Let ${X}$ be the unit square, ${\mu}$ be two-dimensional Lebesgue measure, and ${\xi(x)}$ be the horizontal line through ${x}$. Then Fubini’s theorem gives (4) by taking ${\mu_C}$ to be Lebesgue measure on horizontal lines, and defining ${\hat\mu}$ on ${\Xi}$, the set of horizontal lines in ${[0,1]^2}$, in one of the two following (equivalent) ways:

1. given ${E \subset \Xi}$, let ${\hat\mu(E) = \mu(\bigcup_{C\in E} C)}$;
2. identify ${\Xi}$ with the interval ${\{0\}\times [0,1]}$ on the ${y}$-axis, and define ${\hat\mu}$ as the image of one-dimensional Lebesgue measure on this interval.

Note that ${\hat\mu}$ must satisfy the first of these no matter what the partition ${\xi}$ is, while the second is a convenient description of ${\hat\mu}$ in this particular example.

A similar-looking example (and the one which is most relevant for our purposes) comes by letting ${R\subset M}$ be a rectangle and letting ${\xi(y) = V_y^u = W_y^u\cap R}$ be the partition into local unstable leaves. To produce conditional measures ${\mu_y^u = \mu_{V_y^u}}$, we need to use the fact that the partition ${\xi}$ is measurable. This means that there is a sequence of finite partitions ${\xi_n}$ that refines to ${\xi}$ in the sense that for ${\mu}$-a.e. ${y}$, we have

$\displaystyle \xi_{n+1}(y) \subset \xi_n(y) \text{ for all } n, \text{ and } \xi(y) = \textstyle\bigcap_{n=1}^\infty \xi_n(y).$

Lemma 4 Given any rectangle ${R}$, the partition of ${R}$ into local unstable leaves is measurable.

Proof: Fix ${x\in R}$ and let ${\{\eta_n(y)\}}$ be a refining sequence of finite partitions of ${V_x^s}$ with the property that ${\bigcap_n \eta_n(y) = \{y\}}$ for all ${y\in V_x^s}$; then let ${\xi_n(y) = \bigcup_{z\in \eta_n(y)} V_z^u}$, and we are done. $\Box$

Whenever ${\xi}$ is a measurable partition of a compact metric space ${X}$, we can define the conditional measures ${\mu_{\xi(y)}}$ as the limits of the conditional measures ${\mu_{\xi_n(y)}}$. Indeed, one can show (we omit the proofs) that for ${\mu}$-a.e. ${y}$, the limit ${I_y(\varphi) := \lim_{n\rightarrow\infty} \int \varphi \,d\mu_{\xi_n(y)}}$ exists for every continuous ${\varphi\colon X\rightarrow{\mathbb R}}$ and defines a continuous linear functional ${I_y\in C(X)^*}$; the corresponding measures ${\mu_{\xi(y)}}$ satisfy

$\displaystyle \int\varphi\,d\mu = \int_\Xi \int_C \varphi\,d\mu_C \,d\hat\mu(C) \ \ \ \ \ (5)$

for every ${\varphi\in C(X)}$, where once again we put ${\hat\mu(E) = \mu(\bigcup_{C\in E} C)}$.

The key result that we will need to describe properties of ${\mu_y^u}$ when ${\xi}$ is the partition of ${R}$ into local unstable leaves is that for ${\mu}$-a.e. ${y\in R}$ and every ${\varphi\in C(X)}$, we have

$\displaystyle \int \varphi\,d\mu_y^u = \lim_{n\rightarrow\infty} \frac 1{\mu(\xi_n(y))}\int_{\xi_n(y)} \varphi\,d\mu, \ \ \ \ \ (6)$

where ${\xi_n}$ is any sequence of finite partitions that refines to ${\xi}$.

In order to establish the local product structure for ${\mu}$ that is claimed by Theorem 3, we will show that the measures ${\mu_y^u}$ vary in an absolutely continuous manner as ${y}$ varies within ${R}$. That is, we consider for every ${x,y\in R}$ the holonomy map ${\pi_{x,y} \colon V_x^u \rightarrow V_{y}^u}$ defined by moving along local stable manifolds, so

$\displaystyle \pi_{x,y}(z) = [z,y] = V_z^s \cap V_{y}^u \text{ for all } z\in V_x^u.$

Our goal is to use the Gibbs property (1) for ${\mu}$ to prove that for every rectangle ${R}$ and ${\mu}$-a.e. ${x,y\in R}$, the conditional measures ${\mu_x^u}$ and ${\mu_{y}^u}$ satisfy

$\displaystyle (\pi_{x,y})_* \mu_x^u \ll \mu_{y}^u \text{ with } \bar{K}^{-1} \leq \frac{d(\pi_{x,y})_* \mu_x^u}{d\mu_{y}^u} \leq \bar{K}. \ \ \ \ \ (7)$

Once this is established, we can proceed as follows. Consider a rectangle ${R}$ with the decomposition ${\xi}$ into local unstable manifolds, let ${x\in R}$ be such that (7) holds for ${\mu}$-a.e. ${y\in R}$, and then identify ${\Xi}$ with ${V_x^s}$, as in the second characterization of ${\hat\mu}$ in Example 1. Let ${\nu^s}$ be the measure on ${V_x^s}$ corresponding to ${\hat\mu}$ under this identification, and let ${\nu^u = \mu_x^u}$. Given ${y\in V_x^s}$, let ${\psi_y = \frac{d(\pi_{x,y})_* \mu_x^u}{d\mu_{y}^u} \colon V_y^u \rightarrow [\bar{K}^{-1},\bar{K}]}$, so that in particular we have

$\displaystyle \int_{V_y^u} \varphi\,d\mu_y^u = \int_{V_y^u} \frac{\varphi}{\psi_y}\, d(\pi_{x,y})_*\mu_x^u = \int_{V_x^u} \frac{\varphi([z,y])}{\psi_y([z,y])} \,d\mu_x^u$

for every continuous ${\varphi\colon R\rightarrow {\mathbb R}}$. Then by (5), we have

\displaystyle \begin{aligned} \int\varphi\,d\mu &= \int_{\Xi} \int_C \varphi \,d\mu_C \,d\hat\mu(C) = \int_{V_x^s} \int_{V_y^u} \varphi \,d\mu_y^u \,d\nu^s(y) \\ &= \int_{V_x^s} \int_{V_x^u} \frac{\varphi([z,y])}{\psi_y([z,y])} \,d\nu^u(z) \,d\nu^s(y). \end{aligned}

By Definition 2, this shows that ${\mu}$ has local product structure with respect to ${W^{s,u}}$. Thus in order to prove Theorem 3, it suffices to shows that Gibbs measures satisfy the absolute continuity property (7).

It is worth noting quickly that our use of the term “absolute continuity” here has a rather different meaning from another common concept, which is that of a measure with “absolutely continuous conditional measures on unstable manifolds”. This latter notion is essential for the definition of SRB measures (indeed, in the Anosov setting it is the definition), and involves comparing ${\mu_x^u}$ to volume measure on ${V_x^u}$, instead of to the pushforwards of other conditional measures under holonomy.

In order to prove the absolute continuity property (7), we need to obtain estimates on ${\mu_y^u}$. We start by getting estimates on ${\mu}$ from the Gibbs property (1), and then using these to get estimates on ${\mu_y^u}$ using (6).

We will need a family of partitions of ${R}$ that refines to the partition into points. Fix a reference point ${q\in R}$, and suppose we have chosen partitions ${\eta_n^s}$ of ${V_q^s}$ and ${\eta_m^u}$ of ${V_q^u}$ for ${m,n\geq 0}$. Then we can define a partition ${\xi_{m,n}}$ of ${R}$ by taking the direct product of these two partitions, using the foliations of ${R}$ by local stable and unstable leaves: that is, we put

$\displaystyle \xi_{m,n}(y) = \{z\in R : \eta_m^u([z,q]) = \eta_m^u([y,q]) \text{ and } \eta_n^s([q,z]) = \eta_n^s([q,y]) \} \ \ \ \ \ (8)$

In order to obtain information on ${\mu(\xi_{m,n}(y))}$ using the Gibbs property (1), we need to put an extra condition on the partitions we use; we need to them to be adapted, meaning that each partition element both contains a Bowen ball and is contained within a larger Bowen ball. Most of the ideas here are fairly standard in thermodynamic formalism, but it is important for us to work separately on the stable and unstable manifolds, then combine the two, so we describe things explicitly. Fix ${\varepsilon>0}$. Given ${y\in V_q^u}$ and ${m\geq 0}$, let

\displaystyle \begin{aligned} B_m^u(y,\varepsilon) &= \{y'\in V_q^u : d(f^ky' , f^k y) \leq \varepsilon\ \forall 0\leq k \leq m \} \\ &= \{y'\in R : d(f^k y', f^ky) \leq \varepsilon\ \forall -\infty < k \leq m \}. \end{aligned}

Similarly, given ${z\in V_q^s}$ and ${n\geq 0}$, let

\displaystyle \begin{aligned} B_n^s(z,\varepsilon) &= \{z'\in V_q^s : d(f^{-k} z', f^{-k}z) \leq \varepsilon\ \forall 0\leq k \leq n\} \\ &= \{z'\in R : d(f^k z', f^kz) \leq \varepsilon\ \forall -n\leq k < \infty\}. \end{aligned}

Given ${\varepsilon > 0}$, we say that the partitions ${\eta_m^u}$ are ${\varepsilon}$-adapted if for every partition element ${\eta_m^u(y')}$ there is ${y\in V_q^u}$ such that

$\displaystyle B_m^u(y,\varepsilon/2) \subset \eta_m^u(y') \subset B_m^u(y,\varepsilon).$

We make a similar definition for ${\eta_n^s}$ using ${B_n^s}$. Note that we can produce an ${\varepsilon}$-adapted sequence of partitions ${\eta_n^s}$ as follows:

1. say that ${E\subset V_q^s}$ is ${(n,\varepsilon)}$-separated if for every ${x\neq y \in E}$ there is ${0\leq k\leq n}$ such that ${d(f^{-k}x,f^{-k}y) > \varepsilon}$;
2. let ${E \subset V_q^s}$ be a maximal ${(n,\varepsilon)}$-separated set and observe that ${\bigcup_{x\in E} B_n^s(x,\varepsilon) = V_q^s}$, while the sets ${\{B_n^s(x,\varepsilon/2) : x\in E \}}$ are disjoint;
3. enumerate ${E}$ as ${x_1,\dots, x_\ell}$, and build an ${\varepsilon}$-adapted partition ${\eta_n^s}$ by considering the sets

$\displaystyle C_j = B_n^s(x_j,\varepsilon/2) \cup \Big( V_q^s \setminus \bigcup_{i>j} B_n^s(x_i,\varepsilon)\Big). \ \ \ \ \ (9)$

Lemma 5 Let ${\mu}$ be a measure satisfying the Gibbs property (1). Then there are ${\varepsilon > 0}$ and ${K_0>0}$ such that if ${\eta_m^u}$ and ${\eta_n^s}$ are ${\varepsilon}$-adapted partitions of ${V_q^u}$ and ${V_q^s}$, then the product partition ${\xi_{m,n}}$ defined in (8) satisfies

$\displaystyle K_0^{-1} \leq \frac{\mu(\xi_{m,n}(x))}{e^{-(n+m)P + \sum_{k=-n}^{m-1} \varphi(f^k x)}} \leq K_0 \ \ \ \ \ (10)$

for every ${x\in R}$.

Proof: First we show that the upper bound in (10) holds whenever ${\varepsilon>0}$ is sufficiently small. Fix ${\varepsilon'>0}$ such that the Gibbs bound (1) holds for Bowen balls of radius ${2\varepsilon'}$, and such that ${\varepsilon'}$ is significantly smaller than the size of any local stable or unstable leaf. It suffices to show that for every ${x\in R}$ we have ${\xi_{m,n}(x) \subset B_{-n,m}(y,2\varepsilon') := f^{-n}(B_{n+m}(f^n y, 2\varepsilon'))}$ for some ${y}$, since then (1) gives the upper bound, possibly with a different constant; note that replacing ${x}$ with ${y}$ in the denominator changes the quantity by at most a constant factor, using the fact that ${\varphi}$ is Hölder continuous together with some basic properties of Anosov maps. (See, for example, Section 2 of this previous post for a proof of this Bowen property.)

Given ${x,y\in M}$ such that ${x}$ and ${y}$ lie on the same local stable leaf with ${d(x,y) \geq \varepsilon'}$, let

$\displaystyle r^u(x,y) = \inf\{d(x',y') : x'\in W_x^u, y'\in W_y^u \cap W_{x'}^s\}$

be the closest that any holonomy along local unstable leaves can bring ${x}$ and ${y}$. Note that ${r^u}$ is positive and continuous in ${x}$ and ${y}$; by compactness there is ${\bar{r}^u > 0}$ such that ${r^u(x,y) \geq \bar{r}^u}$ for all ${x,y}$ as above. In particular, this means that if ${x,y}$ are the images under an (unstable) holonomy of some ${x',y'}$ with ${d(x',y') < \bar{r}^u}$, then we must have ${d(x,y) <\varepsilon}$.

Choose ${\bar{r}^s}$ similarly for stable holonomies, and fix ${\varepsilon <\min(\bar{r}^s,\bar{r}^u)}$. Fix ${y\in V_q^u}$, ${z\in V_q^s}$, ${y'\in B_m^u(y,\varepsilon)}$, and ${z'\in B_n^s(z,\varepsilon)}$. Then for every ${-n\leq k\leq m}$ we have

$\displaystyle d(f^k[y',z'],f^k[y,z]) \leq d(f^k[y',z'],f^k[y',z]) + d(f^k[y',z],f^k[y,z]). \ \ \ \ \ (11)$

These distances can be estimated by observing that ${f^k[y',z']}$ and ${f^k[y',z]}$ are the images of ${f^k z'}$ and ${f^kz}$ under a holonomy map along local unstables, and similarly ${f^k[y',z]}$ and ${f^k[y,z]}$ are images of ${y}$ and ${y'}$ under a holonomy map along local stables. Then our choice of ${\varepsilon}$ shows that both quantities in the right-hand side of (11) are ${\leq \varepsilon'}$, which gives the inclusion we needed. This proves the upper bound in (10); the proof of the lower bound is similar. $\Box$

4. A refining sequence of adapted partitions

Armed with the formula from Lemma 5, we may look at the characterization of conditional measures in (6) and try to prove that the absolute continuity bound (7) holds whenever ${x,y\in R}$ both satisfy (6). There is one problem before we do this, though; the formula in (6) requires that the sequence of partitions ${\eta_n^s}$ be refining, and there is no a priori reason to expect that the adapted partitions produced by the simple argument before Lemma 5 refine each other. To get this additional property, we must do some more work. To the best of my knowledge, the arguments here are new.

4.1. Strategy

Start by letting ${E_n \subset V_q^s}$ be a maximal ${(n, 10\varepsilon)}$-separated set. We want to build a refining sequence of adapted partitions using the sets ${E_n}$, where instead of using Bowen balls with radius ${\varepsilon/2}$ and ${\varepsilon}$ we will use Bowen balls with radius ${\varepsilon}$ and ${20\varepsilon}$; this does not change anything essential about the previous section. We cannot immediately proceed as in the argument before Lemma 5, because if we are not careful when choosing the elements of the partition ${\eta_n^s}$, their boundaries might cut the Bowen balls ${B_{n'}^s(x,\varepsilon)}$ for ${x\in E_{n'}}$ and ${n' > n}$, spoiling our attempt to build an adapted partition ${\eta_{n'}^s}$ that refines ${\eta_n^s}$.

A first step will be to only build ${\eta_n^s}$ for some values of ${n}$. More precisely, we fix ${\ell\in {\mathbb N}}$ such that ${C\lambda^\ell < \frac 13}$, so that for every ${x\in M}$ and ${y\in W_x^s}$, we have ${d(f^\ell x, f^\ell y) < \frac 13 d(x,y)}$; then writing

$\displaystyle d_n^s(x,y) := \max_{0\leq k \leq n} d(f^{-k}x, f^{-k}y), \ \ \ \ \ (12)$

we have the following for every ${n,t\in {\mathbb N}}$:

$\displaystyle d_n^s(x,y) \leq 3^{-t} d_{n+t\ell}^s(x,y) \text{ whenever } d_{n+t\ell}^s(x,y) < \varepsilon. \ \ \ \ \ (13)$

We will only build adapted partitions for those ${n}$ that are multiples of ${\ell}$. We need to understand when the Bowen balls associated to points in the sets ${E_n}$ can overlap. We write ${\mathcal{E} = \{(x,n) \in V_q^s \times \ell{\mathbb N} : x\in E_n \}}$.

Definition 6 An ${\varepsilon}$-path between ${(x,n)\in \mathcal{E}}$ and ${(x',n')\in \mathcal{E}}$ is a sequence ${\{(z_i, k_i) \in \mathcal{E} : 0\leq i\leq m \}}$ with ${(z_0,k_0) = (x,n)}$ and ${(z_m,k_m) = (x',n')}$ such that ${B_{k_i}^s(z_i,\varepsilon) \cap B_{k_{i+1}}^s(z_{i+1},\varepsilon) \neq \emptyset}$ for all ${0\leq i < m}$. A subset ${\mathcal{C} \subset \mathcal{E}}$ is ${\varepsilon}$-connected if there is an ${\varepsilon}$-path between any two elements of ${\mathcal{C}}$.

Given ${n\in {\mathbb N}}$, let ${\mathcal{E}_{\geq n} = \{(x,k)\in \mathcal{E} : k\geq n\}}$. In the next section we will prove the following.

Proposition 7 If ${\mathcal{C} \subset \mathcal{E}_{\geq n}}$ is ${\varepsilon}$-connected and ${(x,n)\in \mathcal{C}}$, then

$\displaystyle U(\mathcal{C}) := \bigcup_{(y,k) \in \mathcal{C}} B_k^s(y,\varepsilon) \subset B_n^s(x,5\varepsilon). \ \ \ \ \ (14)$

For now we show how to build a refining sequence of adapted partitions assuming (14) by modifying the construction in (9). The key is that we build our partitions ${\eta_n^s}$ so that for every ${\varepsilon}$-connected ${\mathcal{C} \subset \mathcal{E}_{\geq n}}$, the set ${U(\mathcal{C})}$ is completely contained in a single element of ${\eta_n^s}$.

Suppose we have built a partition ${\eta_{n-\ell}^s}$ with this property; we need to construct a partition ${\eta_n^s}$ that refines ${\eta_{n-\ell}^s}$, still has this property, and also has the property that every partition element ${A}$ has some ${(x,n)\in \mathcal{E}}$ such that ${B_n^s(x,\varepsilon) \subset A \subset B_n^s(x,20\varepsilon)}$. To this end, let ${Z}$ be an element of the partition ${\eta_{n-\ell}^s}$, and enumerate the ${\varepsilon}$-connected components of ${\mathcal{E}_{\geq n}}$ as ${\mathcal{C}_1,\dots, \mathcal{C}_m, \mathcal{D}_1,\mathcal{D}_2,\dots}$, where each ${\mathcal{C}_j}$ contains some ${(x_j,n)}$, while each ${\mathcal{D}_i}$ is in fact a subset of ${\mathcal{E}_{\geq n+\ell}}$.

It follows from (14) that ${U(\mathcal{C}_j) \subset B_n^s(x_j,5\varepsilon)}$ for all ${1\leq j\leq m}$. Given ${i\in {\mathbb N}}$, we can observe that ${U(\mathcal{D}_i) \subset B_{n_i}^s(y_i,5\varepsilon) \subset B_n^s(y_i,5\varepsilon)}$ for some ${(y_i,n_i)\in \mathcal{E}}$ with ${n_i > n}$. Moreover, the sets ${\{B_n^s(x_j,10\varepsilon) : 1\leq j\leq m\}}$ cover ${Z}$, so every ${U(\mathcal{D}_i)}$ must intersect some set ${B_n^s(x_j,10\varepsilon)}$, and hence be contained in some ${B_n^s(x_j,20\varepsilon)}$. Given ${i\in {\mathbb N}}$, let ${J(i) = \min \{j : U(\mathcal{D}_i) \subset B_n(x_i,20\varepsilon)\}}$. Then for each ${1\leq j\leq m}$, let ${I_j = \{i\in {\mathbb N} : J(i) = j\}}$. Define sets ${A_1,\dots, A_m}$ by

$\displaystyle A_j = U(\mathcal{C}_j) \cup \Big( \bigcup_{i\in I_j} U(\mathcal{D}_i) \Big) \cup \Big( Z\setminus \bigcup_{j' > j} B_n(x_{j'},20\varepsilon) \Big).$

It is not hard to verify that the sets ${A_j}$ form a partition of ${Z}$ such that ${B_n^s(x_j,\varepsilon) \subset A_j \subset B_n^s(x_j,20\varepsilon)}$, and such that every ${\varepsilon}$-connected component ${\mathcal{C}}$ of ${\mathcal{E}_{\geq n}}$ that has ${U(\mathcal{C}) \cap Z \neq\emptyset}$ is completely contained in some ${A_j}$. Repeating this procedure for the other elements of ${\eta_{n-\ell}^s}$ produces the desired ${\eta_n^s}$.

4.2. Proof of the proposition

Now we must prove Proposition 7. Let ${\mathcal{E} \subset V_q^s\times \ell{\mathbb N}}$ be as in the previous section, so that in particular, every ${(x,n)\in \mathcal{E}}$ has that ${n}$ is a multiple of ${\ell}$, and given any ${(x,n)\neq (y,n)\in \mathcal{E}}$ we have ${d_n^s(x,y) \geq 10\varepsilon}$.

The following concept is essential for our proof.

Definition 8 Given an ${\varepsilon}$-path ${\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}}$, a ford is a pair ${0\leq i k_i}$ for all ${i. Think of “fording a river'' to get from ${B_{k_i}^s(z_i,\varepsilon)}$ to ${B_{k_j}^s(z_j,\varepsilon)}$ by going through deeper levels of the ${\varepsilon}$-path; the alternative is that ${k_a \leq k_i}$ for some ${i, in which case ${B_{k_a}^s(z_a,\varepsilon)}$ is a sort of “bridge'' between ${z_i}$ and ${z_j}$.

Lemma 9 Suppose that ${\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}}$ is an ${\varepsilon}$-path without any fords, and that ${k_i \geq k_0}$ for all ${1\leq i \leq m}$. Then ${d_{k_0}^s(z_0,z_m) \leq 4\varepsilon}$.

Proof: By the definition of ${\varepsilon}$-path, for every ${0\leq a < m}$ there is a point ${y_a \in B_{k_a}^s(z_a,\varepsilon) \cap B_{k_{a+1}}^s(z_{a+1},\varepsilon)}$. Thus

\displaystyle \begin{aligned} d_{k_0}^s(z_0,z_m) &\leq d_{k_0}^s(z_0,y_0) + \Big( \sum_{a=1}^{m-1} d_{k_0}^s(y_{a-1}, z_a) + d_{k_0}^s(z_a,y_a) \Big) + d_{k_0}^s(y_{m-1},z_m) \\ &\leq 2\varepsilon + \sum_{a=1}^{m-1} (\tfrac 13)^{k_a - k} \big( d_{k_a}^s(y_{a-1}, z_a) + d_{k_a}^s(z_a,y_a) \big) \\ &\leq 2\varepsilon \Big( 1 + \sum_{a=i+1}^{j-1} (\tfrac13)^{k_a - k} \Big), \end{aligned} \ \ \ \ \ (15)

where the second inequality uses (13). Now we need to estimate how often different values of ${k_a}$ can appear. Let ${A = \{1, \dots, m-1\}}$; we claim that for every ${t\in {\mathbb N}}$, we have

$\displaystyle \#\{a\in A : k_a = k_0 + t\ell \} \leq 2^{t-1}. \ \ \ \ \ (16)$

First note that ${k_a > k_0}$ for all ${a\in A}$, because otherwise ${(0,a)}$ would be a ford. Since the ${\varepsilon}$-path has no fords, every ${i with ${k_i = k_j}$ must be separated by some ${a}$ with ${k_a < k_i}$, and we conclude that for every ${t\in {\mathbb N}}$, we have

\displaystyle \begin{aligned} \#\{a \in A : &\, k_a = k_0 + t\ell\} \leq 1 + \#\{a\in A : k_0 < k_a < k_0 + t\ell\} \\ &= 1 + \sum_{q=1}^{t-1} \#\{a\in A : k_a = k_0 + q\ell\}. \end{aligned}

For ${t=1}$, this establishes (16) immediately. If (16) holds up to ${t-1}$, then this gives

$\displaystyle \#\{a\in A : k_a = k_0 + t\ell\} \leq 1 + \sum_{q=1}^{t-1} 2^{q-1} = 2^{t-1},$

which proves (16) for all ${t\in {\mathbb N}}$ by induction. Combining (15) and (16), we have

\displaystyle \begin{aligned} d_k^s(z_i,z_j) &\leq 2\varepsilon \Big( 1 + \sum_{t=1}^\infty \Big(\frac13 \Big)^t \#\{a\in A : k_a = k_i + t\ell\}\Big) \\ &\leq 2\varepsilon + \varepsilon \sum_{t=1}^\infty \Big(\frac23 \Big)^t =4\varepsilon, \end{aligned}

which proves Lemma 9. $\Box$

Now we show that the ${10\varepsilon}$-separation condition rules out the existence of fords entirely.

Lemma 10 If ${\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}}$ is an ${\varepsilon}$-path, then it has no fords.

Proof: Fix an ${\varepsilon}$-path ${\{(z_i,k_i)\}_{i=0}^m}$. Denote the set of fords by

$\displaystyle I = \{(i,j)\in \{0,1,\dots,m\}^2 : i < j \text{ and } k_i = k_j \leq k_a \text{ for all } i

our goal is to prove that ${I}$ is empty. Put a partial ordering on ${I}$ by writing

$\displaystyle (i',j') \preceq (i,j)\ \Leftrightarrow\ [i',j'] \subset [i,j] \ \Leftrightarrow\ i \leq i' < j' \leq j.$

If ${I}$ is non-empty, then since it is finite it must contain some element ${(i,j)}$ that is minimal with respect to this partial ordering. In particular, the ${\varepsilon}$-path ${(z_i,k_i),\dots, (z_j,k_j)}$ contains no fords, and so by Lemma 9 we have ${d_{k_i}^s(z_i,z_j) \leq 4\varepsilon}$, contradicting the assumption that ${d_{k_i}^s(z_i,z_j) \geq 10\varepsilon}$ (since ${E_k}$ is ${(k,10\varepsilon)}$-separated), and we conclude that ${I}$ must be empty, proving the lemma. $\Box$

Now we prove Proposition 7. Let ${\mathcal{C} \subset \mathcal{E}_{\geq n}}$ be ${\varepsilon}$-connected and fix ${(x,n) \in \mathcal{C}}$. Given any ${(y,k) \in \mathcal{C}}$, there is an ${\varepsilon}$-path ${\{(z_i,k_i)\}_{i=0}^m}$ such that ${(z_0,k_0) = (x,n)}$ and ${(z_m, k_m) = (y,k)}$. By Lemma 10, this path has no fords, and so Lemma 9 gives ${d_n^s(x,y) \leq 4\varepsilon}$. We conclude that ${B_n^s(y,\varepsilon) \subset B_n^s(x,5\varepsilon)}$, which proves Proposition 7.

5. Completion of the proof

Thanks to the previous section, we can let ${\eta_m^u}$ and ${\eta_n^s}$ be ${\varepsilon}$-adapted partitions of ${V_q^u}$ and ${V_q^s}$ such that ${\eta_{n'}^s}$ refines ${\eta_n^s}$ whenever ${n'>n}$ and ${n,n'}$ are both multiples of ${\ell}$. By Lemma 5, we have good lower and upper estimates on ${\mu(\xi_{m,n}(x))}$ for all ${x\in R}$, where ${\xi_{m,n}}$ is the product partition defined in (8). By (6), there is ${R'\subset R}$ with ${\mu(R\setminus R')=0}$ such that for every ${x\in R'}$, we have

$\displaystyle \int \psi\,d\mu_x^u = \lim_{n\rightarrow\infty} \frac 1{\mu(\xi_{0,n\ell}(x))}\int_{\xi_{0,n\ell}(x)} \psi\,d\mu \ \ \ \ \ (17)$

for every continuous ${\psi\colon M\rightarrow {\mathbb R}}$. The next step towards proving Theorem 3 is the following result.

Proposition 11 There is a constant ${K_1}$ such that for any ${x,y\in R'}$ and every continuous ${\psi\colon V_y^u \rightarrow (0,\infty)}$, we have ${\int \psi\,d(\pi_{x,y})_*\mu_x^u \leq K_1 \int\psi\,d\mu_y^u}$, where ${\pi_{x,y} \colon V_x^u \rightarrow V_y^u}$ is the holonomy map along local unstables.

Proof: Start by fixing for each ${m\in {\mathbb N}}$ a set ${D_m^q \subset V_q^u}$ such that every element of ${\eta_m^s}$ contains exactly one point in ${D_m^q}$. Then let ${D_m^x = \pi_{q,x}D_m^q}$ for every ${x\in R}$.

Given a positive continuous function ${\psi}$, there is ${\delta>0}$ such that if ${d(z,z')<\delta}$, then ${\psi(z) \leq 2\psi(z')}$. Thus for all sufficiently large ${m,n}$, we have ${\psi(z) \leq 2\psi(z')}$ whenever ${\xi_{m,n\ell}(z) = \xi_{m,n\ell}(z')}$. For any such ${m,n}$ and any ${x,y\in R'}$, we conclude that

\displaystyle \begin{aligned} \int_{\xi_{0,n\ell}(x)} \psi\circ \pi_{x,y}\,d\mu &= \sum_{z\in D_m^x} \int_{\xi_{m,n\ell}(z)} \psi\circ \pi_{x,y}\,d\mu \leq \sum_{z\in D_{m}^x} 2\psi(\pi_{x,y}z) \mu(\xi_{m,n\ell}(z)) \\ &\leq \sum_{z\in D_{m}^x} 2\psi(\pi_{x,y}z) K_0 e^{-(n+m)P + S_{n,m}\varphi(z)}, \end{aligned}

where we write ${S_{n,m}\varphi(z) = \sum_{k=-n\ell}^{m-1} \varphi(f^kz)}$ and where the last inequality uses Lemma 5. Similarly,

$\displaystyle \mu(\xi_{0,n\ell}(x)) = \sum_{z\in D_m^x} \mu(\xi_{m,n\ell}(z)) \geq K_0^{-1} e^{-(n+m) P + S_{n,m}\varphi(f^kz)},$

and we conclude from (6) that

$\displaystyle \int \psi\circ \pi_{x,y} \,d\mu_x^u \leq \varlimsup_{n\rightarrow\infty} \frac{\sum_{z\in D_m^x} 2\psi(\pi_{x,y}z) K_0 e^{S_{n,m}\varphi(z)}}{\sum_{z\in D_m^x} K_0^{-1} e^{S_{n,m}\varphi(z)}}. \ \ \ \ \ (18)$

Note that since ${\varphi}$ is Hölder continuous, there are ${\beta>0}$ and ${|\varphi|_\beta > 0}$ such that ${|\varphi(z)-\varphi(z')| \leq |\varphi|_\beta d(z,z')^\beta}$ for all ${z,z'\in M}$. Thus given any ${z\in V_x^u}$ and any ${n\geq 0}$, we have

\displaystyle \begin{aligned} \Big|\sum_{k=-n\ell}^{-1} \varphi(f^kx) &- \sum_{k=-n\ell}^{-1} \varphi(f^kz)\Big| \leq \sum_{k=1}^{n\ell} |\varphi(f^kx) - \varphi(f^kz)| \\ &\leq \sum_{k=1}^{n\ell} |\varphi|_\beta (C\lambda^k \varepsilon)^\beta < |\varphi|_\beta (C\varepsilon)^\beta \sum_{k=1}^\infty (\lambda^\beta)^k =: L < \infty. \end{aligned} \ \ \ \ \ (19)

Thus (18) gives

\displaystyle \begin{aligned} \int \psi \,d(\pi_{x,y})_*\mu_x^u &\leq 2K_0^2 \varlimsup_{n\rightarrow\infty} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{L \sum_{k=-n\ell}^{-1} \varphi(f^k x)} e^{S_m\varphi(z)}} {\sum_{z\in D_m^x} e^{L^{-1} \sum_{k=-n\ell}^{-1} \varphi(f^k x)} e^{S_m\varphi(z)}} \\ &\leq 2K_0^2 e^{2L} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{S_m\varphi(z)}}{\sum_{z\in D_m^x} e^{S_m\varphi(z)}}. \end{aligned} \ \ \ \ \ (20)

A similar set of computations for ${\mu_y^u}$ shows that

$\displaystyle \int\psi\,d\mu_y^u \geq \frac 1{2K_0^2 e^{2L}} \frac{\sum_{z'\in D_m^y} \psi(z') e^{S_m\varphi(z')}}{\sum_{z'\in D_m^y} e^{S_m\varphi(z')}}. \ \ \ \ \ (21)$

Since ${D_m^y = \pi_{x,y} D_m^x}$, we can rewrite the sums over ${D_m^y}$ as sums over ${D_m^x}$; for example,

$\displaystyle \sum_{z' \in D_m^y} e^{S_{m}\varphi(z')} = \sum_{z\in D_m^x} e^{S_{m}\varphi(\pi_{x,y} z)} \leq e^L \sum_{z\in D_m^x} e^{S_m\varphi(z)},$

where the inequality uses the fact that the estimate (19) also holds for forward ergodic averages of two points on the same local stable manifold. Using a similar estimate for the numerator in (21) gives

$\displaystyle \int \psi\,d\mu_y^u \geq \frac 1{2K_0^2 e^{4L}} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{S_m\varphi(z)}}{\sum_{z\in D_m^x} e^{S_m\varphi(z)}}.$

Together with (20), this gives

$\displaystyle \int \psi \,d(\pi_{x,y})_*\mu_x^u \leq 4K_0^4 e^{6L} \int\psi\,d\mu_y^u,$

which completes the proof of Proposition 11. $\Box$

To complete the proof of Theorem 3, we first observe that for every open set ${U \subset V_y^u}$, there is a sequence of continuous functions ${\psi_n\colon V_y^u \rightarrow (0,1]}$ that converge pointwise to the indicator function ${\mathbf{1}_U}$; applying Proposition 11 to these functions and using the dominated convergence theorem gives

\displaystyle \begin{aligned} d(\pi_{x,y})_*\mu_x^u(U) &= \int \mathbf{1}_U \,d(\pi_{x,y})_*\mu_x^u = \lim_{n\rightarrow\infty} \int \psi_n \,d(\pi_{x,y})_*\mu_x^u \\ &\leq \lim_{n\rightarrow\infty} K_1 \int\psi_n \,d\mu_y^u = K_1 \int \mathbf{1}_U \,d\mu_y^u = K_1\mu_y^u(U). \end{aligned}

Then for every measurable ${E \subset V_y^u}$, we have

\displaystyle \begin{aligned} d(\pi_{x,y})_*\mu_x^u(E) &= \inf \{ d(\pi_{x,y})_*\mu_x^u(U) : U \supset E \text{ is open} \} \\ &\leq K_1 \inf \{ \mu_y^u(U) : U \supset E \text{ is open} \} = K_1 \mu_y^u(E). \end{aligned}

This proves that ${d(\pi_{x,y})_*\mu_x^u \ll \mu_y^u}$ and that the Radon–Nikodym derivative is ${\leq K_1}$ ${\mu_y^u}$-a.e. The lower bound ${K_1^{-1}}$ follows since the argument is symmetric in ${x}$ and ${y}$. This proves (7) and thus completes the proof of Theorem 3.

## Alpha-beta shifts

Given ${\beta>1}$, the interval map ${T\colon [0,1]\rightarrow [0,1]}$ given by ${T(x) = \beta x \pmod 1}$ is naturally semi-conjugate to the ${\beta}$-shift ${\Sigma_\beta \subset \{0,1,\dots,\lceil\beta\rceil-1\}^{\mathbb N}}$. The shift space ${\Sigma_\beta}$ admits a natural description in terms of the lexicographic order, and another in terms of a countable-state directed graph. These have been used to obtain a fairly complete description of the dynamics of the ${\beta}$-transformation ${T}$, including existence, uniqueness, and strong statistical properties of equilibrium states for Hölder potentials.

In this post I want to explain how one can give similar descriptions (both in terms of lexicographic order and in terms of a countable-state Markov structure) for the coding spaces associated to the interval map ${T(x) = \alpha + \beta x \pmod 1}$. Just as with the ${\beta}$-shifts, these “${\alpha}$${\beta}$ shifts” arise from piecewise expanding interval maps, and thus can be studied using the general machinery developed by Hofbauer, but I believe that they are worth studying a little more carefully as an intermediate case between the ${\beta}$-shifts, where the fact that ${0}$ is a `safe symbol’ can be used to prove various results that are still open for ${\alpha}$${\beta}$ shifts, and the more general setting where the description of the shift space requires more bookkeeping and it can be harder to see what is going on.

1. Coding spaces for interval maps

We start by recalling the general notion of a coding space for a map of the interval. Say that a map ${T\colon [0,1]\rightarrow [0,1]}$ is a piecewise expanding interval map if there is ${\lambda>1}$ and a partition of ${[0,1]}$ into finitely many intervals ${I_0,\dots, I_{d-1}}$ such that ${T}$ is continuous on each ${I_j}$ and ${C^1}$ on the interior of each ${I_j}$, with ${|T'| \geq \lambda}$. Note that we do not care whether the intervals are closed, open, or half of each.

Let ${A = \{0,1,\dots, d-1\}}$, and define a map ${h\colon [0,1]\rightarrow A^{\mathbb N}}$ by ${h(x)_n = a \in A}$ whenever ${T^n(x) \in I_a}$. Let ${\Sigma = \Sigma_T = \overline{h([0,1])}}$; we say that ${\Sigma}$ is the coding space for the interval map ${T}$ relative to the partition ${\{I_j\}_j}$. We can define a map ${\pi\colon \Sigma\rightarrow [0,1]}$ by the condition that ${\pi(h(x)) = x}$ for all ${x\in [0,1]}$ and ${\pi}$ is continuous. Then we have ${\pi\circ \sigma = T\circ \pi}$, so ${\pi}$ is a semiconjugacy between ${(\Sigma,\sigma)}$ and ${([0,1],T)}$.

Another way of interpreting this is the following: for each ${a\in A}$ there is a unique map ${S_a \colon \overline{T(I_a)} \rightarrow \overline{I_a}}$ such that ${S_a(T x) = x}$ for all ${x}$ in the interior of ${I_a}$. Note that ${S_a}$ is ${C^1}$ with ${|S_a'| \leq \lambda^{-1} < 1}$. Given any set ${E\subset [0,1]}$, write ${S_a(E) = S_a(E \cap \overline{T(I_a)})}$ for convenience; note that ${S_a(E)=\emptyset}$ if ${E}$ does not intersect ${\overline{T(I_a)}}$. Given a finite word ${w\in A^* = \bigcup_{n\geq 0} A^n}$, let ${J_w = S_w([0,1])}$, where ${S_w = S_{w_1} \circ \cdots S_{w_{|w|}}}$ and ${|w|}$ is the length of ${w}$. Roughly speaking, ${J_w}$ represents the set of points ${x\in [0,1]}$ for which ${x\in I_{w_1}}$, ${T(x) \in I_{w_2}}$, and so on. (This is not quite completely true because of problems at the endpoints of the intervals.)

The language of the shift ${\Sigma}$ is the set of all ${w\in A^*}$ such that ${[w] := \{x\in \Sigma : x_1 \cdots x_{|w|} = w\} \neq\emptyset}$. Write ${\mathcal{L}}$ for this collection of words; then ${\mathcal{L} = \{ w \in A^* : J_w \neq \emptyset\}}$, and we have ${\pi([w]) = J_w}$ for all ${w\in \mathcal{L}}$. Let ${\mathop{\mathcal L}_n = \{w\in \mathop{\mathcal L} : |w| = n\}}$; given ${w\in \mathop{\mathcal L}_n}$, the interval ${J_w}$ has length ${\leq \lambda^{-n}}$, and since for every ${x\in \Sigma}$ we have ${\{x\} = \bigcap_{n\geq 0} [x_1\cdots x_n]}$, we also have ${\{\pi(x)\} = \bigcap_{n\geq 0} J_{x_1 \cdots x_n}}$.

So far these considerations have been completely general, valid for any piecewise expanding interval map. Now we consider the specific transformations ${T_\beta(x) = \beta x \pmod 1}$ and ${T_{\alpha,\beta} = \alpha + \beta x \pmod 1}$, where ${z\pmod 1 := z - \lfloor z \rfloor}$ and we assume without loss of generality that ${\alpha \in [0,1)}$. These are piecewise expanding interval maps, where the natural partition to consider is the partition into maximal intervals of continuity. Thus for ${T_\beta}$, we have ${I_0 = [0,\frac 1\beta)}$, ${I_1 = [\frac 1\beta, \frac 2\beta)}$, and so on, with ${I_{d-1} = [\frac {d-1}\beta,1]}$, where ${d = \lceil\beta\rceil-1}$. For ${T_{\alpha,\beta}}$, we have ${I_0 = [0,\frac{1-\alpha}\beta)}$, ${I_1 = [\frac{1-\alpha}{\beta},\frac{2-\alpha}\beta)}$, and so on, with ${d = \lceil\alpha+\beta\rceil - 1}$ and ${I_{d-1} = [\frac{d-1+\alpha}{\beta},1]}$. We write ${\Sigma_\beta}$ and ${\Sigma_{\alpha,\beta}}$ for the coding spaces of these transformations relative to their natural partitions.

2. Description via lexicographic order

The full shift ${A^{\mathbb N}}$ carries a natural lexicographic order inherited from the usual order on ${A = \{0,1,\dots, d-1\}}$: given ${y\neq z\in A^{\mathbb N}}$, let ${n=n(y,z)}$ be minimal such that ${y\neq z}$, and write ${y\prec z}$ if ${y_n < z_n}$. Write ${y\preceq z}$ if ${y\prec z}$ or ${y=z}$. This is a total order on ${A^{\mathbb N}}$. The key observation for our purposes is that because ${T_\beta}$ and ${T_{\alpha,\beta}}$ are order-preserving on each of their intervals of continuity, their coding maps are also order-preserving in the sense that ${\pi(y) \leq \pi(z)}$ if and only if ${y\preceq z}$. (Note that we must use non-strict inequalities because the coding maps are not 1-1.)

2.1. The ${\beta}$-shifts

Fix ${\beta>1}$ and let ${T=T_\beta}$. Let ${{\mathbf b} = \sup \Sigma_\beta}$, where supremum is w.r.t. lexicographic order; the supremum exists because ${A^{\mathbb N}}$ has the least upper bound property, and is in ${\Sigma_\beta}$ because the ${\beta}$-shift is closed. Thus ${{\mathbf b} = b_1 b_2 b_3 \cdots}$ is the lexicographically maximal element of ${\Sigma_\beta}$; it would code the trajectory of the right endpoint of the unit interval if we replaced ${T_\beta}$ with the map ${T(x) = 1 + \beta x - \lceil \beta x \rceil}$, which agrees with ${T_\beta}$ everywhere except at the endpoints of the intervals of monotonicity. The points ${T^n(1) \in [0,1]}$ will play an important role in the following result.

Proposition 1 A sequence ${y\in A^{\mathbb N}}$ is in ${\Sigma_\beta}$ if and only if

$\displaystyle \sigma^n(y) \preceq {\mathbf b} \text{ for all } n=0,1,2,\dots. \ \ \ \ \ (1)$

Similarly, a word ${w\in A^*}$ is in ${\mathop{\mathcal L}(\Sigma_\beta)}$ if and only if

$\displaystyle w_{|w|-k+1}\cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k\leq |w|. \ \ \ \ \ (2)$

Proof: Because ${\Sigma_\beta}$ is closed and ${\sigma}$-invariant, every ${y\in \Sigma_\beta}$ satisfies (1), and it follows immediately that every ${w\in \mathop{\mathcal L}(\Sigma_\beta)}$ satisfies (2).

To prove the converse direction, it suffices to show that every ${w\in A^*}$ which satisfies (2) is contained in ${\mathop{\mathcal L}}$; in other words, that ${J_w \neq \emptyset}$ for every such ${w}$. Equivalently, we can work with the following object: consider for each ${w \in A^*}$ the follower interval ${F_w = \overline{T^{|w|} (J_w)}}$; this can also be defined as the minimal interval with the property that ${S_w([0,1]\setminus F_w)=\emptyset}$. Note that ${F_w = \emptyset}$ whenever ${w\notin \mathop{\mathcal L}}$. Given ${w\in \mathop{\mathcal L}}$ and ${a\in A}$, observe that

$\displaystyle F_{wa} = \overline{T^{1+|w|}(J_{wa})} = \overline{T(T^{|w|}S_w S_a[0,1])} = \overline{T(T^{|w|}S_w I_a)} = \overline{T(F_w \cap I_a)}. \ \ \ \ \ (3)$

Now we can give a non-recursive description of ${F_w}$ that completes the proof of Proposition 1. If ${w\in A^*}$ satisfies (2), let

$\displaystyle k(w) = \max \{ k : w_{|w|-k+1} \cdots w_{|w|} = b_1 \cdots b_k \} \ \ \ \ \ (4)$

be the length of the longest prefix of ${{\mathbf b}}$ that appears as a suffix of ${w}$.

Lemma 2 If ${w\in A^*}$ satisfies (2), then ${F_w = [0,T^{k(w)}(1)]}$.

Proof: We go by induction in the length of ${w}$, using ${|w|=0}$ (so ${w}$ is the empty word and ${F_w = [0,1]}$) as the base case. Suppose we have some ${w\in A^n}$ satisfying (2), and that the lemma has been proved for all words with length ${. Write ${w = va}$ where ${|v| = n-1}$ and ${a\in A}$. Let ${j = k(v)}$, so that ${v=u b_1 \cdots b_j}$ for some word ${u}$. By the inductive hypothesis, we have ${F_v = [0,T^{j}(1)]}$ and hence ${v\in \mathop{\mathcal L}}$. Then by (3), we have

$\displaystyle F_w = F_{va} = \overline{T(F_v \cap I_a)} = \overline{T([0,T^{k(v)}(1)] \cap I_a)}.$

There are two cases to consider.

Case one: ${T^{k(v)}(1) \in I_a}$. In this case we have ${a=b_{j+1}}$ and ${k(w) = j+1}$, and ${F_w = \overline{T([\frac a\beta,T^{k(v)}(1)])} = [0,T^{k(v)+1}] = [0,T^{k(w)}(1)]}$.

Case two: ${T^{k(v)}(1) \notin I_a}$. In this case we must have ${a < b_{j+1}}$, otherwise we would have ${w=ub_1\cdots b_j a}$ and hence ${w_{|w|-j} \cdots w_{|w|} \succ b_1 \cdots b_{j+1}}$, violating (2). Thus ${I_a \subset F_v}$ and hence by (3), we have ${F_w = \overline{T(I_a)} = [0,1]}$. On the other hand, since ${a < b_{j+1}}$, we see that for every ${1\leq i\leq j}$ we have

$\displaystyle w_{|w|-j+i} \cdots w_{|w|}= b_i b_{i+1} \cdots b_j a \prec b_i \cdots b_j b_{j+1} \preceq b_1 \cdots b_{j-i+2},$

where the last inequality uses the fact that ${\sigma^{i-1}{\mathbf b} \preceq {\mathbf b}}$. We conclude that ${k(w) = 0}$, which completes the proof of Lemma 2. $\Box$

With Lemma 2 complete, Proposition 1 follows immediately since ${F_w\neq \emptyset}$ implies that ${w\in \mathop{\mathcal L}}$. $\Box$

A historical note: the transformation ${T_\beta}$ was introduced by Rényi in 1957 to study representations of real numbers in non-integer bases. The lexicographic characterization of ${\beta}$-shifts in Proposition 1 was given by Parry in 1960 (see Theorem 3 of that paper) using different methods than the proof here. This proof is essentially the same as the one given by Hofbauer in 1979 (see Theorem 2 there); Hofbauer’s approach has the advantage of dealing with all piecewise monotonic interval maps, and the argument given here for the ${\beta}$-shift is actually just a special case of his general result.

2.2. The ${\alpha}$${\beta}$ shifts

Instead of describing the full generality of Hofbauer’s result here, we describe how Proposition 1 adapts to the ${\alpha}$${\beta}$ shifts. This family of examples already reveals some of the challenges that arise when we go from ${\beta}$-shifts to more general piecewise expanding transformations.

Fix ${\alpha\in [0,1)}$ and ${\beta>1}$, then let ${T = T_{\alpha,\beta} \colon x \mapsto \alpha + \beta x \pmod 1}$, and let ${\Sigma = \Sigma_{\alpha,\beta}}$ be the coding space for ${T}$ with respect to its natural partition. Let ${{\mathbf a} = \inf \Sigma}$ and ${{\mathbf b} = \sup \Sigma}$ be the lexicographically minimal and maximal elements of ${\Sigma}$. Then every ${y\in \Sigma}$ has the property that

$\displaystyle {\mathbf a} \preceq \sigma^n y \preceq {\mathbf b} \text{ for all } n=0,1,2,\dots, \ \ \ \ \ (5)$

and every ${w\in \mathop{\mathcal L} = \mathop{\mathcal L}(\Sigma)}$ has the property that

$\displaystyle a_1 \cdots a_k \preceq w_{|w|-k+1} \cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k \leq |w|. \ \ \ \ \ (6)$

As with the ${\beta}$-shifts, these conditions are both necessary and sufficient for membership in ${\Sigma}$ and ${\mathop{\mathcal L}}$. The key to proving this is the following analogue of Lemma 2, whose proof we omit since it is similar to the proof there (with just a little more bookkeeping).

Lemma 3 Given ${w\in A^*}$ satisfying (6), let

\displaystyle \begin{aligned} k_1(w) &= \max \{k : a_1 \cdots a_k = w_{|w|-k+1} \cdots w_{|w|} \}, \\ k_2(w) &= \max \{k : b_1 \cdots b_k = w_{|w|-k+1} \cdots w_{|w|} \}. \end{aligned} \ \ \ \ \ (7)

Then ${F_w = [\underline{T}^{k_1(w)}(0),\overline{T}^{k_2(w)}(1)]}$, where ${\underline{T}}$ and ${\overline{T}}$ agree with ${T_{\alpha,\beta}}$ on the interiors of the intervals ${I_a}$, and are defined at the endpoints by the condition that ${\underline{T}}$ is continuous from the right and ${\overline{T}}$ is continuous from the left.

3. A directed graph on ${{\mathbb N}}$

The lexicographic description of ${\Sigma_\beta}$ in Proposition 1 is appealing, but for many purposes it is more useful to have a description of the ${\beta}$-shift in terms of a countable-state directed graph, which goes back to work of Takahashi in 1973 and Hofbauer in 1978.

Start by considering the notion of follower set in a shift space, which is analogous to the follower intervals considered in the previous part. Given a shift space ${\Sigma}$ with language ${\mathop{\mathcal L}}$, consider for each ${w\in \mathop{\mathcal L}}$ the follower set

$\displaystyle F^w = \{y\in \Sigma : wy\in \Sigma \} = \sigma^{|w|}([w] \cap \Sigma). \ \ \ \ \ (8)$

One could also consider the set of words ${v\in \mathop{\mathcal L}}$ such that ${wv\in \mathop{\mathcal L}}$, but the definition in (8) will be more convenient for our purposes. It was shown by Weiss in 1973 that a shift can be represented via a labeled directed graph on finitely many vertices if and only if it has finitely many follower sets; that is, if ${\{F^w : w\in \mathop{\mathcal L}\}}$ is finite. Such shifts are called sofic. Similarly, ${\Sigma}$ can be represented via a labeled irreducible directed graph on countably many vertices if and only if it has countably many follower sets, and such shifts are called coded; see Fiebig and Fiebig 1992 and 2002.

Every ${\beta}$-shift is characterized by a sequence ${{\mathbf b}\in A^{\mathbb N}}$ with the property that ${\sigma^n{\mathbf b} \preceq{\mathbf b}}$ for all ${n\geq 0}$: as we saw in Proposition 1, we have

\displaystyle \begin{aligned} \Sigma_\beta &= \{y\in A^{\mathbb N} : \sigma^n y \preceq {\mathbf b} \text{ for all } n\geq 0 \}, \\ \mathop{\mathcal L}(\Sigma_\beta) &= \{w\in A^* : w_{|w|-k+1}\cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k \leq |w| \}. \end{aligned}

Moreover, it follows from Lemma 2 that for every ${w\in \mathop{\mathcal L}(\Sigma_\beta)}$, the corresponding follower set in ${\Sigma_\beta}$ is

$\displaystyle F^w = \{y\in \Sigma : y\preceq \sigma^{k(w)}{\mathbf b}\} =: [\mathbf{0},\sigma^{k(w)}{\mathbf b}],$

where ${k(w)}$ is defined in (4) to be the length of the longest prefix of ${{\mathbf b}}$ that appears as a suffix or ${w}$. Thus ${F^w}$ is completely determined by ${k(w)}$, which can take countably many values, and since ${\beta}$-shifts are transitive this implies that they are coded. Thus they can be presented by a countable-state graph which supports a countable-state topological Markov chain (the Fischer cover).

Given any coded shift, the vertex set of the Fischer cover is the set of follower sets: ${V = \{F \subset \Sigma : F=F^w}$ for some ${w\in \mathop{\mathcal L}\}}$. A labeled edge is a triple ${(F,F',a)}$, where ${F,F'\in V}$ and ${a\in A}$; the labeled edge set of the Fischer cover is ${E = \{(F^w,F^{wa},a) : wa\in \mathop{\mathcal L}, a\in A\}}$; that is, there is an edge from ${F^w}$ to ${F^{wa}}$ whenever ${wa\in \mathop{\mathcal L}}$ and ${a\in A}$, and this edge is labeled with the symbol ${a}$. Note that there may be multiple edges between two vertices, each with a different label.

Say that a sequence ${y\in A^{\mathbb N}}$ labels a walk on the graph if there is a sequence of edges ${e_1,e_2,\dots}$ such that for every ${n}$,

1. ${e_n}$ is labeled with the symbol ${y_n}$, and
2. the target vertex of ${e_n}$ is the source vertex of ${e_{n+1}}$.

Then if ${y\in A^{\mathbb N}}$ labels a walk on this graph, there is a word ${w\in \mathop{\mathcal L}}$ such that ${w y_1 \cdots y_n \in \mathop{\mathcal L}}$ for all ${n}$, and hence ${y\in \Sigma}$. Conversely, every ${y\in \Sigma}$ labels a walk on the graph, so we can describe both ${\Sigma}$ and ${\mathop{\mathcal L}}$ in terms of walks on the graph.

For the ${\beta}$-shift, since the follower set ${F^w}$ is completely determined by ${k(w)}$, we can identify the vertex set of the Fischer cover with ${\{0,1,2,3,\dots\}}$, and observe that the two cases in Lemma 2 give the following two types of edges.

1. There is an edge from ${k}$ to ${k+1}$ whenever there are ${v\in \mathop{\mathcal L}}$ and ${a\in A}$ such that ${va\in \mathop{\mathcal L}}$, ${k(v)=k}$, and ${k(va) = k+1}$. This happens for every ${k}$ since we can take ${v=b_1 \cdots b_k}$ and ${a=b_{k+1}}$, so the edge from ${k}$ to ${k+1}$ is labeled with the symbol ${b_{k+1}}$.
2. There is an edge from ${k}$ to ${0}$ whenever there are ${v\in \mathop{\mathcal L}}$ and ${a\in A}$ such that ${va\in \mathop{\mathcal L}}$, ${k(v)=k}$, and ${k(va)=0}$. This happens whenever ${a < b_{k+1}}$, and so whenever ${b_{k+1} > 0}$ there are ${b_{k+1}}$ edges from ${k}$ to