Specification and the measure of maximal entropy

These are notes for a talk I am giving in Jon Chaika’s online working seminar in ergodic theory. The purpose of the talk is to outline Bowen’s proof of uniqueness of the measure of maximal entropy for shift spaces with the specification property. Bowen’s approach extends much more broadly than this: to non-symbolic systems (assuming expansivity); to equilibrium states for non-zero potential functions (assuming a bounded distortion property); and to non-uniformly hyperbolic systems using the notion of “obstructions to specification and expansivity” developed by Dan Thompson and myself (see some notes here and here, and videos here). In these notes, though, I want to give the bare bones of the argument in the simplest possible setting, to make the essential structure as clear as possible. I gave an alternate argument in a previous post; here I am giving Bowen’s original argument, although I do not necessarily follow his presentation. I should also point out that this argument differs from the construction of the MME, or more generally the equilibrium state, in Bowen’s monograph, which uses the Ruelle operator.

1. Setting and result

Let {A} be a finite set, the alphabet. Then the full shift {A^{\mathbb N}} is the set of infinite sequences of symbols from {A}; this is a compact metric space with {d(x,y) = e^{-n(x,y)}}, where {n(x,y) = \min \{ n : x_n \neq y_n\}}. The shift map {\sigma\colon A^{\mathbb N}\rightarrow A^{\mathbb N}} is defined by {\sigma(x)_n = x_{n+1}}. A shift space is a closed set {X\subset A^{\mathbb N}} with {\sigma(X) = X}.

Example 1 The best example to keep in mind through this talk is a topological Markov shift: fix {d\in{\mathbb N}} and put {A = \{1,\dots, d\}}; then fix a {d\times d} transition matrix {T} with entries in {\{0,1\}}, and write

\displaystyle  i\rightarrow j \text{ if } T_{ij} = 1, \quad i\not\rightarrow j \text{ if } T_{ij} = 0. \ \ \ \ \ (1)

Define {X = \{ x\in A^{\mathbb N} : x_n \rightarrow x_{n+1} \text{ for all } n\}}. This is a topological Markov shift (TMS). It can be viewed in terms of a directed graph with vertex set {A} and edges given by (1): {X} consists of all sequences that label infinite paths on the graph.

The TMS is mixing or primitive if there is {N\in{\mathbb N}} such that {(T^N)_{ij} > 0} for all {i,j}. Equivalently, the graph is strongly connected and the set of loop lengths on the graph has gcd {1}.
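As a quick computational aside (nothing here is used later), primitivity is easy to test by raising {T} to powers; the Python sketch below does this, using Wielandt's bound on how high a power one must check. The golden mean shift example, which forbids only the transition {2\rightarrow 2}, is my own illustrative choice, not something from the text.

```python
def is_primitive(T):
    """Check whether some power of the 0-1 transition matrix T has all
    entries positive.  For a primitive d x d matrix it suffices to check
    powers up to Wielandt's bound (d-1)^2 + 1."""
    d = len(T)
    bound = (d - 1) ** 2 + 1
    P = [row[:] for row in T]
    for _ in range(bound):
        if all(P[i][j] > 0 for i in range(d) for j in range(d)):
            return True
        # P <- P * T (product of nonnegative integer matrices)
        P = [[sum(P[i][k] * T[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return False

# Golden mean shift (illustrative): 1->1, 1->2, 2->1 allowed, 2->2 forbidden
T_gold = [[1, 1], [1, 0]]
# Strongly connected, but every loop has even length, so gcd = 2: not mixing
T_period = [[0, 1], [1, 0]]
```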

Given a shift space {X}, consider

\displaystyle  \begin{aligned} \mathcal{M} &= \{ \text{Borel probability measures on }X\}, \\ \mathcal{M}_\sigma &= \{ \mu \in \mathcal{M} : \sigma_* \mu := \mu\circ \sigma^{-1} = \mu \}, \\ \mathcal{M}_\sigma^e &= \{ \mu \in \mathcal{M}_\sigma : \mu \text{ is ergodic} \}. \end{aligned}

The set {\mathcal{M}_\sigma} of invariant measures is extremely large for a mixing TMS (and more generally for systems with some sort of hyperbolic behavior), and it is important to identify “distinguished” invariant measures. One way of doing this is via the variational principle

\displaystyle  h_{\mathrm{top}}(X) = \sup \{ h(\mu) : \mu \in \mathcal{M}_\sigma \}.

The next section recalls the definitions of the topological and measure-theoretic entropies in this setting. A measure achieving the supremum is a measure of maximal entropy (MME).

We will see that every mixing TMS has a unique MME, via a more general result. Given {n\in{\mathbb N}_0} and {w\in A^n}, let {[w] = wX \cap X} be the set of sequences in {X} that start with the word {w} (juxtaposition denotes concatenation); call this the cylinder of {w}. Define the language of {X} by

\displaystyle  \mathcal{L}_n = \{ w\in A^n : [w] \neq \emptyset \}, \qquad \mathcal{L} = \bigcup_{n\in {\mathbb N}_0} \mathcal{L}_n.

Definition 1 {X} has specification if there is {\tau\in \mathbb{N}_0} such that for all {v,w\in \mathcal{L}} there is {u\in \mathcal{L}_\tau} such that {vuw\in \mathcal{L}}.

Exercise 1 Prove that every mixing TMS has specification.
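A brute-force check is not a proof of the exercise, but it does make the definition concrete: for a small TMS one can enumerate short words and test whether every pair can be glued by some word of length exactly {\tau}. All names below are illustrative.

```python
from itertools import product

def admissible(T, w):
    """True if the word w (a tuple of 0-based symbols) labels a path."""
    return all(T[a][b] == 1 for a, b in zip(w, w[1:]))

def language(T, n):
    """All admissible words of length n."""
    return [w for w in product(range(len(T)), repeat=n) if admissible(T, w)]

def has_specification(T, tau, n_max=4):
    """Check that every pair of words of length <= n_max can be glued
    by some word u of length exactly tau."""
    words = [w for n in range(1, n_max + 1) for w in language(T, n)]
    glue = language(T, tau)
    return all(any(admissible(T, v + u + w) for u in glue)
               for v in words for w in words)

T = [[1, 1], [1, 0]]  # golden mean shift: mixing, and tau = 1 suffices
```

For the golden mean shift the gluing word can always be taken to be the single "safe" symbol, so {\tau=1} works, while {\tau=0} fails (two forbidden symbols cannot be adjacent).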

Theorem 2 (Bowen) If {X} has specification, then it has a unique MME.

2. Entropies

2.1. Topological entropy

Every word in {\mathcal{L}_{m+n}} is of the form {vw} for some {v\in \mathcal{L}_m} and {w\in \mathcal{L}_n}; thus

\displaystyle  \mathcal{L}_{m+n} \subset \mathcal{L}_m \mathcal{L}_n \quad\Rightarrow\quad \#\mathcal{L}_{m+n} \leq \#\mathcal{L}_m \#\mathcal{L}_n.

This means that the sequence {c_n := \log \#\mathcal{L}_n} has the sub-additivity property

\displaystyle  c_{m+n} \leq c_m + c_n. \ \ \ \ \ (2)

Exercise 2 Prove Fekete’s lemma: for any sequence satisfying (2), {\lim_{n\rightarrow\infty} \frac 1n c_n} exists and is equal to {\inf_n \frac 1n c_n} (a priori it could be {-\infty}).

We conclude that the topological entropy {h_{\mathrm{top}}(X) := \lim_{n\rightarrow\infty} \frac 1n \log \#\mathcal{L}_n} exists for every shift space. This quantifies the growth rate of the total complexity of the system.
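To see this growth rate numerically, here is a sketch that counts {\#\mathcal{L}_n} via powers of the transition matrix; the golden mean shift, where {\#\mathcal{L}_n} is a Fibonacci number and the limit is {\log\frac{1+\sqrt 5}{2}}, is an illustrative choice not taken from the text.

```python
from math import log, sqrt

def count_words(T, n):
    """#L_n for a TMS: number of length-n paths, i.e. the sum of the
    entries of T^(n-1) (with T^0 = identity)."""
    d = len(T)
    P = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    for _ in range(n - 1):
        P = [[sum(P[i][k] * T[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return sum(map(sum, P))

T = [[1, 1], [1, 0]]  # golden mean shift (illustrative); here #L_n = F_{n+2}
rates = [log(count_words(T, n)) / n for n in range(1, 30)]
h_top = log((1 + sqrt(5)) / 2)  # known value of the limit for this example
```

Consistent with Fekete's lemma, every term of `rates` stays above the limiting value `h_top`, and the sequence creeps down toward it.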

Exercise 3 Prove that {h_{\mathrm{top}}(X)} is the box dimension of {X} (easy). Then prove that it is also the Hausdorff dimension of {X} (harder). (Both of these facts rely very strongly on the fact that {d(\sigma x,\sigma y)/d(x,y)} is globally constant whenever {x,y} are close; for general systems where the amount of expansion may vary, the definition of topological entropy is more involved and the relationship to dimension is more subtle, although it is worth noting that a 1973 paper of Bowen in TAMS gives a definition of topological entropy that is analogous to Hausdorff dimension.)

2.2. Measure-theoretic entropy

Recall the motivation from information theory for the definition of entropy: given {p\in (0,1]}, let {I(p) = -\log p}. This can be interpreted as the information associated to an event with probability {p}; note that it is monotonic (the less likely an event is, the more information we gain by learning that it happened) and that {I(pq) = I(p) + I(q)}.

Now define {\phi} on {[0,1]} by {\phi(p) = pI(p) = -p\log p} (and {\phi(0) = 0}); this can be interpreted as the expected amount of information associated to an event with probability {p}.

Exercise 4 Show that {\phi} is concave.

Given {N\in{\mathbb N}}, let {\Delta = \Delta_N = \{ \bar{p} = (p_1,\dots, p_N) : p_i \geq 0 \text{ and } \sum_i p_i \leq 1\}} be the set of sub-probability vectors with {N} components. Define

\displaystyle  H(\bar{p}) = \sum_{i=1}^N \phi(p_i); \ \ \ \ \ (3)

this can be interpreted as the expected information associated to a collection of mutually exclusive events with probabilities {p_1,\dots, p_N}.

Exercise 5 Show that {H(\bar{p}) \leq \log N} for all {\bar{p} \in \Delta_N}, with equality if and only if {p_i = \frac 1N} for all {i}.

Given {\mu\in \mathcal{M}_\sigma}, we have for each {n} a probability vector with {\#\mathcal{L}_n} components; writing {\mu(w) = \mu([w])} for convenience, the entropy (expected information) associated to this vector is

\displaystyle  H_n(\mu) := \sum_{w\in \mathcal{L}_n} \phi(\mu(w)). \ \ \ \ \ (4)

Lemma 3 {H_{m+n}(\mu) \leq H_m(\mu) + H_n(\mu)}.

Proof:
\displaystyle  \begin{aligned} H_{m+n}(\mu) &\leq \sum_{v\in \mathcal{L}_m} \sum_{w\in \mathcal{L}_n} \mu(vw) I \Big( \frac{\mu(vw)}{\mu(v)} \cdot \mu(v) \Big) \\ &= \sum_{w\in \mathcal{L}_n} \sum_{v\in \mathcal{L}_m} \mu(v) \phi \Big( \frac{\mu(vw)}{\mu(v)} \Big) + \sum_{v\in \mathcal{L}_m} \sum_{w\in \mathcal{L}_n} \mu(vw) I(\mu(v)) \\ &\leq \sum_{w\in \mathcal{L}_n} \phi \Big( \sum_{v\in \mathcal{L}_m} \mu(vw) \Big) + \sum_{v\in \mathcal{L}_m} \phi(\mu(v))\\ &= H_n(\mu) + H_m(\mu), \end{aligned}

where the first line is by definition, the second is since {I(pq) = I(p) + I(q)}, the third uses concavity of {\phi}, and the fourth uses invariance of {\mu} to get {\sum_{v\in \mathcal{L}_m} \mu(vw) = \mu(w)}. \Box

This lemma has the following intuitive interpretation: the expected information from the first {m+n} symbols is at most the expected information from the first {m} symbols plus the expected information from the next {n} symbols.

Now Fekete’s lemma implies that the following measure-theoretic entropy exists:

\displaystyle  h(\mu) := \lim_{n\rightarrow\infty} \frac 1n H_n(\mu). \ \ \ \ \ (5)
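For a concrete instance of (4) and (5): on the full 2-shift with a Bernoulli measure (an illustrative choice, not from the text), {H_n(\mu)} is exactly linear in {n}, so the limit in (5) is visible already at {n=1}. A sketch:

```python
from itertools import product
from math import log

def phi(p):
    """phi(p) = -p log p, with phi(0) = 0."""
    return -p * log(p) if p > 0 else 0.0

def H_n(mu, n, alphabet=(0, 1)):
    """H_n(mu) as in (4), summing over all words of length n
    (words of measure zero contribute phi(0) = 0)."""
    return sum(phi(mu(w)) for w in product(alphabet, repeat=n))

p = 0.3  # Bernoulli(p) measure on the full 2-shift (illustrative choice)
mu = lambda w: p ** sum(w) * (1 - p) ** (len(w) - sum(w))
# For a Bernoulli measure H_n(mu) = n * (phi(p) + phi(1-p)) exactly,
# so (1/n) H_n(mu) is constant in n and already equals h(mu).
```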

2.3. Variational principle

Using Exercise 5 we see that {H_n(\mu) \leq c_n = \log \#\mathcal{L}_n}, with equality if and only if {\mu(w) = 1/\#\mathcal{L}_n} for all {w\in \mathcal{L}_n}. This immediately proves that

\displaystyle  h(\mu) \leq h_{\mathrm{top}}(X) \text{ for all } \mu \in \mathcal{M}_\sigma, \ \ \ \ \ (6)

and suggests that in order to have equality we should look for a measure with

\displaystyle  \mu(w) \approx \frac 1{\#\mathcal{L}_n} \approx e^{-n h_{\mathrm{top}}(X)} \text{ for all } w\in \mathcal{L}_n. \ \ \ \ \ (7)

The following makes this more precise.

Definition 4 {\mu\in \mathcal{M}_\sigma} is a Gibbs measure if there are {h,c,C>0} such that for all {n\in {\mathbb N}} and {w\in \mathcal{L}_n}, we have {ce^{-nh} \leq \mu(w) \leq C e^{-nh}}.

Now Theorem 2 is a consequence of the following two results, which we prove below.

Theorem 5 If {\mu} is an ergodic Gibbs measure for {X}, then {h=h_{\mathrm{top}}(X) = h(\mu)} and {\mu} is the unique MME.

Theorem 6 If {X} has specification, then it has an ergodic Gibbs measure.

In fact the construction of {\mu} below always gives equality in (6), without relying on the specification property (or obtaining uniqueness), but we will not prove this.

2.4. Convex combinations

Before embarking on the proof of Theorems 5 and 6, we establish a general property of entropy that will be important.

Lemma 7 Given {\bar{p},\bar{q} \in \Delta_N} and {s,t \in [0,1]} with {s+t = 1}, we have

\displaystyle  sH(\bar{p}) + tH(\bar{q}) \leq H(s\bar{p} + t\bar{q}) \leq sH(\bar{p}) + tH(\bar{q}) + \log 2. \ \ \ \ \ (8)

Proof: The first inequality follows immediately from concavity of {\phi}. For the second inequality we first observe that

\displaystyle  H(s\bar{p}) = \sum_{i=1}^N sp_i I(sp_i) = \sum_{i=1}^N \big(sp_i I(p_i) + sp_i I(s)\big) = sH(\bar{p}) + \phi(s) \sum_{i=1}^N p_i. \ \ \ \ \ (9)

Then we use monotonicity of {I} to get

\displaystyle  \begin{aligned} H(s\bar{p} + t\bar{q}) &= \sum s p_i I(sp_i + tq_i) + tq_i I(sp_i + tq_i) \\ &\leq \sum s p_i I(sp_i) + \sum tq_i I(tq_i) \\ & = sH(\bar{p}) + tH(\bar{q}) + \phi(s) \sum p_i + \phi(t) \sum q_i, \end{aligned}

where the last equality uses (9) twice; this proves (8). \Box

Applying Lemma 7 to the probability vectors associated to two measures {\nu,\mu \in \mathcal{M}_\sigma}, we see that

\displaystyle  s H_n(\nu) + t H_n(\mu) \leq H_n(s\nu + t\mu) \leq s H_n(\nu) + t H_n(\mu) + \log 2

for all {n}: dividing by {n} and sending {n\rightarrow\infty} gives

\displaystyle  h(s\nu + t\mu) = s h(\nu) + t h(\mu). \ \ \ \ \ (10)
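The two inequalities in (8) are easy to test numerically; here is a quick sketch with random probability vectors (all choices illustrative).

```python
from math import log
import random

def phi(p):
    return -p * log(p) if p > 0 else 0.0

def H(pbar):
    """H as in (3)."""
    return sum(phi(p) for p in pbar)

random.seed(1)
N = 8
raw_p = [random.random() for _ in range(N)]
raw_q = [random.random() for _ in range(N)]
pbar = [x / sum(raw_p) for x in raw_p]
qbar = [x / sum(raw_q) for x in raw_q]

s, t = 0.4, 0.6
mix = [s * a + t * b for a, b in zip(pbar, qbar)]
lower = s * H(pbar) + t * H(qbar)
# Lemma 7: lower <= H(mix) <= lower + log 2
```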

Remark 1 It is worth mentioning the deeper fact that a version of (10) holds for infinite convex combinations (even uncountable ones given by an integral); this is due to Konrad Jacobs, see Section 9.6 of “Foundations of Ergodic Theory” by Viana and Oliveira.

3. Gibbs implies uniqueness

To prove Theorem 5, start by observing that the lower Gibbs bound gives

\displaystyle  1 = \mu(X) = \sum_{w\in \mathcal{L}_n} \mu(w) \geq ce^{-nh} \#\mathcal{L}_n \quad\Rightarrow\quad \#\mathcal{L}_n \leq c^{-1} e^{nh}, \ \ \ \ \ (11)

and thus {h_{\mathrm{top}}(X) \leq h}. Meanwhile, the upper Gibbs bound gives

\displaystyle  H_n(\mu) = \sum_{w\in \mathcal{L}_n} \mu(w) I(\mu(w)) \geq \sum_{w\in \mathcal{L}_n} \mu(w) I(C e^{-nh}) = I(C e^{-nh}) = nh - \log C,

and thus {h(\mu) \geq h}, so we conclude that {h_{\mathrm{top}}(X) = h = h(\mu)}. It remains to show that every {\nu \in \mathcal{M}_\sigma} with {\nu \neq \mu} has {h(\nu) < h}.

First observe that by (10), we can restrict our attention to the case when {\nu\perp \mu}. Indeed, given any {\nu \in \mathcal{M}_\sigma}, the Lebesgue decomposition theorem gives {\nu = s\nu_1 + t\nu_2} for some {\nu_1 \perp \mu} and {\nu_2 \ll \mu}, and (10) gives {h(\nu) = sh(\nu_1) + th(\nu_2)}. If {t>0}, then ergodicity of {\mu} forces {\nu_2 = \mu}; since {\nu\neq\mu}, we must then have {s>0}, so if {h(\nu_1) < h} then {h(\nu) < h} as well.

Now consider {\nu \perp \mu}. Then there is a Borel set {D\subset X} with {\nu(D) = 0} and {\mu(D) = 1}; approximating {D} by unions of cylinders, this in turn gives collections of words {\mathcal{D}_n \subset \mathcal{L}_n} such that, writing {\nu(\mathcal{D}_n)} for the {\nu}-measure of the union of the corresponding cylinders (and similarly for {\mu}),

\displaystyle  \nu(\mathcal{D}_n) \rightarrow 0 \text{ and } \mu(\mathcal{D}_n) \rightarrow 1 \text{ as } n\rightarrow\infty. \ \ \ \ \ (12)

Let {\nu|_{\mathcal{D}_n}} denote the normalization of {\nu} after restricting to words in {\mathcal{D}_n}, and similarly for {\nu|_{\mathcal{D}_n^c}}. Recall from Fekete’s lemma and subadditivity of {H_n(\nu)} that {\frac 1n H_n(\nu) \geq h(\nu)} for all {n}. Then we get

\displaystyle  \begin{aligned} nh(\nu) &\leq H_n(\nu) = H_n \big(\nu(\mathcal{D}_n) \nu|_{\mathcal{D}_n} + \nu(\mathcal{D}_n^c) \nu|_{\mathcal{D}_n^c} \big) \\ &\leq \nu(\mathcal{D}_n) H_n(\nu|_{\mathcal{D}_n}) + \nu(\mathcal{D}_n^c) H_n(\nu|_{\mathcal{D}_n^c}) + \log 2 \\ &\leq \nu(\mathcal{D}_n) \log \#\mathcal{D}_n + \nu(\mathcal{D}_n^c) \log \#\mathcal{D}_n^c + \log 2 \\ &\leq \nu(\mathcal{D}_n) \log(c^{-1} e^{nh} \mu(\mathcal{D}_n)) + \nu(\mathcal{D}_n^c) \log(c^{-1} e^{nh} \mu(\mathcal{D}_n^c)) + \log 2 \\ &= nh - \log c + \log 2 + \nu(\mathcal{D}_n) \log \mu(\mathcal{D}_n) + \nu(\mathcal{D}_n^c) \log \mu(\mathcal{D}_n^c), \end{aligned}

where the second line uses Lemma 7, the third line uses Exercise 5, and the fourth line uses the lower Gibbs bound as in (11). We conclude that

\displaystyle  n(h(\nu) - h) \leq \log (2/c) + \nu(\mathcal{D}_n) \log \mu(\mathcal{D}_n) + \nu(\mathcal{D}_n^c) \log \mu(\mathcal{D}_n^c),

and as {n\rightarrow\infty} the right-hand side goes to {-\infty} by (12), which implies that {h(\nu) < h}, completing the proof.

4. Specification implies Gibbs

Now we outline the proof of Theorem 6. This comes in three steps: (1) uniform counting bounds; (2) construction of a Gibbs measure; (3) proof of ergodicity.

4.1. Uniform counting bounds

From now on we write {h = h_{\mathrm{top}}(X)} for convenience. Fekete’s lemma gives {\frac 1n \log \#\mathcal{L}_n \geq h} for all {n}, or equivalently {\#\mathcal{L}_n \geq e^{nh}}. This can also be deduced by writing

\displaystyle  \#\mathcal{L}_{kn} \leq (\#\mathcal{L}_n)^k \quad\Rightarrow\quad \frac 1{kn} \log \#\mathcal{L}_{kn} \leq \frac 1n \log\# \mathcal{L}_n

and sending {k\rightarrow\infty} so that the left-hand side goes to {h} (this is basically part of the proof of Fekete’s lemma).

To get an upper bound on {\#\mathcal{L}_n} we need the specification property, which gives a 1-1 map {\mathcal{L}_m\times \mathcal{L}_n \rightarrow \mathcal{L}_{m+n+\tau}} sending {(v,w)} to {vuw}; this is injective since {v} and {w} are recovered as the prefix and suffix of {vuw}. Thus {\#\mathcal{L}_{m+n+\tau} \geq \#\mathcal{L}_m \#\mathcal{L}_n}. Then one can either apply Fekete’s lemma to {b_n := -\log \# \mathcal{L}_{n+\tau}}, or observe that

\displaystyle  \#\mathcal{L}_{k(n+\tau)} \geq (\#\mathcal{L}_n)^k \quad\Rightarrow\quad \frac 1{k(n+\tau)} \log \#\mathcal{L}_{k(n+\tau)} \geq \frac 1{n+\tau} \log\# \mathcal{L}_n.

Sending {k\rightarrow\infty} the left-hand side goes to {h}, and so combined with the lower bound above we get the uniform counting bounds

\displaystyle  e^{nh} \leq \#\mathcal{L}_n \leq e^{\tau h} e^{nh}. \ \ \ \ \ (13)
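One can check (13) numerically for a concrete example; below I use the golden mean shift, which has specification with {\tau = 1} and {h = \log\frac{1+\sqrt 5}{2}} (an illustrative choice, not computed in the text).

```python
from math import log, exp, sqrt

def count_words(T, n):
    """#L_n: sum of the entries of T^(n-1), with T^0 = identity."""
    d = len(T)
    P = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    for _ in range(n - 1):
        P = [[sum(P[i][k] * T[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
    return sum(map(sum, P))

T = [[1, 1], [1, 0]]        # golden mean shift: specification with tau = 1
h = log((1 + sqrt(5)) / 2)  # its topological entropy (known value)
tau = 1
# the uniform counting bounds (13): e^{nh} <= #L_n <= e^{tau h} e^{nh}
bounds_hold = all(
    exp(n * h) <= count_words(T, n) <= exp(tau * h) * exp(n * h)
    for n in range(1, 25))
```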

4.2. A Gibbs measure

There is a standard procedure for constructing an MME: let {\nu_n \in \mathcal{M}} be any sequence of (not necessarily invariant) Borel probability measures such that {\nu_n(w) = 1/\#\mathcal{L}_n} for all {w\in \mathcal{L}_n}, and then put

\displaystyle  \mu_n = \frac 1n \sum_{k=0}^{n-1} \sigma_*^k \nu_n = \frac 1n \sum_{k=0}^{n-1} \nu_n \circ \sigma^{-k}.

Since {\mathcal{M}} is weak* compact, there is a weak* convergent subsequence {\mu_{n_j}}.

Exercise 6 Show that {\mu := \lim_{j\rightarrow\infty} \mu_{n_j}} is {\sigma}-invariant ({\mu\circ \sigma^{-1} = \mu}).

The preceding exercise is basically the proof of the Krylov-Bogolyubov theorem, and does not require any properties of the measures {\nu_n} beyond the fact that they are Borel probability measures.

One can prove that {\mu} is an MME whether or not {X} has specification, but we will use specification to directly prove the stronger Gibbs property.

Given any {w\in \mathcal{L}_m} and any {\tau \leq k \leq n-m-\tau}, we bound {\nu_n(\sigma^{-k}[w])} by estimating how many words in {\mathcal{L}_n} are of the form {uwv} for some {u \in \mathcal{L}_k} and {v\in \mathcal{L}_{n-m-k}}. Arguments similar to those in the uniform counting bounds show that

\displaystyle  \#\mathcal{L}_{k-\tau} \#\mathcal{L}_{n-k-m-\tau} \leq (\text{\# of such words}) \leq \#\mathcal{L}_k \#\mathcal{L}_{n-k-m},

where the first inequality requires specification. Dividing by {\#\mathcal{L}_n} and using the uniform counting bounds (13) gives

\displaystyle  \nu_n(\sigma^{-k}[w]) \leq \frac{\#\mathcal{L}_k \#\mathcal{L}_{n-k-m}}{\#\mathcal{L}_n} \leq \frac{e^{\tau h} e^{k h} e^{\tau h} e^{(n-k-m)h}}{e^{nh}} = e^{2\tau h} e^{-mh};

using a similar estimate from below we get

\displaystyle  e^{-3\tau h} e^{-mh} \leq \nu_n(\sigma^{-k}[w]) \leq e^{2\tau h} e^{-mh}. \ \ \ \ \ (14)

Averaging over all {k} and sending {n\rightarrow\infty} along the subsequence {n_j} gives

\displaystyle  e^{-3\tau h} e^{-mh} \leq \mu(w) \leq e^{2\tau h} e^{-mh}, \ \ \ \ \ (15)

so {\mu} is a Gibbs measure.
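For a mixing TMS this Gibbs measure is the classical Parry measure, which can be written down explicitly from the Perron eigendata of {T}; that formula is standard but not derived in this post, so treat the sketch below as an independent illustration. It computes {\mu(w)} for all short words of the golden mean shift and records the empirical Gibbs constants.

```python
from itertools import product
from math import log, exp, sqrt

T = [[1, 1], [1, 0]]  # golden mean shift (illustrative)
d = len(T)

# Perron eigenvalue and right eigenvector by power iteration (T primitive)
r = [1.0] * d
for _ in range(200):
    r = [sum(T[i][j] * r[j] for j in range(d)) for i in range(d)]
    lam = max(r)
    r = [x / lam for x in r]
h = log(lam)

l = r[:]  # this T is symmetric, so left and right eigenvectors agree
norm = sum(l[i] * r[i] for i in range(d))
pi = [l[i] * r[i] / norm for i in range(d)]          # stationary vector
P = [[T[i][j] * r[j] / (lam * r[i]) for j in range(d)] for i in range(d)]

def mu(w):
    """Parry measure of the cylinder [w]."""
    m = pi[w[0]]
    for a, b in zip(w, w[1:]):
        m *= P[a][b]
    return m

def admissible(w):
    return all(T[a][b] == 1 for a, b in zip(w, w[1:]))

# Gibbs check: the ratios mu(w) e^{nh} stay in a fixed interval [c, C]
ratios = [mu(w) * exp(len(w) * h)
          for n in range(1, 12)
          for w in product(range(d), repeat=n) if admissible(w)]
c, C = min(ratios), max(ratios)
```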

4.3. Ergodicity

To prove that {\mu} is ergodic, start by fixing {v,w\in \mathcal{L}}, with lengths {|v|} and {|w|}, respectively. Given {j \geq |v| + \tau} and {\tau \leq k \leq n - j - |w| - \tau}, follow the same procedure as above to estimate the number of words in {\mathcal{L}_n} with the form {xvywz}, where {x\in \mathcal{L}_k}, {y\in \mathcal{L}_{j - |v|}}, and {z\in \mathcal{L}_{n - k - j - |w|}}, and obtain the bounds

\displaystyle  \frac{\#\mathcal{L}_{k-\tau} \#\mathcal{L}_{j-|v|-\tau} \#\mathcal{L}_{n - k - j - |w| - \tau}}{\#\mathcal{L}_n} \leq \sigma_*^k \nu_n([v] \cap \sigma^{-j} [w]) \leq \frac{\#\mathcal{L}_k \#\mathcal{L}_{j-|v|} \#\mathcal{L}_{n - k - j - |w|}}{ \#\mathcal{L}_n}.

Averaging over {k}, sending {n\rightarrow\infty}, and using the uniform counting bounds (13) gives

\displaystyle  e^{-4\tau h} e^{-|v|h} e^{-|w|h} \leq \mu([v] \cap \sigma^{-j}[w]) \leq e^{3\tau h} e^{-|v|h} e^{-|w|h}.

Using the Gibbs bounds (15) gives

\displaystyle  e^{-8\tau h} \mu(v) \mu(w) \leq \mu([v] \cap \sigma^{-j}[w]) \leq e^{9\tau h} \mu(v) \mu(w). \ \ \ \ \ (16)

Exercise 7 Given Borel sets {V,W \subset X}, approximate {V} and {W} with cylinders and use (16) to get

\displaystyle  e^{-8\tau h} \mu(V) \mu(W) \leq \varliminf_{j\rightarrow\infty} \mu(V \cap \sigma^{-j}W) \leq \varlimsup_{j\rightarrow\infty} \mu(V \cap \sigma^{-j}W) \leq e^{9\tau h} \mu(V) \mu(W).

Now if {E \subset X} is invariant, then taking {V=E} and {W = E^c} in Exercise 7, the lower bound gives {e^{-8\tau h} \mu(E)(1-\mu(E)) =0}, so {\mu(E)} is either {0} or {1}. This proves ergodicity and completes the proof of Theorem 6.

Remark 2 In fact, the upper bound in Exercise 7 can be used to show that {\mu} is mixing; see Proposition 20.3.6 in Katok and Hasselblatt.


Cohomologous functions and the Livsic theorem

Given a dynamical system {f\colon X\rightarrow X} (typically {X} is a compact metric space and {f} is continuous), we commonly encounter real-valued functions {\varphi\colon X\rightarrow {\mathbb R}} in one of the following two roles.

  1. An observable function represents a measurement of the system, so that the sequence of functions {\{\varphi\circ f^n\}} represents measurements made at different times, about which we want to make predictions. In the cases we are interested in, these predictions will be probabilistic and will be in terms of the Birkhoff averages {\frac 1n S_n\varphi(x)}, where {S_n\varphi(x) = \sum_{k=0}^{n-1} \varphi(f^k x)} is the {n}th Birkhoff sum.
  2. A potential function is used to assign weights to different trajectories of the system for purposes of selecting an invariant measure with specific dynamical or geometric properties. Generally the orbit segment {x, f(x), \dots, f^{n-1}(x)} is assigned a weight given by {e^{S_n\varphi(x)}}. A potential function also assigns weights to invariant measures by integration; in particular, we can study equilibrium measures for {(X,f,\varphi)}, which are invariant probability measures maximizing {h_\mu(f) + \int\varphi\,d\mu}. (Here {h_\mu} is Kolmogorov–Sinai entropy.)

Before moving on we remark that {\varphi\colon X\rightarrow[0,\infty)} can also play the role of a density function, especially if we are looking for an invariant measure that is absolutely continuous with respect to some reference measure, but for today we will focus on the two roles described above.

Let {\mathcal{M}_f(X)} denote the set of {f}-invariant Borel probability measures on {X}. Then each {\varphi \in C(X)} induces a map {\hat\varphi\colon \mathcal{M}_f(X) \rightarrow {\mathbb R}} by {\hat\varphi(\mu) = \int\varphi\,d\mu} as in the second item above. It is natural to ask when two functions {\varphi,\psi \in C(X)} have {\hat\varphi=\hat\psi}. One immediate observation to make is that the definition of {f}-invariance implies that {\int\varphi \,d\mu = \int\varphi\circ f \,d\mu} for all {\varphi\in C(X)} and {\mu\in \mathcal{M}_f(X)}, so that in particular we have {\hat\varphi = \widehat{\varphi\circ f}}. Thus the function {\psi = \varphi - \varphi\circ f} has {\hat\psi = \hat \varphi - \widehat{\varphi\circ f} = 0}. We call such a function {\psi} a coboundary; we have just shown that

Proposition 1 if {\psi} is a coboundary, then it integrates to {0} with respect to any invariant measure.

Moreover, since integration is linear in {\varphi}, we have

Proposition 2 if two functions {\varphi} and {\psi} differ by a coboundary, then {\hat\varphi=\hat\psi}, i.e., {\int\varphi\,d\mu = \int\psi\,d\mu} for every {\mu\in \mathcal{M}_f(X)}.

If {\varphi} and {\psi} differ by a coboundary, we say that they are cohomologous; in this case we can write {\varphi = \psi + h - h\circ f} for some {h\in C(X)}, which we call the transfer function. Note that {\varphi} is a coboundary if and only if it is cohomologous to the zero function.

Remark 1 For a discussion of how this is connected to the notion of cohomology in algebraic topology, see Terry Tao’s blog post from December 2008.

It is natural to ask whether cohomology is the only mechanism by which two functions {\varphi,\psi\in C(X)} can have {\hat\varphi=\hat\psi}; in other words, do the converses of Propositions 1 and 2 hold?

In general the answer is no, as one can quickly see by letting {f} be an irrational circle rotation, so that the only invariant measure is Lebesgue, and there are many continuous functions on the circle that have Lebesgue integral {0} but are not coboundaries. When {f} has hyperbolic behavior, however, the story is quite different, and this is what we will discuss in this post.

If {f} is a diffeomorphism and {X} is a topologically transitive locally maximal hyperbolic set, then {f\colon X\rightarrow X} satisfies the following result.

Theorem 3 (Closing Lemma) For every {\epsilon>0} there exists {\delta>0} such that if {x\in X} and {n\geq 0} are such that {d(f^n(x),x) < \delta}, then there exists {y\in X} such that {f^n(y) = y} and {d(f^k(y), f^k(x)) < \epsilon} for all {0\leq k < n}.

Moreover, in this case every Hölder continuous {\varphi\colon X\rightarrow {\mathbb R}} has the following property, introduced by Walters in 1978.

Theorem 4 (Walters Property) For every {\zeta>0} there exists {\epsilon>0} such that if {x,y\in X} and {n\geq 0} are such that {d(f^k(x),f^k(y)) < \epsilon} for all {0\leq k < n}, then {|S_n\varphi(x) - S_n\varphi(y)| < \zeta}.

Both the Closing Lemma and the Walters Property also hold when {X} is a subshift of finite type and {\varphi} is Hölder continuous.

Using these properties we can formulate and prove an important result.

Theorem 5 (Livsic Theorem) Let {X} be a compact metric space, {f\colon X\rightarrow X} a continuous map satisfying the Closing Lemma and possessing a point whose orbit is dense, and {\varphi\colon X\rightarrow{\mathbb R}} a continuous function satisfying the Walters Property. Then {\varphi} is a coboundary if and only if for every periodic point {x = f^p(x) \in X}, we have {S_p \varphi(x) = 0}.

The forward implication is immediate: if {\varphi} is a coboundary, then {\int\varphi\,d\mu = 0} for every invariant {\mu}, in particular for {\mu = \frac 1p \sum_{k=0}^{p-1} \delta_{f^k x}}. Equivalently one can observe that Birkhoff sums of coboundaries have the following behavior: if {\varphi(x) = h(x) - h(f x)}, then

\displaystyle  S_n\varphi(x) = \sum_{k=0}^{n-1} \big(h(f^k x) - h(f^{k+1} x)\big) = h(x) - h(f^n x), \ \ \ \ \ (1)

and consequently {S_p \varphi(x) = h(x) - h(f^p x) = 0} whenever {x=f^p x}. We must prove the reverse implication, that if {S_p\varphi(x)=0} whenever {x=f^p x}, then {\varphi} is a coboundary. To this end note that if {h} is a continuous (hence uniformly continuous) transfer function for {\varphi}, then (1) immediately determines {h} along the entire forward orbit of a point {y} by

\displaystyle  h(f^n y) = h(y) + S_n\varphi(y). \ \ \ \ \ (2)

(A similar procedure defines {h} along the backward orbit if {f} is invertible.) If {y} is a point whose orbit is dense in {X}, then this determines {h} on all of {X} by continuity. Thus it suffices to prove that (2) defines a uniformly continuous function on the orbit of {y}. To this end, given {\zeta>0}, let {\epsilon = \epsilon(\zeta)>0} be given by the Walters Property, and then let {\delta = \delta(\epsilon)>0} be given by the Closing Lemma. Suppose that {w,z} are two points on the orbit of {y} such that {d(w,z) < \delta}. Since {w,z} lie on the same orbit, there is {n\in {\mathbb N}} such that {f^n(z)=w} or {f^n(w)=z}. Without loss of generality we assume the first case, so that {d(z,f^n z) = d(z,w) < \delta}. By the Closing Lemma there is a periodic point {x=f^n x} such that {d(f^k x, f^k z) < \epsilon} for all {0\leq k < n}. By (2) and the Walters Property, this implies that

\displaystyle  |h(w) - h(z)| = |h(f^n z) - h(z)| = |S_n\varphi(z)| \leq |S_n\varphi(x)| + \zeta = \zeta,

where the last equality uses the fact that Birkhoff sums of {\varphi} vanish around periodic orbits. This proves uniform continuity of {h} on the orbit of {y}, so {h} extends uniformly continuously to {X}, and it is an easy exercise using continuity of both sides of the equation {\varphi = h - h\circ f} to verify that this relationship continues to hold on all of {X}, so that {\varphi} is indeed a coboundary. This completes the proof of the Livsic Theorem.

Let {X,f} be as in the Livsic Theorem; then for any {\varphi,\psi} satisfying the Walters Property, the following are equivalent:

  1. {\varphi = \psi + h - h\circ f} for some {h\in C(X)};
  2. {\int\varphi\,d\mu = \int\psi\,d\mu} for all {\mu\in\mathcal{M}_f(X)};
  3. {\int\varphi\,d\mu = \int\psi\,d\mu} for all periodic orbit measures.

Here is one final remark on a specific example of cohomologous potentials: one immediately sees that {\int \frac 1nS_n\varphi \,d\mu = \int \varphi\,d\mu} for every {\mu\in\mathcal{M}_f(X)}, and so under the conditions of the Livsic Theorem one can conclude that each Birkhoff average {\frac 1nS_n\varphi} is cohomologous to {\varphi}. In fact this can be shown directly, without using the Livsic Theorem, by putting

\displaystyle  \begin{aligned} h(x) &= \frac 1n\big((n-1) \varphi(x) + (n-2)\varphi(f x) + (n-3)\varphi(f^2 x) + \cdots + \varphi(f^{n-2} x)\big) \\ &= \frac 1n\sum_{k=0}^{n-1} (n-1-k) \varphi(f^{k} x). \end{aligned}

Then we have

\displaystyle  \begin{aligned} h(x) - h(fx) &= \frac 1n\big((n-1) \varphi(x) + (n-2)\varphi(f x) + \cdots + \varphi(f^{n-2} x)\big) \\ &\quad\quad - \frac 1n\big((n-1) \varphi(f x) + (n-2)\varphi(f^2 x) + \cdots + \varphi(f^{n-1} x)\big) \\ &= \frac{n-1}n \varphi(x) - \frac 1n\big( \varphi(fx) + \varphi(f^2x) + \cdots + \varphi(f^{n-1} x) \big) \\ &= \varphi(x) - \frac 1n S_n\varphi(x), \end{aligned}

which demonstrates that {\varphi} and {\frac 1nS_n\varphi} are cohomologous.
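The telescoping algebra above is easy to machine-check on a toy system; in the sketch below, {f} is the shift acting on periodic {0}-{1} sequences stored as tuples, and {\varphi} is an arbitrary observable of the first two symbols (all choices illustrative).

```python
def f(x):
    """The shift acting on a periodic sequence stored as a tuple."""
    return x[1:] + x[:1]

def phi(x):
    return 1.5 * x[0] - 0.7 * x[1] + 0.2   # arbitrary observable

def S(n, x):
    """Birkhoff sum S_n phi(x)."""
    total, y = 0.0, x
    for _ in range(n):
        total, y = total + phi(y), f(y)
    return total

def h(n, x):
    """The transfer function (1/n) sum_k (n-1-k) phi(f^k x)."""
    total, y = 0.0, x
    for k in range(n):
        total, y = total + (n - 1 - k) * phi(y), f(y)
    return total / n

x, n = (1, 0, 0, 1, 1, 0, 1), 5
lhs = h(n, x) - h(n, f(x))              # h - h o f
rhs = phi(x) - S(n, x) / n              # phi - (1/n) S_n phi
```

The identity {h - h\circ f = \varphi - \frac 1n S_n\varphi} is purely algebraic, so it holds here for any choice of {x}, {n}, and {\varphi}.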


Gibbs measures have local product structure

Let {M} be a compact smooth manifold and {f\colon M\rightarrow M} a transitive Anosov {C^{1+\alpha}} diffeomorphism. If {\mu} is an {f}-invariant Borel probability measure on {M} that is absolutely continuous with respect to volume, then the Hopf argument can be used to show that {\mu} is ergodic. In fact, recently it has been shown that even stronger ergodic properties such as multiple mixing can be deduced; for example, see Coudène, Hasselblatt, and Troubetzkoy [Stoch. Dyn. 16 (2016), no. 2].

A more general class of measures with strong ergodic properties is given by the theory of thermodynamic formalism developed in the 1970s: given any Hölder continuous potential {\varphi\colon M\rightarrow{\mathbb R}}, the quantity {h_\mu(f) + \int\varphi\,d\mu} is maximized by a unique invariant Borel probability measure {\mu = \mu_\varphi}, which is called the equilibrium state of {\varphi}; the maximum value of the given quantity is the topological pressure {P = P(\varphi)}. The unique equilibrium state has the Gibbs property: for every {\varepsilon>0} there is a constant {K>0} such that

\displaystyle  K^{-1} \leq \frac{\mu(B_n(x,\varepsilon))}{e^{S_n\varphi(x) - nP}} \leq K, \ \ \ \ \ (1)

where {B_n(x,\varepsilon) = \{y\in M : d(f^kx,f^ky) < \varepsilon\ \forall 0\leq k < n\}} is the Bowen ball around {x} of order {n} and radius {\varepsilon}, and we write {S_n\varphi(x) = \sum_{k=0}^{n-1} \varphi(f^kx)} for the {n}th ergodic sum along the orbit of {x}.

Historically, strong ergodic properties (mixing, K, Bernoulli) for equilibrium states have been established using methods such as Markov partitions rather than via the Hopf argument. However, in the more general non-uniformly hyperbolic setting, it can be difficult to extend these symbolic arguments, and so it is interesting to ask whether the Hopf argument can be applied instead, even if it only recovers some of the strong ergodic properties. The key property of absolutely continuous measures that is needed for the Hopf argument is the fact that they have a local product structure, which we define below. It was shown by Haydn [Random Comput. Dynam. 2 (1994), no. 1, 79–96] and by Leplaideur [Trans. Amer. Math. Soc. 352 (2000), no. 4, 1889–1912] that in the uniformly hyperbolic setting, the equilibrium states {\mu_\varphi} have local product structure when {\varphi} is Hölder continuous; thus one could apply the Hopf argument to them.

This post contains a direct proof that any measure {\mu} with the Gibbs property (1) has local product structure; see Theorem 3 below. (This will be a bit of a longer post, since we need to recall several different concepts and then do some non-trivial technical work.) Since Bowen’s proof of uniqueness of equilibrium states using specification [Math. Systems Theory 8 (1974/1975), no. 3, 193–202] establishes the Gibbs property, this means that the equilibrium states produced this way could be addressed with the Hopf argument (I haven’t carried out the details yet, so I claim no formal results here). I should point out, though, that even without the use of Markov partitions, Ledrappier showed that these measures have the K property, which in particular implies multiple mixing. Since multiple mixing is the strongest thing we might hope to get from the Hopf argument, my primary motivation for the present approach is that Dan Thompson and I recently generalized Bowen’s result to systems satisfying a certain non-uniform specification property [Adv. Math. 303 (2016), 745–799], and the unique equilibrium states we obtain satisfy a non-uniform version of the Gibbs property (1), so it is reasonable to hope that they also have local product structure and can be studied using the Hopf argument; but this is beyond the scope of this post and will be addressed in a later paper.

1. Local product structure

Before defining local product structure for {\mu}, we recall some definitions. Since {f} is Anosov, every {x\in M} has local stable and unstable manifolds {W^{s,u}_x}, which have the following properties.

  1. There are {C \geq 1} and {\lambda < 1 } such that for all {x\in M}, {y\in W^s_x}, and {n\geq 0}, we have {d(f^n x, f^n y) \leq C\lambda^n d(x,y)}; a similar contraction bound holds going backwards in time when {y\in W^u_x}.
  2. There is {\delta>0} such that if {d(f^n x, f^ny) \leq \delta} for all {n\geq 0}, then {y\in W^s_x}; similarly for {W^u_x} with {n\leq 0}.
  3. There is {\varepsilon>0} such that if {d(x,y) \leq \varepsilon}, then {W_x^s \cap W_y^u} is a single point, which we denote {[x,y]}. Moreover, there is a constant {Q} such that {d([x,y],x) \leq Q d(x,y)}, and similarly for {d([x,y],y)}.

A set {R\subset M} is a rectangle if it has diameter {\leq\varepsilon} and is closed under the bracket operation: in other words, for every {x,y\in R}, the intersection point {[x,y] = W_x^s \cap W_y^u} exists and is contained in {R}.

Lemma 1 For every {x\in M}, there is a rectangle {R} containing {x}.

Proof: Write {V_x^s = \{y\in W_x^s : d(x,y) \leq \frac{\varepsilon}{2(Q+1)}\}} and similarly for {V_x^u}. Consider the set {R = \{[y,z] : y\in V_x^u, z\in V_x^s\}} and observe that for every {[y,z]\in R} we have

\displaystyle  d([y,z],x) \leq d([y,z],y) + d(y,x) \leq (Q+1)d(y,x) \leq \tfrac\varepsilon2.

Thus {R} has diameter {\leq \varepsilon}, and for every {[y,z], [y',z']\in R} we have

\displaystyle  [[y,z],[y',z']] = W_{[y,z]}^s \cap W_{[y',z']}^u = W^s_y \cap W^u_{z'} = [y,z'] \in R,

so {R} is indeed a rectangle. \Box
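To make the bracket {[x,y]} concrete, here is a minimal numerical sketch in the simplest hyperbolic setting: for a linear map of the plane, the local stable and unstable manifolds are lines in the eigendirections, so the bracket is just a {2\times 2} linear solve. The matrix and the points below are illustrative choices, not anything from the text.

```python
import numpy as np

# Toy check of the bracket operation [x,y] = W^s_x ∩ W^u_y for a linear
# hyperbolic map of the plane.  Stable/unstable manifolds are lines through
# x and y in the eigendirections, so the bracket is a 2x2 linear solve.
A = np.array([[2.0, 1.0], [1.0, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)
v_s = eigvecs[:, np.argmin(np.abs(eigvals))]   # stable direction, |eigenvalue| < 1
v_u = eigvecs[:, np.argmax(np.abs(eigvals))]   # unstable direction, |eigenvalue| > 1

def cross2(a, b):
    """Scalar cross product in the plane; zero iff a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

def bracket(x, y):
    """Solve x + t*v_s = y + s*v_u for (t, s); return the intersection point."""
    t, s = np.linalg.solve(np.column_stack([v_s, -v_u]), y - x)
    return x + t * v_s

x, y = np.array([0.10, 0.20]), np.array([0.15, 0.18])
z = bracket(x, y)
assert abs(cross2(z - x, v_s)) < 1e-12   # z lies on the stable line through x
assert abs(cross2(z - y, v_u)) < 1e-12   # z lies on the unstable line through y
```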

Given {x\in M} and a rectangle {R\ni x}, let {V^{s,u}_x = W^{s,u}_x \cap R}. Then {R} can be recovered from {V^{s,u}_x} via the method in the proof above, as the image of the map {[\cdot,\cdot] \colon V_x^u \times V_x^s \rightarrow M}. Given measures {\nu^{s,u}} on {V^{s,u}_x}, let {\nu^u\otimes\nu^s} be the pushforward of {\nu^u\times \nu^s} under this map; that is, for every pair of Borel sets {A\subset V_x^u} and {B\subset V_x^s}, we put {(\nu^u\otimes \nu^s)(\{[y,z] : y\in A, z\in B\}) = \nu^u(A)\nu^s(B)}.

Definition 2 A measure {\mu} has local product structure with respect to {W^{s,u}} if for every {x\in M} and rectangle {R\ni x} there are measures {\nu^{s,u}} on {V^{s,u}_x} such that {\mu|_R \ll \nu^u \otimes \nu^s}.

Theorem 3 Let {f\colon M\rightarrow M} be a {C^1} Anosov diffeomorphism, {\varphi\colon M\rightarrow {\mathbb R}} a Hölder continuous function, and {\mu} an {f}-invariant Borel probability measure on {M} satisfying the Gibbs property (1) for some {P\in {\mathbb R}}. Then {\mu} has local product structure in the sense of Definition 2. Moreover, there is {\bar K \geq 1} such that for all {R}, the measures {\nu^{s,u}} can be chosen so that the Radon–Nikodym derivative {\psi = \frac{d\mu}{d(\nu^u\otimes \nu^s)}} satisfies {\bar K^{-1} \leq \psi \leq \bar K} at {\mu}-a.e. point.

A quick side remark: here the diffeomorphism is only required to be {C^1}. The reason for the {C^{1+\alpha}} hypothesis at the beginning of this post was so that the geometric potential {\varphi(x) = -\log |\det Df|_{E^u_x}|} is Hölder continuous and its unique equilibrium state (the absolutely continuous invariant measure if it exists, or more generally the SRB measure) has the Gibbs property; this may not be true if {f} is only {C^1}.

2. Conditional measures

In order to prove Theorem 3, we must start by recalling the notion of conditional measures; see Coudène’s book (especially Chapters 14 and 15) or Viana’s notes for more details than what is provided here.

Let {(X,\mathop{\mathcal A},\mu)} be a Lebesgue space. A partition of {X} is a map {\xi \colon X\rightarrow \mathop{\mathcal A}} such that for {\mu}-a.e. {x,y\in X}, the sets {\xi(x)} and {\xi(y)} either coincide or are disjoint. Write {\Xi = \{\xi(x) : x\in X\}} for the set of partition elements, and say that the partition {\xi} is finite if {\Xi} is finite.

Given a finite partition {\xi}, it is easy to define conditional measures {\mu_{\xi(x)}} on the set {\xi(x)} for {\mu}-a.e. {x} by writing

\displaystyle  \mu_{\xi(x)}(A) = \frac{\mu(A \cap \xi(x))}{\mu(\xi(x))} \ \ \ \ \ (2)

when {\mu(\xi(x))>0}, and ignoring those partition elements with zero measure. One can recover the measure {\mu} from its conditional measures by the formula

\displaystyle  \mu(A) = \sum_{C\in \Xi} \mu_C(A) \mu(C). \ \ \ \ \ (3)

If we write {\hat\mu} for the measure on {\Xi} defined by putting {\hat\mu(\{C\}) = \mu(C)} for all {C\in \Xi}, then (3) can be written as

\displaystyle  \mu(A) = \int_\Xi \mu_C(A) \,d\hat\mu(C). \ \ \ \ \ (4)

Even when the partition {\xi} is infinite, one may still hope to obtain a formula along the lines of (4).

Example 1 Let {X} be the unit square, {\mu} be two-dimensional Lebesgue measure, and {\xi(x)} be the horizontal line through {x}. Then Fubini’s theorem gives (4) by taking {\mu_C} to be Lebesgue measure on horizontal lines, and defining {\hat\mu} on {\Xi}, the set of horizontal lines in {[0,1]^2}, in one of the two following (equivalent) ways:

  1. given {E \subset \Xi}, let {\hat\mu(E) = \mu(\bigcup_{C\in E} C)};
  2. identify {\Xi} with the interval {\{0\}\times [0,1]} on the {y}-axis, and define {\hat\mu} as the image of one-dimensional Lebesgue measure on this interval.

Note that {\hat\mu} must satisfy the first of these no matter what the partition {\xi} is, while the second is a convenient description of {\hat\mu} in this particular example.
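The reconstruction formula (3) is easy to check numerically for a finite partition; here is a minimal sketch with a discrete toy measure on {[0,1)} partitioned into quarters (the measure and the test set are illustrative choices).

```python
import numpy as np

# Numerical check of the reconstruction formula (3): mu(A) = sum_C mu_C(A) mu(C).
# mu is a discrete toy measure on [0,1); the partition elements are the
# quarters [j/4, (j+1)/4).
rng = np.random.default_rng(0)
points = rng.random(1000)                 # atoms of mu, each of weight 1/1000
weights = np.full(points.size, 1.0 / points.size)
elem = (4 * points).astype(int)           # index of the quarter containing each atom
A = (points > 0.3) & (points < 0.9)       # a test set

total = 0.0
for j in range(4):
    mu_C = weights[elem == j].sum()                       # mu(C)
    if mu_C > 0:
        mu_C_A = weights[(elem == j) & A].sum() / mu_C    # mu_C(A), as in (2)
        total += mu_C_A * mu_C
assert abs(total - weights[A].sum()) < 1e-12              # formula (3)
```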

A similar-looking example (and the one which is most relevant for our purposes) comes by letting {R\subset M} be a rectangle and letting {\xi(y) = V_y^u = W_y^u\cap R} be the partition into local unstable leaves. To produce conditional measures {\mu_y^u = \mu_{V_y^u}}, we need to use the fact that the partition {\xi} is measurable. This means that there is a sequence of finite partitions {\xi_n} that refines to {\xi} in the sense that for {\mu}-a.e. {y}, we have

\displaystyle  \xi_{n+1}(y) \subset \xi_n(y) \text{ for all } n, \text{ and } \xi(y) = \textstyle\bigcap_{n=1}^\infty \xi_n(y).

Lemma 4 Given any rectangle {R}, the partition of {R} into local unstable leaves is measurable.

Proof: Fix {x\in R} and let {\eta_n} be a refining sequence of finite partitions of {V_x^s} with the property that {\bigcap_n \eta_n(y) = \{y\}} for all {y\in V_x^s}; then let {\xi_n(y) = \bigcup_{z\in \eta_n(y)} V_z^u}, and we are done. \Box

Whenever {\xi} is a measurable partition of a compact metric space {X}, we can define the conditional measures {\mu_{\xi(y)}} as the limits of the conditional measures {\mu_{\xi_n(y)}}. Indeed, one can show (we omit the proofs) that for {\mu}-a.e. {y}, the limit {I_y(\varphi) := \lim_{n\rightarrow\infty} \int \varphi \,d\mu_{\xi_n(y)}} exists for every continuous {\varphi\colon X\rightarrow{\mathbb R}} and defines a continuous linear functional {I_y\in C(X)^*}; the corresponding measures {\mu_{\xi(y)}} satisfy

\displaystyle  \int\varphi\,d\mu = \int_\Xi \int_C \varphi\,d\mu_C \,d\hat\mu(C) \ \ \ \ \ (5)

for every {\varphi\in C(X)}, where once again we put {\hat\mu(E) = \mu(\bigcup_{C\in E} C)}.

The key result that we will need to describe properties of {\mu_y^u} when {\xi} is the partition of {R} into local unstable leaves is that for {\mu}-a.e. {y\in R} and every {\varphi\in C(X)}, we have

\displaystyle  \int \varphi\,d\mu_y^u = \lim_{n\rightarrow\infty} \frac 1{\mu(\xi_n(y))}\int_{\xi_n(y)} \varphi\,d\mu, \ \ \ \ \ (6)

where {\xi_n} is any sequence of finite partitions that refines to {\xi}.
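To see the limit formula (6) in action, here is a closed-form sketch in the setting of Example 1: with {\xi_n} the partition of the square into horizontal strips of height {2^{-n}}, the normalized integrals over {\xi_n(y)} converge to the integral over the horizontal line through {y}. The test function {\varphi(x_1,x_2) = x_1 x_2} is an illustrative choice.

```python
# For mu = Lebesgue on the unit square and xi_n the partition into horizontal
# strips of height 2^-n (which refines to the partition into horizontal lines),
# the right-hand side of (6) can be computed in closed form for
# phi(x1, x2) = x1*x2, and converges to the conditional integral 0.5 * y2.
y2 = 0.3                                  # the horizontal line {x2 = y2}

def strip_average(n):
    h = 2.0 ** -n
    lo = int(y2 / h) * h                  # the strip [lo, lo+h) containing y2
    # (1/h) * \int_0^1 \int_lo^{lo+h} x1*x2 dx2 dx1 = 0.5 * (lo + h/2)
    return 0.5 * (lo + h / 2)

vals = [strip_average(n) for n in range(1, 30)]
assert abs(vals[-1] - 0.5 * y2) < 1e-6    # limit is \int_0^1 x1*y2 dx1 = 0.15
```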

In order to establish the local product structure for {\mu} that is claimed by Theorem 3, we will show that the measures {\mu_y^u} vary in an absolutely continuous manner as {y} varies within {R}. That is, we consider for every {x,y\in R} the holonomy map {\pi_{x,y} \colon V_x^u \rightarrow V_{y}^u} defined by moving along local stable manifolds, so

\displaystyle  \pi_{x,y}(z) = [z,y] = V_z^s \cap V_{y}^u \text{ for all } z\in V_x^u.

Our goal is to use the Gibbs property (1) for {\mu} to prove that for every rectangle {R} and {\mu}-a.e. {x,y\in R}, the conditional measures {\mu_x^u} and {\mu_{y}^u} satisfy

\displaystyle  (\pi_{x,y})_* \mu_x^u \ll \mu_{y}^u \text{ with } \bar{K}^{-1} \leq \frac{d(\pi_{x,y})_* \mu_x^u}{d\mu_{y}^u} \leq \bar{K}. \ \ \ \ \ (7)

Once this is established, we can proceed as follows. Consider a rectangle {R} with the decomposition {\xi} into local unstable manifolds, let {x\in R} be such that (7) holds for {\mu}-a.e. {y\in R}, and then identify {\Xi} with {V_x^s}, as in the second characterization of {\hat\mu} in Example 1. Let {\nu^s} be the measure on {V_x^s} corresponding to {\hat\mu} under this identification, and let {\nu^u = \mu_x^u}. Given {y\in V_x^s}, let {\psi_y = \frac{d(\pi_{x,y})_* \mu_x^u}{d\mu_{y}^u} \colon V_y^u \rightarrow [\bar{K}^{-1},\bar{K}]}, so that in particular we have

\displaystyle  \int_{V_y^u} \varphi\,d\mu_y^u = \int_{V_y^u} \frac{\varphi}{\psi_y}\, d(\pi_{x,y})_*\mu_x^u = \int_{V_x^u} \frac{\varphi([z,y])}{\psi_y([z,y])} \,d\mu_x^u

for every continuous {\varphi\colon R\rightarrow {\mathbb R}}. Then by (5), we have

\displaystyle  \begin{aligned} \int\varphi\,d\mu &= \int_{\Xi} \int_C \varphi \,d\mu_C \,d\hat\mu(C) = \int_{V_x^s} \int_{V_y^u} \varphi \,d\mu_y^u \,d\nu^s(y) \\ &= \int_{V_x^s} \int_{V_x^u} \frac{\varphi([z,y])}{\psi_y([z,y])} \,d\nu^u(z) \,d\nu^s(y). \end{aligned}

By Definition 2, this shows that {\mu} has local product structure with respect to {W^{s,u}}. Thus in order to prove Theorem 3, it suffices to show that Gibbs measures satisfy the absolute continuity property (7).

It is worth noting quickly that our use of the term “absolute continuity” here has a rather different meaning from another common concept, which is that of a measure with “absolutely continuous conditional measures on unstable manifolds”. This latter notion is essential for the definition of SRB measures (indeed, in the Anosov setting it is the definition), and involves comparing {\mu_x^u} to volume measure on {V_x^u}, instead of to the pushforwards of other conditional measures under holonomy.

3. Adapted partitions

In order to prove the absolute continuity property (7), we need to obtain estimates on {\mu_y^u}. We start by getting estimates on {\mu} from the Gibbs property (1), and then using these to get estimates on {\mu_y^u} using (6).

We will need a family of partitions of {R} that refines to the partition into points. Fix a reference point {q\in R}, and suppose we have chosen partitions {\eta_n^s} of {V_q^s} and {\eta_m^u} of {V_q^u} for {m,n\geq 0}. Then we can define a partition {\xi_{m,n}} of {R} by taking the direct product of these two partitions, using the foliations of {R} by local stable and unstable leaves: that is, we put

\displaystyle  \xi_{m,n}(y) = \{z\in R : \eta_m^u([z,q]) = \eta_m^u([y,q]) \text{ and } \eta_n^s([q,z]) = \eta_n^s([q,y]) \} \ \ \ \ \ (8)

In order to obtain information on {\mu(\xi_{m,n}(y))} using the Gibbs property (1), we need to put an extra condition on the partitions we use; we need them to be adapted, meaning that each partition element both contains a Bowen ball and is contained within a larger Bowen ball. Most of the ideas here are fairly standard in thermodynamic formalism, but it is important for us to work separately on the stable and unstable manifolds, then combine the two, so we describe things explicitly. Fix {\varepsilon>0}. Given {y\in V_q^u} and {m\geq 0}, let

\displaystyle  \begin{aligned} B_m^u(y,\varepsilon) &= \{y'\in V_q^u : d(f^ky' , f^k y) \leq \varepsilon\ \forall 0\leq k \leq m \} \\ &= \{y'\in R : d(f^k y', f^ky) \leq \varepsilon\ \forall -\infty < k \leq m \}. \end{aligned}

Similarly, given {z\in V_q^s} and {n\geq 0}, let

\displaystyle  \begin{aligned} B_n^s(z,\varepsilon) &= \{z'\in V_q^s : d(f^{-k} z', f^{-k}z) \leq \varepsilon\ \forall 0\leq k \leq n\} \\ &= \{z'\in R : d(f^k z', f^kz) \leq \varepsilon\ \forall -n\leq k < \infty\}. \end{aligned}

Given {\varepsilon > 0}, we say that the partitions {\eta_m^u} are {\varepsilon}-adapted if for every partition element {\eta_m^u(y')} there is {y\in V_q^u} such that

\displaystyle  B_m^u(y,\varepsilon/2) \subset \eta_m^u(y') \subset B_m^u(y,\varepsilon).

We make a similar definition for {\eta_n^s} using {B_n^s}. Note that we can produce an {\varepsilon}-adapted sequence of partitions {\eta_n^s} as follows:

  1. say that {E\subset V_q^s} is {(n,\varepsilon)}-separated if for every {x\neq y \in E} there is {0\leq k\leq n} such that {d(f^{-k}x,f^{-k}y) > \varepsilon};
  2. let {E \subset V_q^s} be a maximal {(n,\varepsilon)}-separated set and observe that {\bigcup_{x\in E} B_n^s(x,\varepsilon) = V_q^s}, while the sets {\{B_n^s(x,\varepsilon/2) : x\in E \}} are disjoint;
  3. enumerate {E} as {x_1,\dots, x_\ell}, and build an {\varepsilon}-adapted partition {\eta_n^s} by considering the sets

    \displaystyle  C_j = B_n^s(x_j,\varepsilon/2) \cup \Big( V_q^s \setminus \bigcup_{i>j} B_n^s(x_i,\varepsilon)\Big), \ \ \ \ \ (9)

    and taking the partition elements of {\eta_n^s} to be {C_j \setminus \bigcup_{i<j} C_i}; one can check that each such element contains {B_n^s(x_j,\varepsilon/2)} and is contained in {B_n^s(x_j,\varepsilon)}.
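The three steps above can be sketched numerically. Here the leaf {V_q^s} is replaced by a fine grid in {[0,1]} with the ordinary metric standing in for {d_n^s} (so Bowen balls are just intervals), and the disjointification of the sets {C_j} into an honest partition is made explicit; the parameters are illustrative.

```python
import numpy as np

# Toy version of steps 1-3: the leaf is a fine grid in [0,1], the ordinary
# metric stands in for d_n^s, so Bowen balls are intervals.
eps = 0.13
grid = np.linspace(0.0, 1.0, 2001)
idx = set(range(len(grid)))

def ball(x, r):
    """Indices of grid points in the 'Bowen ball' of radius r about x."""
    return set(np.flatnonzero(np.abs(grid - x) <= r))

# 1. a maximal eps-separated set, chosen greedily
E = []
for x in grid:
    if all(abs(x - y) > eps for y in E):
        E.append(x)

# 2. the balls of radius eps cover the grid
assert set.union(*(ball(x, eps) for x in E)) == idx

# 3. the sets C_j of (9), disjointified into a partition
C = []
for j, xj in enumerate(E):
    later = set.union(set(), *(ball(xi, eps) for xi in E[j + 1:]))
    C.append(ball(xj, eps / 2) | (idx - later))
elements = [C[j] - set.union(set(), *C[:j]) for j in range(len(E))]

# the result is an eps-adapted partition of the grid
assert set.union(*elements) == idx
for j, xj in enumerate(E):
    assert ball(xj, eps / 2) <= elements[j] <= ball(xj, eps)
```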

Lemma 5 Let {\mu} be a measure satisfying the Gibbs property (1). Then there are {\varepsilon > 0} and {K_0>0} such that if {\eta_m^u} and {\eta_n^s} are {\varepsilon}-adapted partitions of {V_q^u} and {V_q^s}, then the product partition {\xi_{m,n}} defined in (8) satisfies

\displaystyle  K_0^{-1} \leq \frac{\mu(\xi_{m,n}(x))}{e^{-(n+m)P + \sum_{k=-n}^{m-1} \varphi(f^k x)}} \leq K_0 \ \ \ \ \ (10)

for every {x\in R}.

Proof: First we show that the upper bound in (10) holds whenever {\varepsilon>0} is sufficiently small. Fix {\varepsilon'>0} such that the Gibbs bound (1) holds for Bowen balls of radius {2\varepsilon'}, and such that {\varepsilon'} is significantly smaller than the size of any local stable or unstable leaf. It suffices to show that for every {x\in R} we have {\xi_{m,n}(x) \subset B_{-n,m}(y,2\varepsilon') := f^{n}(B_{n+m}(f^{-n} y, 2\varepsilon'))} for some {y}, since then (1) gives the upper bound, possibly with a different constant; note that replacing {x} with {y} in the denominator changes the quantity by at most a constant factor, using the fact that {\varphi} is Hölder continuous together with some basic properties of Anosov maps. (See, for example, Section 2 of this previous post for a proof of this Bowen property.)

Given {x,y\in M} such that {x} and {y} lie on the same local stable leaf with {d(x,y) \geq \varepsilon'}, let

\displaystyle  r^u(x,y) = \inf\{d(x',y') : x'\in W_x^u, y'\in W_y^u \cap W_{x'}^s\}

be the closest that any holonomy along local unstable leaves can bring {x} and {y}. Note that {r^u} is positive and continuous in {x} and {y}; by compactness there is {\bar{r}^u > 0} such that {r^u(x,y) \geq \bar{r}^u} for all {x,y} as above. In particular, this means that if {x,y} are the images under an (unstable) holonomy of some {x',y'} with {d(x',y') < \bar{r}^u}, then we must have {d(x,y) < \varepsilon'}.

Choose {\bar{r}^s} similarly for stable holonomies, and fix {\varepsilon <\min(\bar{r}^s,\bar{r}^u)}. Fix {y\in V_q^u}, {z\in V_q^s}, {y'\in B_m^u(y,\varepsilon)}, and {z'\in B_n^s(z,\varepsilon)}. Then for every {-n\leq k\leq m} we have

\displaystyle  d(f^k[y',z'],f^k[y,z]) \leq d(f^k[y',z'],f^k[y',z]) + d(f^k[y',z],f^k[y,z]). \ \ \ \ \ (11)

These distances can be estimated by observing that {f^k[y',z']} and {f^k[y',z]} are the images of {f^k z'} and {f^kz} under a holonomy map along local unstables, and similarly {f^k[y',z]} and {f^k[y,z]} are images of {f^k y'} and {f^k y} under a holonomy map along local stables. Then our choice of {\varepsilon} shows that both quantities on the right-hand side of (11) are {\leq \varepsilon'}, which gives the inclusion we needed. This proves the upper bound in (10); the proof of the lower bound is similar. \Box

4. A refining sequence of adapted partitions

Armed with the formula from Lemma 5, we may look at the characterization of conditional measures in (6) and try to prove that the absolute continuity bound (7) holds whenever {x,y\in R} both satisfy (6). There is one problem before we do this, though; the formula in (6) requires that the sequence of partitions {\eta_n^s} be refining, and there is no a priori reason to expect that the adapted partitions produced by the simple argument before Lemma 5 refine each other. To get this additional property, we must do some more work. To the best of my knowledge, the arguments here are new.

4.1. Strategy

Start by letting {E_n \subset V_q^s} be a maximal {(n, 10\varepsilon)}-separated set. We want to build a refining sequence of adapted partitions using the sets {E_n}, where instead of using Bowen balls with radius {\varepsilon/2} and {\varepsilon} we will use Bowen balls with radius {\varepsilon} and {20\varepsilon}; this does not change anything essential about the previous section. We cannot immediately proceed as in the argument before Lemma 5, because if we are not careful when choosing the elements of the partition {\eta_n^s}, their boundaries might cut the Bowen balls {B_{n'}^s(x,\varepsilon)} for {x\in E_{n'}} and {n' > n}, spoiling our attempt to build an adapted partition {\eta_{n'}^s} that refines {\eta_n^s}.

A first step will be to only build {\eta_n^s} for some values of {n}. More precisely, we fix {\ell\in {\mathbb N}} such that {C\lambda^\ell < \frac 13}, so that for every {x\in M} and {y\in W_x^s}, we have {d(f^\ell x, f^\ell y) < \frac 13 d(x,y)}; then writing

\displaystyle  d_n^s(x,y) := \max_{0\leq k \leq n} d(f^{-k}x, f^{-k}y), \ \ \ \ \ (12)

we have the following for every {n,t\in {\mathbb N}}:

\displaystyle  d_n^s(x,y) \leq 3^{-t} d_{n+t\ell}^s(x,y) \text{ whenever } d_{n+t\ell}^s(x,y) < \varepsilon. \ \ \ \ \ (13)

We will only build adapted partitions for those {n} that are multiples of {\ell}. We need to understand when the Bowen balls associated to points in the sets {E_n} can overlap. We write {\mathcal{E} = \{(x,n) \in V_q^s \times \ell{\mathbb N} : x\in E_n \}}.

Definition 6 An {\varepsilon}-path between {(x,n)\in \mathcal{E}} and {(x',n')\in \mathcal{E}} is a sequence {\{(z_i, k_i) \in \mathcal{E} : 0\leq i\leq m \}} with {(z_0,k_0) = (x,n)} and {(z_m,k_m) = (x',n')} such that {B_{k_i}^s(z_i,\varepsilon) \cap B_{k_{i+1}}^s(z_{i+1},\varepsilon) \neq \emptyset} for all {0\leq i < m}. A subset {\mathcal{C} \subset \mathcal{E}} is {\varepsilon}-connected if there is an {\varepsilon}-path between any two elements of {\mathcal{C}}.
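The relation in Definition 6 is just graph connectivity: put an edge between two elements of {\mathcal{E}} when their Bowen balls intersect, and take connected components. Here is a minimal sketch with intervals standing in for the balls {B_k^s(z,\varepsilon)}; the intervals and labels are purely illustrative.

```python
from collections import deque

# eps-connected components, with intervals [a, b] standing in for the Bowen
# balls B_k^s(z, eps): two elements are adjacent when their intervals overlap,
# and components are found by plain breadth-first search.
balls = {
    "p0": (0.00, 0.10), "p1": (0.08, 0.15), "p2": (0.14, 0.20),
    "q0": (0.50, 0.60), "q1": (0.58, 0.62),
}

def overlap(i, j):
    (a, b), (c, d) = balls[i], balls[j]
    return max(a, c) <= min(b, d)

def components():
    labels, seen, comps = list(balls), set(), []
    for start in labels:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            if u in comp:
                continue
            comp.add(u)
            queue.extend(v for v in labels if v not in comp and overlap(u, v))
        seen |= comp
        comps.append(comp)
    return comps

# p0-p1 and p1-p2 overlap (so p0, p2 are joined through p1); q0-q1 overlap
assert sorted(map(sorted, components())) == [["p0", "p1", "p2"], ["q0", "q1"]]
```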

Given {n\in {\mathbb N}}, let {\mathcal{E}_{\geq n} = \{(x,k)\in \mathcal{E} : k\geq n\}}. In the next section we will prove the following.

Proposition 7 If {\mathcal{C} \subset \mathcal{E}_{\geq n}} is {\varepsilon}-connected and {(x,n)\in \mathcal{C}}, then

\displaystyle  U(\mathcal{C}) := \bigcup_{(y,k) \in \mathcal{C}} B_k^s(y,\varepsilon) \subset B_n^s(x,5\varepsilon). \ \ \ \ \ (14)

For now we show how to build a refining sequence of adapted partitions assuming (14) by modifying the construction in (9). The key is that we build our partitions {\eta_n^s} so that for every {\varepsilon}-connected {\mathcal{C} \subset \mathcal{E}_{\geq n}}, the set {U(\mathcal{C})} is completely contained in a single element of {\eta_n^s}.

Suppose we have built a partition {\eta_{n-\ell}^s} with this property; we need to construct a partition {\eta_n^s} that refines {\eta_{n-\ell}^s}, still has this property, and also has the property that every partition element {A} has some {(x,n)\in \mathcal{E}} such that {B_n^s(x,\varepsilon) \subset A \subset B_n^s(x,20\varepsilon)}. To this end, let {Z} be an element of the partition {\eta_{n-\ell}^s}, and enumerate the {\varepsilon}-connected components of {\mathcal{E}_{\geq n}} as {\mathcal{C}_1,\dots, \mathcal{C}_m, \mathcal{D}_1,\mathcal{D}_2,\dots}, where each {\mathcal{C}_j} contains some {(x_j,n)}, while each {\mathcal{D}_i} is in fact a subset of {\mathcal{E}_{\geq n+\ell}}.

It follows from (14) that {U(\mathcal{C}_j) \subset B_n^s(x_j,5\varepsilon)} for all {1\leq j\leq m}. Given {i\in {\mathbb N}}, we can observe that {U(\mathcal{D}_i) \subset B_{n_i}^s(y_i,5\varepsilon) \subset B_n^s(y_i,5\varepsilon)} for some {(y_i,n_i)\in \mathcal{E}} with {n_i > n}. Moreover, the sets {\{B_n^s(x_j,10\varepsilon) : 1\leq j\leq m\}} cover {Z}, so every {U(\mathcal{D}_i)} must intersect some set {B_n^s(x_j,10\varepsilon)}, and hence be contained in some {B_n^s(x_j,20\varepsilon)}. Given {i\in {\mathbb N}}, let {J(i) = \min \{j : U(\mathcal{D}_i) \subset B_n^s(x_j,20\varepsilon)\}}. Then for each {1\leq j\leq m}, let {I_j = \{i\in {\mathbb N} : J(i) = j\}}. Define sets {A_1,\dots, A_m} by

\displaystyle  A_j = U(\mathcal{C}_j) \cup \Big( \bigcup_{i\in I_j} U(\mathcal{D}_i) \Big) \cup \Big( Z\setminus \bigcup_{j' > j} B_n^s(x_{j'},20\varepsilon) \Big).

It is not hard to verify that the sets {A_j} form a partition of {Z} such that {B_n^s(x_j,\varepsilon) \subset A_j \subset B_n^s(x_j,20\varepsilon)}, and such that every {\varepsilon}-connected component {\mathcal{C}} of {\mathcal{E}_{\geq n}} with {U(\mathcal{C}) \cap Z \neq\emptyset} has {U(\mathcal{C})} completely contained in some {A_j}. Repeating this procedure for the other elements of {\eta_{n-\ell}^s} produces the desired {\eta_n^s}.

4.2. Proof of the proposition

Now we must prove Proposition 7. Let {\mathcal{E} \subset V_q^s\times \ell{\mathbb N}} be as in the previous section, so that in particular, every {(x,n)\in \mathcal{E}} has that {n} is a multiple of {\ell}, and given any {(x,n)\neq (y,n)\in \mathcal{E}} we have {d_n^s(x,y) \geq 10\varepsilon}.

The following concept is essential for our proof.

Definition 8 Given an {\varepsilon}-path {\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}}, a ford is a pair {0\leq i < j \leq m} such that {k_i = k_j \leq k_a} for all {i<a<j}. Think of “fording a river'' to get from {B_{k_i}^s(z_i,\varepsilon)} to {B_{k_j}^s(z_j,\varepsilon)} by going through deeper levels of the {\varepsilon}-path; the alternative is that {k_a < k_i} for some {i<a<j}, in which case {B_{k_a}^s(z_a,\varepsilon)} is a sort of “bridge'' between {z_i} and {z_j}.

Lemma 9 Suppose that {\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}} is an {\varepsilon}-path without any fords, and that {k_i \geq k_0} for all {1\leq i \leq m}. Then {d_{k_0}^s(z_0,z_m) \leq 4\varepsilon}.

Proof: By the definition of {\varepsilon}-path, for every {0\leq a < m} there is a point {y_a \in B_{k_a}^s(z_a,\varepsilon) \cap B_{k_{a+1}}^s(z_{a+1},\varepsilon)}. Thus

\displaystyle  \begin{aligned} d_{k_0}^s(z_0,z_m) &\leq d_{k_0}^s(z_0,y_0) + \Big( \sum_{a=1}^{m-1} d_{k_0}^s(y_{a-1}, z_a) + d_{k_0}^s(z_a,y_a) \Big) + d_{k_0}^s(y_{m-1},z_m) \\ &\leq 2\varepsilon + \sum_{a=1}^{m-1} (\tfrac 13)^{(k_a - k_0)/\ell} \big( d_{k_a}^s(y_{a-1}, z_a) + d_{k_a}^s(z_a,y_a) \big) \\ &\leq 2\varepsilon \Big( 1 + \sum_{a=1}^{m-1} (\tfrac13)^{(k_a - k_0)/\ell} \Big), \end{aligned} \ \ \ \ \ (15)

where the second inequality uses (13). Now we need to estimate how often different values of {k_a} can appear. Let {A = \{1, \dots, m-1\}}; we claim that for every {t\in {\mathbb N}}, we have

\displaystyle  \#\{a\in A : k_a = k_0 + t\ell \} \leq 2^{t-1}. \ \ \ \ \ (16)

First note that {k_a > k_0} for all {a\in A}, because otherwise {(0,a)} would be a ford. Since the {\varepsilon}-path has no fords, every {i<j} with {k_i = k_j} must be separated by some {a} with {k_a < k_i}, and we conclude that for every {t\in {\mathbb N}}, we have

\displaystyle  \begin{aligned} \#\{a \in A : &\, k_a = k_0 + t\ell\} \leq 1 + \#\{a\in A : k_0 < k_a < k_0 + t\ell\} \\ &= 1 + \sum_{q=1}^{t-1} \#\{a\in A : k_a = k_0 + q\ell\}. \end{aligned}

For {t=1}, this establishes (16) immediately. If (16) holds up to {t-1}, then this gives

\displaystyle  \#\{a\in A : k_a = k_0 + t\ell\} \leq 1 + \sum_{q=1}^{t-1} 2^{q-1} = 2^{t-1},

which proves (16) for all {t\in {\mathbb N}} by induction. Combining (15) and (16), we have

\displaystyle  \begin{aligned} d_{k_0}^s(z_0,z_m) &\leq 2\varepsilon \Big( 1 + \sum_{t=1}^\infty \Big(\frac13 \Big)^t \#\{a\in A : k_a = k_0 + t\ell\}\Big) \\ &\leq 2\varepsilon + \varepsilon \sum_{t=1}^\infty \Big(\frac23 \Big)^t =4\varepsilon, \end{aligned}

which proves Lemma 9. \Box
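As a sanity check on the counting argument, the series in the last estimate can be summed numerically: with at most {2^{t-1}} entries at depth {k_0+t\ell}, each weighted by {(\tfrac13)^t}, the total is exactly {2\varepsilon + \varepsilon\sum_{t\geq 1}(\tfrac23)^t = 4\varepsilon}.

```python
# Numerical check of the series bound in the proof of Lemma 9: at most 2^(t-1)
# path entries at depth k_0 + t*l, each contributing a factor (1/3)^t, give
# 2*eps*(1 + sum_t (1/3)^t * 2^(t-1)) = 2*eps + eps*sum_t (2/3)^t = 4*eps.
eps = 1.0
partial = 2 * eps
for t in range(1, 200):
    partial += 2 * eps * (1 / 3) ** t * 2 ** (t - 1)
assert abs(partial - 4 * eps) < 1e-12
```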

Now we show that the {10\varepsilon}-separation condition rules out the existence of fords entirely.

Lemma 10 If {\{(z_i,k_i)\}_{i=0}^m \subset \mathcal{E}} is an {\varepsilon}-path, then it has no fords.

Proof: Fix an {\varepsilon}-path {\{(z_i,k_i)\}_{i=0}^m}. Denote the set of fords by

\displaystyle  I = \{(i,j)\in \{0,1,\dots,m\}^2 : i < j \text{ and } k_i = k_j \leq k_a \text{ for all } i<a<j \};

our goal is to prove that {I} is empty. Put a partial ordering on {I} by writing

\displaystyle  (i',j') \preceq (i,j)\ \Leftrightarrow\ [i',j'] \subset [i,j] \ \Leftrightarrow\ i \leq i' < j' \leq j.

If {I} is non-empty, then since it is finite it must contain some element {(i,j)} that is minimal with respect to this partial ordering. In particular, the {\varepsilon}-path {(z_i,k_i),\dots, (z_j,k_j)} contains no fords other than the pair formed by its two endpoints; since the proof of Lemma 9 never uses the absence of the endpoint pair, it applies here and gives {d_{k_i}^s(z_i,z_j) \leq 4\varepsilon}, contradicting the assumption that {d_{k_i}^s(z_i,z_j) \geq 10\varepsilon} (since {E_{k_i}} is {(k_i,10\varepsilon)}-separated), and we conclude that {I} must be empty, proving the lemma. \Box

Now we prove Proposition 7. Let {\mathcal{C} \subset \mathcal{E}_{\geq n}} be {\varepsilon}-connected and fix {(x,n) \in \mathcal{C}}. Given any {(y,k) \in \mathcal{C}}, there is an {\varepsilon}-path {\{(z_i,k_i)\}_{i=0}^m} such that {(z_0,k_0) = (x,n)} and {(z_m, k_m) = (y,k)}. By Lemma 10, this path has no fords, and so Lemma 9 gives {d_n^s(x,y) \leq 4\varepsilon}. We conclude that {B_n^s(y,\varepsilon) \subset B_n^s(x,5\varepsilon)}, which proves Proposition 7.

5. Completion of the proof

Thanks to the previous section, we can let {\eta_m^u} and {\eta_n^s} be {\varepsilon}-adapted partitions of {V_q^u} and {V_q^s} such that {\eta_{n'}^s} refines {\eta_n^s} whenever {n'>n} and {n,n'} are both multiples of {\ell}. By Lemma 5, we have good lower and upper estimates on {\mu(\xi_{m,n}(x))} for all {x\in R}, where {\xi_{m,n}} is the product partition defined in (8). By (6), there is {R'\subset R} with {\mu(R\setminus R')=0} such that for every {x\in R'}, we have

\displaystyle  \int \psi\,d\mu_x^u = \lim_{n\rightarrow\infty} \frac 1{\mu(\xi_{0,n\ell}(x))}\int_{\xi_{0,n\ell}(x)} \psi\,d\mu \ \ \ \ \ (17)

for every continuous {\psi\colon M\rightarrow {\mathbb R}}. The next step towards proving Theorem 3 is the following result.

Proposition 11 There is a constant {K_1} such that for any {x,y\in R'} and every continuous {\psi\colon V_y^u \rightarrow (0,\infty)}, we have {\int \psi\,d(\pi_{x,y})_*\mu_x^u \leq K_1 \int\psi\,d\mu_y^u}, where {\pi_{x,y} \colon V_x^u \rightarrow V_y^u} is the holonomy map along local unstables.

Proof: Start by fixing for each {m\in {\mathbb N}} a set {D_m^q \subset V_q^u} such that every element of {\eta_m^u} contains exactly one point of {D_m^q}. Then let {D_m^x = \pi_{q,x}D_m^q} for every {x\in R}.

Given a positive continuous function {\psi}, there is {\delta>0} such that if {d(z,z')<\delta}, then {\psi(z) \leq 2\psi(z')}. Thus for all sufficiently large {m,n}, we have {\psi(z) \leq 2\psi(z')} whenever {\xi_{m,n\ell}(z) = \xi_{m,n\ell}(z')}. For any such {m,n} and any {x,y\in R'}, we conclude that

\displaystyle  \begin{aligned} \int_{\xi_{0,n\ell}(x)} \psi\circ \pi_{x,y}\,d\mu &= \sum_{z\in D_m^x} \int_{\xi_{m,n\ell}(z)} \psi\circ \pi_{x,y}\,d\mu \leq \sum_{z\in D_{m}^x} 2\psi(\pi_{x,y}z) \mu(\xi_{m,n\ell}(z)) \\ &\leq \sum_{z\in D_{m}^x} 2\psi(\pi_{x,y}z) K_0 e^{-(n\ell+m)P + S_{n,m}\varphi(z)}, \end{aligned}

where we write {S_{n,m}\varphi(z) = \sum_{k=-n\ell}^{m-1} \varphi(f^kz)} and where the last inequality uses Lemma 5. Similarly,

\displaystyle  \mu(\xi_{0,n\ell}(x)) = \sum_{z\in D_m^x} \mu(\xi_{m,n\ell}(z)) \geq \sum_{z\in D_m^x} K_0^{-1} e^{-(n\ell+m) P + S_{n,m}\varphi(z)},

and we conclude from (17) that

\displaystyle  \int \psi\circ \pi_{x,y} \,d\mu_x^u \leq \varlimsup_{n\rightarrow\infty} \frac{\sum_{z\in D_m^x} 2\psi(\pi_{x,y}z) K_0 e^{S_{n,m}\varphi(z)}}{\sum_{z\in D_m^x} K_0^{-1} e^{S_{n,m}\varphi(z)}}. \ \ \ \ \ (18)

Note that since {\varphi} is Hölder continuous, there are {\beta>0} and {|\varphi|_\beta > 0} such that {|\varphi(z)-\varphi(z')| \leq |\varphi|_\beta d(z,z')^\beta} for all {z,z'\in M}. Thus given any {z\in V_x^u} and any {n\geq 0}, we have

\displaystyle  \begin{aligned} \Big|\sum_{k=-n\ell}^{-1} \varphi(f^kx) &- \sum_{k=-n\ell}^{-1} \varphi(f^kz)\Big| \leq \sum_{k=1}^{n\ell} |\varphi(f^{-k}x) - \varphi(f^{-k}z)| \\ &\leq \sum_{k=1}^{n\ell} |\varphi|_\beta (C\lambda^k \varepsilon)^\beta < |\varphi|_\beta (C\varepsilon)^\beta \sum_{k=1}^\infty (\lambda^\beta)^k =: L < \infty. \end{aligned} \ \ \ \ \ (19)

Thus, writing {S_m\varphi(z) = \sum_{k=0}^{m-1} \varphi(f^kz)}, we see that (18) gives

\displaystyle  \begin{aligned} \int \psi \,d(\pi_{x,y})_*\mu_x^u &\leq 2K_0^2 \varlimsup_{n\rightarrow\infty} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{L} e^{\sum_{k=-n\ell}^{-1} \varphi(f^k x)} e^{S_m\varphi(z)}} {\sum_{z\in D_m^x} e^{-L} e^{\sum_{k=-n\ell}^{-1} \varphi(f^k x)} e^{S_m\varphi(z)}} \\ &\leq 2K_0^2 e^{2L} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{S_m\varphi(z)}}{\sum_{z\in D_m^x} e^{S_m\varphi(z)}}. \end{aligned} \ \ \ \ \ (20)

A similar set of computations for {\mu_y^u} shows that

\displaystyle  \int\psi\,d\mu_y^u \geq \frac 1{2K_0^2 e^{2L}} \frac{\sum_{z'\in D_m^y} \psi(z') e^{S_m\varphi(z')}}{\sum_{z'\in D_m^y} e^{S_m\varphi(z')}}. \ \ \ \ \ (21)

Since {D_m^y = \pi_{x,y} D_m^x}, we can rewrite the sums over {D_m^y} as sums over {D_m^x}; for example,

\displaystyle  \sum_{z' \in D_m^y} e^{S_{m}\varphi(z')} = \sum_{z\in D_m^x} e^{S_{m}\varphi(\pi_{x,y} z)} \leq e^L \sum_{z\in D_m^x} e^{S_m\varphi(z)},

where the inequality uses the fact that the estimate (19) also holds for forward ergodic averages of two points on the same local stable manifold. Using a similar estimate for the numerator in (21) gives

\displaystyle  \int \psi\,d\mu_y^u \geq \frac 1{2K_0^2 e^{4L}} \frac{\sum_{z\in D_m^x} \psi(\pi_{x,y}z) e^{S_m\varphi(z)}}{\sum_{z\in D_m^x} e^{S_m\varphi(z)}}.

Together with (20), this gives

\displaystyle  \int \psi \,d(\pi_{x,y})_*\mu_x^u \leq 4K_0^4 e^{6L} \int\psi\,d\mu_y^u,

which completes the proof of Proposition 11. \Box

To complete the proof of Theorem 3, we first observe that for every open set {U \subset V_y^u}, there is a sequence of continuous functions {\psi_n\colon V_y^u \rightarrow (0,1]} that converge pointwise to the indicator function {\mathbf{1}_U}; applying Proposition 11 to these functions and using the dominated convergence theorem gives

\displaystyle  \begin{aligned} (\pi_{x,y})_*\mu_x^u(U) &= \int \mathbf{1}_U \,d(\pi_{x,y})_*\mu_x^u = \lim_{n\rightarrow\infty} \int \psi_n \,d(\pi_{x,y})_*\mu_x^u \\ &\leq \lim_{n\rightarrow\infty} K_1 \int\psi_n \,d\mu_y^u = K_1 \int \mathbf{1}_U \,d\mu_y^u = K_1\mu_y^u(U). \end{aligned}

Then for every measurable {E \subset V_y^u}, we have

\displaystyle  \begin{aligned} (\pi_{x,y})_*\mu_x^u(E) &= \inf \{ (\pi_{x,y})_*\mu_x^u(U) : U \supset E \text{ is open} \} \\ &\leq K_1 \inf \{ \mu_y^u(U) : U \supset E \text{ is open} \} = K_1 \mu_y^u(E). \end{aligned}

This proves that {(\pi_{x,y})_*\mu_x^u \ll \mu_y^u} and that the Radon–Nikodym derivative is {\leq K_1} {\mu_y^u}-a.e. The lower bound {K_1^{-1}} follows since the argument is symmetric in {x} and {y}. This proves (7) and thus completes the proof of Theorem 3.

Posted in ergodic theory, smooth dynamics

Alpha-beta shifts

Given {\beta>1}, the interval map {T\colon [0,1]\rightarrow [0,1]} given by {T(x) = \beta x \pmod 1} is naturally semi-conjugate to the {\beta}-shift {\Sigma_\beta \subset \{0,1,\dots,\lceil\beta\rceil-1\}^{\mathbb N}}. The shift space {\Sigma_\beta} admits a natural description in terms of the lexicographic order, and another in terms of a countable-state directed graph. These have been used to obtain a fairly complete description of the dynamics of the {\beta}-transformation {T}, including existence, uniqueness, and strong statistical properties of equilibrium states for Hölder potentials.

In this post I want to explain how one can give similar descriptions (both in terms of lexicographic order and in terms of a countable-state Markov structure) for the coding spaces associated to the interval map {T(x) = \alpha + \beta x \pmod 1}. Just as with the {\beta}-shifts, these “{\alpha}{\beta} shifts” arise from piecewise expanding interval maps, and thus can be studied using the general machinery developed by Hofbauer, but I believe that they are worth studying a little more carefully as an intermediate case between the {\beta}-shifts, where the fact that {0} is a `safe symbol’ can be used to prove various results that are still open for {\alpha}{\beta} shifts, and the more general setting where the description of the shift space requires more bookkeeping and it can be harder to see what is going on.

1. Coding spaces for interval maps

We start by recalling the general notion of a coding space for a map of the interval. Say that a map {T\colon [0,1]\rightarrow [0,1]} is a piecewise expanding interval map if there is {\lambda>1} and a partition of {[0,1]} into finitely many intervals {I_0,\dots, I_{d-1}} such that {T} is continuous on each {I_j} and {C^1} on the interior of each {I_j}, with {|T'| \geq \lambda}. Note that we do not care whether the intervals are closed, open, or half of each.

Let {A = \{0,1,\dots, d-1\}}, and define a map {h\colon [0,1]\rightarrow A^{\mathbb N}} by {h(x)_n = a \in A} whenever {T^n(x) \in I_a}. Let {\Sigma = \Sigma_T = \overline{h([0,1])}}; we say that {\Sigma} is the coding space for the interval map {T} relative to the partition {\{I_j\}_j}. We can define a map {\pi\colon \Sigma\rightarrow [0,1]} by the condition that {\pi(h(x)) = x} for all {x\in [0,1]} and {\pi} is continuous. Then we have {\pi\circ \sigma = T\circ \pi}, so {\pi} is a semiconjugacy between {(\Sigma,\sigma)} and {([0,1],T)}.

Another way of interpreting this is the following: for each {a\in A} there is a unique map {S_a \colon \overline{T(I_a)} \rightarrow \overline{I_a}} such that {S_a(T x) = x} for all {x} in the interior of {I_a}. Note that {S_a} is {C^1} with {|S_a'| \leq \lambda^{-1} < 1}. Given any set {E\subset [0,1]}, write {S_a(E) = S_a(E \cap \overline{T(I_a)})} for convenience; note that {S_a(E)=\emptyset} if {E} does not intersect {\overline{T(I_a)}}. Given a finite word {w\in A^* = \bigcup_{n\geq 0} A^n}, let {J_w = S_w([0,1])}, where {S_w = S_{w_1} \circ \cdots \circ S_{w_{|w|}}} and {|w|} is the length of {w}. Roughly speaking, {J_w} represents the set of points {x\in [0,1]} for which {x\in I_{w_1}}, {T(x) \in I_{w_2}}, and so on. (This is not quite true because of problems at the endpoints of the intervals.)

The language of the shift {\Sigma} is the set of all {w\in A^*} such that {[w] := \{x\in \Sigma : x_1 \cdots x_{|w|} = w\} \neq\emptyset}. Write {\mathcal{L}} for this collection of words; then {\mathcal{L} = \{ w \in A^* : J_w \neq \emptyset\}}, and we have {\pi([w]) = J_w} for all {w\in \mathcal{L}}. Let {\mathop{\mathcal L}_n = \{w\in \mathop{\mathcal L} : |w| = n\}}; given {w\in \mathop{\mathcal L}_n}, the interval {J_w} has length {\leq \lambda^{-n}}, and since for every {x\in \Sigma} we have {\{x\} = \bigcap_{n\geq 0} [x_1\cdots x_n]}, we also have {\{\pi(x)\} = \bigcap_{n\geq 0} J_{x_1 \cdots x_n}}.

So far these considerations have been completely general, valid for any piecewise expanding interval map. Now we consider the specific transformations {T_\beta(x) = \beta x \pmod 1} and {T_{\alpha,\beta} = \alpha + \beta x \pmod 1}, where {z\pmod 1 := z - \lfloor z \rfloor} and we assume without loss of generality that {\alpha \in [0,1)}. These are piecewise expanding interval maps, where the natural partition to consider is the partition into maximal intervals of continuity. Thus for {T_\beta}, we have {I_0 = [0,\frac 1\beta)}, {I_1 = [\frac 1\beta, \frac 2\beta)}, and so on, with {I_{d-1} = [\frac {d-1}\beta,1]}, where {d = \lceil\beta\rceil}. For {T_{\alpha,\beta}}, we have {I_0 = [0,\frac{1-\alpha}\beta)}, {I_1 = [\frac{1-\alpha}{\beta},\frac{2-\alpha}\beta)}, and so on, with {d = \lceil\alpha+\beta\rceil} and {I_{d-1} = [\frac{d-1-\alpha}{\beta},1]}. We write {\Sigma_\beta} and {\Sigma_{\alpha,\beta}} for the coding spaces of these transformations relative to their natural partitions.
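Before moving on, it may help to see the coding in action. The following quick sketch (my own illustration, not part of the post's machinery; floating-point arithmetic makes the endpoint conventions approximate) computes the first few symbols of the itinerary of a point under {T_{\alpha,\beta}}; taking {\alpha=0} recovers {T_\beta}.

```python
import math

def itinerary(x, beta, alpha=0.0, n=10):
    """First n symbols of the coding of x under T(x) = alpha + beta*x (mod 1).

    The symbol at each step is the index a of the partition interval I_a
    containing the current point, i.e. the integer part of alpha + beta*x,
    clamped so that x = 1 receives the last symbol d - 1.  Orbits landing
    exactly on interval endpoints are sensitive to rounding, so this is a
    sketch only.
    """
    d = math.ceil(alpha + beta)      # number of partition intervals
    symbols = []
    for _ in range(n):
        y = alpha + beta * x
        a = min(int(y), d - 1)       # clamp for the right endpoint
        symbols.append(a)
        x = y - int(y)               # apply T: reduce mod 1
    return symbols
```

For example, `itinerary(0.625, 2)` begins {1,0,1,0,\dots}, the binary expansion of {0.625}.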

2. Description via lexicographic order

The full shift {A^{\mathbb N}} carries a natural lexicographic order inherited from the usual order on {A = \{0,1,\dots, d-1\}}: given {y\neq z\in A^{\mathbb N}}, let {n=n(y,z)} be minimal such that {y_n\neq z_n}, and write {y\prec z} if {y_n < z_n}. Write {y\preceq z} if {y\prec z} or {y=z}. This is a total order on {A^{\mathbb N}}. The key observation for our purposes is that because {T_\beta} and {T_{\alpha,\beta}} are order-preserving on each of their intervals of continuity, their coding maps are also order-preserving, in the sense that {y\preceq z} implies {\pi(y) \leq \pi(z)}. (Note that we must use non-strict inequalities here because the coding maps are not 1-1.)

2.1. The {\beta}-shifts

Fix {\beta>1} and let {T=T_\beta}. Let {{\mathbf b} = \sup \Sigma_\beta}, where supremum is w.r.t. lexicographic order; the supremum exists because {A^{\mathbb N}} has the least upper bound property, and is in {\Sigma_\beta} because the {\beta}-shift is closed. Thus {{\mathbf b} = b_1 b_2 b_3 \cdots} is the lexicographically maximal element of {\Sigma_\beta}; it would code the trajectory of the right endpoint of the unit interval if we replaced {T_\beta} with the map {T(x) = 1 + \beta x - \lceil \beta x \rceil}, which agrees with {T_\beta} everywhere except at the endpoints of the intervals of monotonicity. The points {T^n(1) \in [0,1]} will play an important role in the following result.

Proposition 1 A sequence {y\in A^{\mathbb N}} is in {\Sigma_\beta} if and only if

\displaystyle  \sigma^n(y) \preceq {\mathbf b} \text{ for all } n=0,1,2,\dots. \ \ \ \ \ (1)

Similarly, a word {w\in A^*} is in {\mathop{\mathcal L}(\Sigma_\beta)} if and only if

\displaystyle  w_{|w|-k+1}\cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k\leq |w|. \ \ \ \ \ (2)

Proof: Because {\Sigma_\beta} is closed and {\sigma}-invariant, every {y\in \Sigma_\beta} satisfies (1), and it follows immediately that every {w\in \mathop{\mathcal L}(\Sigma_\beta)} satisfies (2).

To prove the converse direction, it suffices to show that every {w\in A^*} which satisfies (2) is contained in {\mathop{\mathcal L}}; in other words, that {J_w \neq \emptyset} for every such {w}. Equivalently, we can work with the following object: consider for each {w \in A^*} the follower interval {F_w = \overline{T^{|w|} (J_w)}}; this can also be defined as the minimal interval with the property that {S_w([0,1]\setminus F_w)=\emptyset}. Note that {F_w = \emptyset} whenever {w\notin \mathop{\mathcal L}}. Given {w\in \mathop{\mathcal L}} and {a\in A}, observe that

\displaystyle  F_{wa} = \overline{T^{1+|w|}(J_{wa})} = \overline{T(T^{|w|}S_w S_a[0,1])} = \overline{T(T^{|w|}S_w I_a)} = \overline{T(F_w \cap I_a)}. \ \ \ \ \ (3)

Now we can give a non-recursive description of {F_w} that completes the proof of Proposition 1. If {w\in A^*} satisfies (2), let

\displaystyle  k(w) = \max \{ k : w_{|w|-k+1} \cdots w_{|w|} = b_1 \cdots b_k \} \ \ \ \ \ (4)

be the length of the longest prefix of {{\mathbf b}} that appears as a suffix of {w}.

Lemma 2 If {w\in A^*} satisfies (2), then {F_w = [0,T^{k(w)}(1)]}.

Proof: We argue by induction on the length of {w}, using {|w|=0} (so {w} is the empty word and {F_w = [0,1]}) as the base case. Suppose we have some {w\in A^n} satisfying (2), and that the lemma has been proved for all words with length {<n}. Write {w = va} where {|v| = n-1} and {a\in A}. Let {j = k(v)}, so that {v=u b_1 \cdots b_j} for some word {u}. By the inductive hypothesis, we have {F_v = [0,T^{j}(1)]} and hence {v\in \mathop{\mathcal L}}. Then by (3), we have

\displaystyle  F_w = F_{va} = \overline{T(F_v \cap I_a)} = \overline{T([0,T^{k(v)}(1)] \cap I_a)}.

There are two cases to consider.

Case one: {T^{k(v)}(1) \in I_a}. In this case we have {a=b_{j+1}} and {k(w) = j+1}, and {F_w = \overline{T([\frac a\beta,T^{k(v)}(1)])} = [0,T^{k(v)+1}(1)] = [0,T^{k(w)}(1)]}.

Case two: {T^{k(v)}(1) \notin I_a}. In this case we must have {a < b_{j+1}}, otherwise we would have {w=ub_1\cdots b_j a} and hence {w_{|w|-j} \cdots w_{|w|} \succ b_1 \cdots b_{j+1}}, violating (2). Thus {I_a \subset F_v} and hence by (3), we have {F_w = \overline{T(I_a)} = [0,1]}. On the other hand, since {a < b_{j+1}}, we see that for every {1\leq i\leq j} we have

\displaystyle  w_{|w|-j+i} \cdots w_{|w|}= b_i b_{i+1} \cdots b_j a \prec b_i \cdots b_j b_{j+1} \preceq b_1 \cdots b_{j-i+2},

where the last inequality uses the fact that {\sigma^{i-1}{\mathbf b} \preceq {\mathbf b}}. We conclude that {k(w) = 0}, which completes the proof of Lemma 2. \Box

With Lemma 2 complete, Proposition 1 follows immediately since {F_w\neq \emptyset} implies that {w\in \mathop{\mathcal L}}. \Box
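Both {k(w)} and condition (2) are easy to experiment with on a computer. Here is a minimal sketch (the function names are my own; words and a sufficiently long prefix of {\mathbf b} are represented as strings of digits, so that Python's string comparison gives the lexicographic order):

```python
def k_suffix(w, b):
    """k(w): length of the longest prefix of b that occurs as a suffix of w."""
    for k in range(min(len(w), len(b)), -1, -1):
        if w[len(w) - k:] == b[:k]:
            return k
    return 0  # unreachable: k = 0 (the empty prefix) always matches

def admissible(w, b):
    """Condition (2): every suffix of w is lexicographically <= the prefix
    of b of the same length.  Requires len(b) >= len(w)."""
    return all(w[len(w) - k:] <= b[:k] for k in range(1, len(w) + 1))
```

For instance, with the prefix b = "21201" (the example used in section 3 below), the word "2120" is admissible with k = 4, while "2121" is not admissible.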

A historical note: the transformation {T_\beta} was introduced by Rényi in 1957 to study representations of real numbers in non-integer bases. The lexicographic characterization of {\beta}-shifts in Proposition 1 was given by Parry in 1960 (see Theorem 3 of that paper) using different methods than the proof here. This proof is essentially the same as the one given by Hofbauer in 1979 (see Theorem 2 there); Hofbauer’s approach has the advantage of dealing with all piecewise monotonic interval maps, and the argument given here for the {\beta}-shift is actually just a special case of his general result.

2.2. The {\alpha}{\beta} shifts

Instead of describing the full generality of Hofbauer’s result here, we describe how Proposition 1 adapts to the {\alpha}{\beta} shifts. This family of examples already reveals some of the challenges that arise when we go from {\beta}-shifts to more general piecewise expanding transformations.

Fix {\alpha\in [0,1)} and {\beta>1}, then let {T = T_{\alpha,\beta} \colon x \mapsto \alpha + \beta x \pmod 1}, and let {\Sigma = \Sigma_{\alpha,\beta}} be the coding space for {T} with respect to its natural partition. Let {{\mathbf a} = \inf \Sigma} and {{\mathbf b} = \sup \Sigma} be the lexicographically minimal and maximal elements of {\Sigma}. Then every {y\in \Sigma} has the property that

\displaystyle  {\mathbf a} \preceq \sigma^n y \preceq {\mathbf b} \text{ for all } n=0,1,2,\dots, \ \ \ \ \ (5)

and every {w\in \mathop{\mathcal L} = \mathop{\mathcal L}(\Sigma)} has the property that

\displaystyle  a_1 \cdots a_k \preceq w_{|w|-k+1} \cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k \leq |w|. \ \ \ \ \ (6)

As with the {\beta}-shifts, these conditions are both necessary and sufficient for membership in {\Sigma} and {\mathop{\mathcal L}}. The key to proving this is the following analogue of Lemma 2, whose proof we omit since it is similar to the proof there (with just a little more bookkeeping).

Lemma 3 Given {w\in A^*} satisfying (6), let

\displaystyle  \begin{aligned} k_1(w) &= \max \{k : a_1 \cdots a_k = w_{|w|-k+1} \cdots w_{|w|} \}, \\ k_2(w) &= \max \{k : b_1 \cdots b_k = w_{|w|-k+1} \cdots w_{|w|} \}. \end{aligned} \ \ \ \ \ (7)

Then {F_w = [\underline{T}^{k_1(w)}(0),\overline{T}^{k_2(w)}(1)]}, where {\underline{T}} and {\overline{T}} agree with {T_{\alpha,\beta}} on the interiors of the intervals {I_a}, and are defined at the endpoints by the condition that {\underline{T}} is continuous from the right and {\overline{T}} is continuous from the left.

3. A directed graph on {{\mathbb N}}

The lexicographic description of {\Sigma_\beta} in Proposition 1 is appealing, but for many purposes it is more useful to have a description of the {\beta}-shift in terms of a countable-state directed graph, which goes back to work of Takahashi in 1973 and Hofbauer in 1978.

Start by considering the notion of follower set in a shift space, which is analogous to the follower intervals considered in the previous part. Given a shift space {\Sigma} with language {\mathop{\mathcal L}}, consider for each {w\in \mathop{\mathcal L}} the follower set

\displaystyle  F^w = \{y\in \Sigma : wy\in \Sigma \} = \sigma^{|w|}([w] \cap \Sigma). \ \ \ \ \ (8)

One could also consider the set of words {v\in \mathop{\mathcal L}} such that {wv\in \mathop{\mathcal L}}, but the definition in (8) will be more convenient for our purposes. It was shown by Weiss in 1973 that a shift can be represented via a labeled directed graph on finitely many vertices if and only if it has finitely many follower sets; that is, if {\{F^w : w\in \mathop{\mathcal L}\}} is finite. Such shifts are called sofic. Similarly, {\Sigma} can be represented via a labeled irreducible directed graph on countably many vertices if and only if it has countably many follower sets, and such shifts are called coded; see Fiebig and Fiebig 1992 and 2002.

Every {\beta}-shift is characterized by a sequence {{\mathbf b}\in A^{\mathbb N}} with the property that {\sigma^n{\mathbf b} \preceq{\mathbf b}} for all {n\geq 0}: as we saw in Proposition 1, we have

\displaystyle  \begin{aligned} \Sigma_\beta &= \{y\in A^{\mathbb N} : \sigma^n y \preceq {\mathbf b} \text{ for all } n\geq 0 \}, \\ \mathop{\mathcal L}(\Sigma_\beta) &= \{w\in A^* : w_{|w|-k+1}\cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k \leq |w| \}. \end{aligned}

Moreover, it follows from Lemma 2 that for every {w\in \mathop{\mathcal L}(\Sigma_\beta)}, the corresponding follower set in {\Sigma_\beta} is

\displaystyle  F^w = \{y\in \Sigma : y\preceq \sigma^{k(w)}{\mathbf b}\} =: [\mathbf{0},\sigma^{k(w)}{\mathbf b}],

where {k(w)} is defined in (4) to be the length of the longest prefix of {{\mathbf b}} that appears as a suffix of {w}. Thus {F^w} is completely determined by {k(w)}, which can take countably many values, and since {\beta}-shifts are transitive this implies that they are coded. Thus they can be presented by a countable-state graph which supports a countable-state topological Markov chain (the Fischer cover).

Given any coded shift, the vertex set of the Fischer cover is the set of follower sets: {V = \{F \subset \Sigma : F=F^w} for some {w\in \mathop{\mathcal L}\}}. A labeled edge is a triple {(F,F',a)}, where {F,F'\in V} and {a\in A}; the labeled edge set of the Fischer cover is {E = \{(F^w,F^{wa},a) : wa\in \mathop{\mathcal L}, a\in A\}}; that is, there is an edge from {F^w} to {F^{wa}} whenever {wa\in \mathop{\mathcal L}} and {a\in A}, and this edge is labeled with the symbol {a}. Note that there may be multiple edges between two vertices, each with a different label.

Say that a sequence {y\in A^{\mathbb N}} labels a walk on the graph if there is a sequence of edges {e_1,e_2,\dots} such that for every {n},

  1. {e_n} is labeled with the symbol {y_n}, and
  2. the target vertex of {e_n} is the source vertex of {e_{n+1}}.

Then if {y\in A^{\mathbb N}} labels a walk on this graph, there is a word {w\in \mathop{\mathcal L}} such that {w y_1 \cdots y_n \in \mathop{\mathcal L}} for all {n}, and hence {y\in \Sigma}. Conversely, every {y\in \Sigma} labels a walk on the graph, so we can describe both {\Sigma} and {\mathop{\mathcal L}} in terms of walks on the graph.

For the {\beta}-shift, since the follower set {F^w} is completely determined by {k(w)}, we can identify the vertex set of the Fischer cover with {\{0,1,2,3,\dots\}}, and observe that the two cases in Lemma 2 give the following two types of edges.

  1. There is an edge from {k} to {k+1} whenever there are {v\in \mathop{\mathcal L}} and {a\in A} such that {va\in \mathop{\mathcal L}}, {k(v)=k}, and {k(va) = k+1}. This happens for every {k} since we can take {v=b_1 \cdots b_k} and {a=b_{k+1}}, so the edge from {k} to {k+1} is labeled with the symbol {b_{k+1}}.
  2. There is an edge from {k} to {0} whenever there are {v\in \mathop{\mathcal L}} and {a\in A} such that {va\in \mathop{\mathcal L}}, {k(v)=k}, and {k(va)=0}. This happens whenever {a < b_{k+1}}, and so whenever {b_{k+1} > 0} there are {b_{k+1}} edges from {k} to {0} labeled with the symbols {0,1,\dots, b_{k+1}-1}.
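These two rules translate directly into code; the following sketch (my own, restricted to a finite window of the countable vertex set) lists the labeled edges with source vertices {0,\dots,k_{\max}}:

```python
def beta_shift_edges(b, kmax):
    """Labeled edges (source, target, label) of the countable-state graph of
    the beta-shift with maximal sequence b, for source vertices 0..kmax.
    Here b is a string holding a sufficiently long prefix of the sequence."""
    edges = []
    for k in range(kmax + 1):
        bk = int(b[k])                 # this is b_{k+1} in 1-based notation
        edges.append((k, k + 1, bk))   # type 1: extend the match with b
        for i in range(bk):            # type 2: reset to the base vertex
            edges.append((k, 0, i))
    return edges
```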

For example, when {{\mathbf b} = 21201\dots}, the first part of the graph looks like this:

The vertex labeled 0 can be thought of as the base vertex; it corresponds to the follower set {\Sigma}, so every sequence {y\in \Sigma} labels a walk starting at this vertex. The vertex labeled 1 corresponds to the follower set {F^2 = \{y\in \Sigma : y\preceq \sigma{\mathbf b}\}}; a sequence {y\in \Sigma} labels a walk starting at this vertex if and only if {2y\in \Sigma}. For a more typical sort of example, let {w=21121200212}. Then {w} corresponds to the walk which starts at vertex 0, goes out to vertex 2 before returning to vertex 0 (thanks to the fact that {w_3=1 < 2 = b_3}), then goes out to vertex 4 before returning to vertex 0 (because {w_8=0 < 1 = b_5}), then goes out to vertex 3 and stops. We see that {k(w) = 3} and that

\displaystyle  F^w = F^{212} = \{y\in \Sigma : 212y\in \Sigma\} = \{y\in \Sigma : y\preceq \sigma^3{\mathbf b}\}.

4. A directed graph on {{\mathbb N}^2}

In order to give an analogous description of {\Sigma_{\alpha,\beta}} via a countable-state Markov shift, we first observe that {\Sigma_{\alpha,\beta}} is determined lexicographically by two sequences {{\mathbf a},{\mathbf b}\in A^{\mathbb N}} with the property that

\displaystyle  {\mathbf a} \preceq \sigma^n{\mathbf a} \preceq {\mathbf b} \text{ and } {\mathbf a}\preceq \sigma^n{\mathbf b} \preceq{\mathbf b} \text{ for all } n\geq 0.

Given such an {{\mathbf a}} and {{\mathbf b}}, the corresponding shift space {\Sigma} and language {\mathop{\mathcal L}} are given by

\displaystyle  \begin{aligned} \Sigma &= \{y\in A^{\mathbb N} : {\mathbf a} \preceq \sigma^n y \preceq {\mathbf b} \text{ for all } n\geq 0 \}, \\ \mathop{\mathcal L} &= \{w\in A^* : a_1 \cdots a_k \preceq	 w_{|w|-k+1}\cdots w_{|w|} \preceq b_1 \cdots b_k \text{ for all } 1\leq k \leq |w| \}. \end{aligned}

From Lemma 3, the follower sets are given by

\displaystyle  F^w = [\sigma^{k_1(w)}{\mathbf a},\sigma^{k_2(w)}{\mathbf b}],

where {k_1,k_2} are given by (7) and represent the lengths of the longest prefixes of {{\mathbf a}} and {{\mathbf b}}, respectively, that appear as suffixes of {w}. Thus the follower set of {w} is completely determined by {k_1(w)} and {k_2(w)}, and so we can represent {\Sigma} by a graph whose vertices lie in {\{0,1,2,\dots\}^2}. As we will see momentarily, not every pair {(k_1,k_2) \in \{0,1,2,\dots\}^2} corresponds to a follower set, so we will not use all such vertices in building the graph, but thinking of the integer lattice still provides a useful way to visualize that situation.

It is useful to look at a concrete example. Let {{\mathbf b} = 212012112\dots} and {{\mathbf a} = 012101\dots}. Then some of the follower sets, together with their corresponding values of {(k_1,k_2)}, are as follows:

\displaystyle  \begin{aligned} \Sigma &= [{\mathbf a},{\mathbf b}] \quad &&(0,0), \qquad\qquad & F^{01} &= [\sigma^2{\mathbf a},{\mathbf b}] \quad &&(2,0), \\ F^0 &= [\sigma{\mathbf a},{\mathbf b}] &&(1,0), & F^{012} &= [\sigma^2{\mathbf a},\sigma {\mathbf b}] &&(3,1), \\ F^1 &= [{\mathbf a},{\mathbf b}] &&(0,0), & F^{0121} &= [\sigma^4{\mathbf a},\sigma^2{\mathbf b}] && (4,2), \\ F^2 &= [{\mathbf a},\sigma{\mathbf b}] &&(0,1), & F^{01210} &= [\sigma^5{\mathbf a},{\mathbf b}] && (5,0). \end{aligned}
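The pairs {(k_1,k_2)} in this table can be checked mechanically. A quick sketch (my own illustration), using the finite prefixes of {{\mathbf a}} and {{\mathbf b}} given above:

```python
def longest_prefix_suffix(w, s):
    """Length of the longest prefix of s that occurs as a suffix of w."""
    for k in range(min(len(w), len(s)), -1, -1):
        if w[len(w) - k:] == s[:k]:
            return k
    return 0  # unreachable: the empty prefix always matches

a, b = "012101", "212012112"
# (k_1(w), k_2(w)) for the words appearing in the table above
table = {w: (longest_prefix_suffix(w, a), longest_prefix_suffix(w, b))
         for w in ["0", "1", "2", "01", "012", "0121", "01210"]}
```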

In the graph for the {\beta}-shifts, remember that the outgoing edges from vertex {k} came in two types: from {k} to {0} with label {i} for each {0\leq i < b_{k+1}} (the case {k(wi)=0}), and from {k} to {k+1} with label {i = b_{k+1}} (the case {k(wi) = k(w)+1}). (We switch to using {i} for a generic element of {A} now that {{\mathbf a}} is in play.)

For the {\alpha}{\beta} shifts, there are more possibilities for {k_1(wi)} and {k_2(wi)}:

  1. {k_1(wi) = k_2(wi)=0}, which happens if {a_{k_1 + 1} < i < b_{k_2 + 1}}, and gives an edge from {(k_1,k_2)} to {(0,0)} labeled {i};
  2. {k_1(wi)=k_1(w)+1} and {k_2(wi)=0}, which happens if {a_{k_1 +1} = i < b_{k_2 + 1}}, and gives an edge from {(k_1,k_2)} to {(k_1+1,0)} labeled {i};
  3. {k_1(wi)=0} and {k_2(wi)=k_2(w)+1}, which happens if {a_{k_1+1} < i = b_{k_2+1}}, and gives an edge from {(k_1,k_2)} to {(0,k_2+1)} labeled {i};
  4. {k_1(wi)=k_1(w)+1} and {k_2(wi) = k_2(w)+1}, which happens if {a_{k_1 + 1} = i = b_{k_2 + 1}}, and gives an edge from {(k_1,k_2)} to {(k_1+1,k_2+1)} labeled {i}.
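The four cases translate into a short routine; in the sketch below (my own), I use the simplifying assumption implicit in the list above, namely that the labels available at {(k_1,k_2)} are exactly the symbols {i} with {a_{k_1+1} \leq i \leq b_{k_2+1}}.

```python
def out_edges(k1, k2, a, b):
    """Outgoing labeled edges from vertex (k1, k2): pairs (label, target).
    Here a and b are strings holding long enough prefixes of the sequences."""
    lo, hi = int(a[k1]), int(b[k2])   # a_{k1+1} and b_{k2+1} (0-indexed strings)
    edges = []
    for i in range(lo, hi + 1):
        t1 = k1 + 1 if i == lo else 0     # extend or reset the match with a
        t2 = k2 + 1 if i == hi else 0     # extend or reset the match with b
        edges.append((i, (t1, t2)))
    return edges
```

For the example above, the base vertex (0,0) has three outgoing edges, while (2,0) has a single outgoing edge, labeled 2, to (3,1).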

For our concrete example with {{\mathbf b} = 212012112\dots} and {{\mathbf a} = 012101\dots}, the first part of the graph is as follows. Note that our coordinate conventions follow standard matrix notation (down, then right) instead of the usual cartesian coordinates, and that we write the vertices as {03} instead of {(0,3)}, and so on, in order to save space.

Observe that there are two types of vertices: on the one hand there are those such as {(2,0)}, {(3,1)}, {(0,3)}, {(1,4)}, with exactly one outgoing edge, which goes down and to the right; on the other hand there are vertices which have at least two outgoing edges, each of which `resets’ in either the horizontal or vertical directions (or both).

Let us conclude by formulating an open problem. In a previous post, I gave a proof of the (well-known) result that if {\Sigma} is a shift space with the specification property, then every Hölder continuous potential {\varphi\colon \Sigma \rightarrow {\mathbb R}} has the property that every equilibrium state for {\varphi} has positive entropy (“every Hölder potential is hyperbolic”). The {\beta}-shifts do not have specification in general (Schmeling showed in 1997 that the set of {\beta>1} such that {\Sigma_\beta} has specification is a null set for Lebesgue measure), but it turns out that they still have the property that every Hölder potential is hyperbolic; Dan Thompson and I showed this in a 2013 paper (arXiv:1106.3575).

Our argument there (see Proposition 3.1 and Section 6.1) relies on the fact that 0 is a `safe symbol’ for the {\beta}-shifts; if {w\in \mathop{\mathcal L}(\Sigma_\beta)} and {w'} is obtained from {w} by replacing a single symbol with 0, then {w'\in \mathop{\mathcal L}(\Sigma_\beta)} as well. This is no longer true for {\Sigma_{\alpha,\beta}}, and so our proof does not generalize. It would be interesting to know whether {\Sigma_{\alpha,\beta}} still has the property that every Hölder potential is hyperbolic; so far as I know this question is wide open. Buzzi conjectured in 2004 that this property holds not just for the {\alpha}{\beta} shifts, but for every coding space of a piecewise monotonic interval map. Attacking this problem for the {\alpha}{\beta} shifts seems like a reasonable first step towards resolving this conjecture.


Entropy bounds for equilibrium states

[Update 6/15/17: The original version of this post had a small error in it, which has been corrected in the present version; the definition of {\mathcal{I}_n} in the proof of the main theorem needed to be modified so that each {k_i} is a multiple of {2(\tau+1)}.  Thanks to Leonard Carapezza for pointing this out to me.]

Let {X} be a compact metric space and {f\colon X\rightarrow X} a homeomorphism. Recall that an equilibrium state for a continuous potential function {\varphi\colon X\rightarrow {\mathbb R}} is an {f}-invariant Borel probability measure on {X} maximizing the quantity {h_\mu(f) + \int\varphi\,d\mu} over all invariant probabilities; the topological pressure {P(\varphi)} is the value of this maximum.

A classical result on existence and uniqueness of equilibrium states is due to Bowen, who proved that if {f} is expansive and has specification, and {\varphi} has a bounded distortion property (the `Bowen property’), then there is a unique equilibrium state {\mu_\varphi}. In particular, this applies when {f} is Anosov and {\varphi} is Hölder.

It seems to be well-known among experts that under Bowen’s hypotheses, {\mu_\varphi} must have positive entropy (equivalently, {P(\varphi) > \sup_\mu \int\varphi\,d\mu}), but I do not know of an explicit reference. In this post I provide a proof of this fact, which also gives reasonably concrete bounds on the entropy of {\mu_\varphi}; equivalently, a bound on the size of the gap {P(\varphi) - \sup_\mu \int\varphi\,d\mu}.

1. Definitions and result

First, let’s recall the definitions in the form that I will need them. Given {x\in X}, {n\in {\mathbb N}}, and {\varepsilon>0}, the Bowen ball around {x} of order {n} and radius {\varepsilon} is the set

\displaystyle B_n(x,\varepsilon) := \{y\in X : d(f^kx, f^ky) < \varepsilon \text{ for all } 0\leq k < n\}.

The map {f} has specification if for every {\varepsilon>0} there is {\tau=\tau(\varepsilon)\in {\mathbb N}} such that for every {x_1,\dots, x_k\in X} and {n_1,\dots, n_k\in {\mathbb N}}, there is {x\in X} such that

\displaystyle x\in B_{n_1}(x_1,\varepsilon),\qquad f^{n_1 + \tau}(x)\in B_{n_2}(x_2,\varepsilon),

and in general

\displaystyle f^{n_1 + \tau + \cdots + n_{i-1} + \tau}(x) \in B_{n_i}(x_i,\varepsilon)

for every {1\leq i\leq k}. We refer to {\tau} as the "gluing time"; one could also consider a weaker property where the gluing times are allowed to vary but must be bounded above by {\tau}; this makes the estimates below more complicated, so for simplicity we will stick with the stronger version.

A function {\varphi\colon X\rightarrow {\mathbb R}} has the Bowen property at scale {\varepsilon} with distortion constant {V} if {V\in {\mathbb R}} is such that

\displaystyle |S_n\varphi(x) - S_n\varphi(y)| \leq V \text{ for all } x\in X\text{ and } y\in B_n(x,\varepsilon),

where {S_n\varphi(x) := \sum_{k=0}^{n-1} \varphi(f^k x)}. We write

\displaystyle \Lambda_n(\varphi,\varepsilon) := \sup_{E\in \mathcal{E}_{n,\varepsilon}} \sum_{x\in E} e^{S_n\varphi(x)},

where {\mathcal{E}_{n,\varepsilon}} is the collection of {(n,\varepsilon)}-separated subsets of {X} (those sets {E\subset X} for which {y\notin B_n(x,\varepsilon)} whenever {x,y\in E}, {x\neq y}). The topological pressure is {P(\varphi) = \lim_{\varepsilon\rightarrow 0} P(\varphi,\varepsilon)}, where

\displaystyle P(\varphi,\varepsilon) = \limsup_{n\rightarrow\infty} \frac 1n \log \Lambda_n(\varphi,\varepsilon).

Theorem 1 Let {X} be a compact metric space with diameter {>6\varepsilon}, {f\colon X\rightarrow X} a homeomorphism with specification at scale {\varepsilon} with gluing time {\tau}, and {\varphi\colon X\rightarrow {\mathbb R}} a potential with the Bowen property at scale {\varepsilon} with distortion constant {V}. Let

\displaystyle \Delta = \frac{\log(1+e^{-(V+2(2\tau+1)\|\varphi\|)})}{2(\tau+1)}

where {\|\varphi\| = \sup_{x\in X} |\varphi(x)|}. Then we have

\displaystyle P(\varphi) \geq P(\varphi,\varepsilon) \geq \Big( \sup_\mu \int\varphi\,d\mu\Big) + \Delta. \ \ \ \ \ (1)


In particular, if {\mu} is an equilibrium state for {\varphi}, then we have {h_\mu(f) \geq \Delta > 0}.
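To get a feel for the size of {\Delta}, here is the formula transcribed into code (a trivial sketch; the sample values in the note below are my own):

```python
import math

def entropy_gap(V, tau, phi_norm):
    """The lower bound Delta of Theorem 1 on the entropy of an equilibrium
    state: V = distortion constant, tau = gluing time, phi_norm = sup|phi|."""
    return math.log(1 + math.exp(-(V + 2 * (2 * tau + 1) * phi_norm))) / (2 * (tau + 1))
```

For the measure of maximal entropy ({\varphi = 0}, so {V = \|\varphi\| = 0}), this gives {\Delta = \frac{\log 2}{2(\tau+1)}}; for instance, {\tau = 0} yields {h_\mu(f) \geq \frac12 \log 2}.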

2. Consequence for Anosov diffeomorphisms

Before proving the theorem we point out a useful corollary. If {M} is a compact manifold and {f\colon M\rightarrow M} is a topologically mixing {C^1} Anosov diffeomorphism, then {f} has specification at every scale (similar results apply in the Axiom A case). Moreover, every Hölder continuous potential has the Bowen property, and thus Theorem 1 applies.

For an Anosov diffeo, the constants {V} and {\tau} in (1) can be controlled by the following factors (here we fix a small {\varepsilon>0}):

  1. the rate of expansion and contraction along the stable and unstable directions, given in terms of {C,\lambda>0} such that {\|Df^n_x(v^s)\| \leq C e^{-\lambda n}\|v^s\|} for all {n\geq 0} and {v^s\in E^s}, and similarly for {v^u\in E^u} and {n\leq 0};
  2. how quickly unstable manifolds become dense, in other words, the value of {R>0} such that {W_R^u(x)} is {\varepsilon}-dense for every choice of {x};
  3. the angle between stable and unstable directions, which controls the local product structure, in particular via a constant {K>0} such that {d(x,y) < \varepsilon} implies that {W_{K\varepsilon}^s(x)} intersects {W_{K\varepsilon}^u(y)} in a unique point {z}, and the leafwise distances from {x,y} to {z} are at most {K d(x,y)};
  4. the Hölder exponent ({\beta}) and constant ({|\varphi|_\beta}) for the potential {\varphi}.

For the specification property for an Anosov diffeo, {\tau =\tau(\varepsilon)} is determined by the condition that {C^{-1}e^{\lambda\tau}(\varepsilon/K) > R}, so that small pieces of unstable manifold expand to become {\varepsilon}-dense within {\tau} iterates; thus we have

\displaystyle \tau(\varepsilon) \approx \lambda^{-1} \log(R(\varepsilon) KC\varepsilon^{-1}).

For the Bowen property, one compares {S_n\varphi(x)} and {S_n\varphi(y)} by comparing each to {S_n\varphi(z)}, where {z} is the (Smale bracket) intersection point coming from the local product structure. Standard estimates give {d(f^j x, f^jz) \leq CK\varepsilon e^{-\lambda j}}, so the Hölder property gives

\displaystyle \begin{aligned} |S_n\varphi(x) - S_n\varphi(z)| &\leq \sum_{j=0}^{n-1} |\varphi(f^j x) - \varphi(f^j z)| \leq \sum_{j=0}^{n-1} |\varphi|_\beta d(f^jx,f^jz)^\beta \\ &\leq |\varphi|_\beta \sum_{j=0}^\infty (CK\varepsilon)^\beta e^{-\lambda\beta j} = |\varphi|_\beta (CK\varepsilon)^\beta (1-e^{-\lambda\beta})^{-1}. \end{aligned}

A similar estimate for {|S_n\varphi(y) - S_n\varphi(z)|} gives

\displaystyle V = 2(CK\varepsilon)^\beta(1- e^{-\lambda\beta})^{-1} |\varphi|_\beta.

Thus Theorem 1 has the following consequence for Anosov diffeomorphisms.

Corollary 2 Let {f} be a topologically mixing Anosov diffeomorphism on {M} and {C,\lambda,\varepsilon,R,K} the quantities above. Let

\displaystyle \delta = \frac{\lambda}{2\log(RKC\varepsilon^{-1})}.

Given a {\beta}-Hölder potential {\varphi\colon M\rightarrow {\mathbb R}}, consider the quantity

\displaystyle Q(\varphi) := 2(CK\varepsilon)^\beta (1-e^{-\lambda\beta})^{-1} |\varphi|_\beta + 5\lambda^{-1} \log(RKC\varepsilon^{-1})\|\varphi\|.

Then we have

\displaystyle P(\varphi) \geq P(\varphi,\varepsilon) \geq \Big(\sup_\mu \int\varphi\,d\mu\Big) + \delta \log(1+e^{-Q(\varphi)})

so that in particular, if {\mu} is an equilibrium state for {\varphi}, then

\displaystyle h_\mu(f) \geq \delta \log(1+e^{-Q(\varphi)}) > 0.

Finally, note that since shifting the value of {\varphi} by a constant does not change its equilibrium states, we can assume without loss of generality that {\|\varphi\| \leq (\mathrm{diam}\, M)^\beta |\varphi|_\beta} and write the following consequence of the above, which is somewhat simpler in appearance.

Corollary 3 Let {M} be a compact manifold and {f\colon M\rightarrow M} a topologically mixing Anosov diffeomorphism. For every {\beta>0} there are constants {\delta = \delta(M,f)>0} and {R = R(M,f,\beta)} such that for every {\beta}-Hölder potential {\varphi}, we have

\displaystyle P(\varphi) \geq \Big(\sup_\mu \int\varphi\,d\mu\Big) + \delta e^{-R|\varphi|_\beta}

so that as before, if {\mu} is an equilibrium state for {\varphi}, we have

\displaystyle h_\mu(f) \geq \delta e^{-R|\varphi|_\beta} > 0.

This corollary gives a precise bound on how the entropy of a family of equilibrium states can decay as the Hölder semi-norms {|\varphi|_\beta} of the corresponding potentials become large. To put it another way, given any threshold {h_0>0}, this gives an estimate on how large {|\varphi|_\beta} must be before {\varphi} can have an equilibrium state with entropy below {h_0}.

3. Proof of the theorem

We spend the rest of the post proving Theorem 1. Fix {x\in X} and consider for each {n\in {\mathbb N}} the orbit segment {x, f(x), \dots, f^{n-1}(x)}. Fix {\alpha\in (0,\frac 12]}. Let {m_n = \lceil \frac{\alpha n}{2(\tau+1)} \rceil}, and let

\displaystyle \mathcal{I}_n = \{ 0 < k_1 < k_2 < \cdots < k_{m_n} < n : k_i \in 2(\tau+1){\mathbb N} \ \forall i\}.

Write {k_0 = 0} and {k_{m_n + 1} = n}. The idea is that for each {\vec k\in \mathcal{I}_n}, we will use the specification property to construct a point {\pi(\vec k) \in X} whose orbit shadows the orbit of {x} from time {0} to time {n}, except for the times {k_i}, at which it deviates briefly; thus the points {\pi(\vec k)} will be {(n,\varepsilon)}-separated on the one hand, and on the other hand will have ergodic averages close to that of {x}.

First we estimate {\#\mathcal{I}_n} from below; this requires a lower bound on {{n\choose \ell}}. Integrating {\log t} over {[1,k]} and {[1,k+1]} gives

\displaystyle k\log k - k + 1 \leq \log(k!) \leq k\log k - k + 1 + \log(k+1),

and thus we have

\displaystyle \begin{aligned} \log{n\choose \ell} &= \log(n!) - \log(\ell!) - \log((n-\ell)!) \\ &\geq n\log n - 1 - \ell\log\ell - (n-\ell)\log(n-\ell) - \log((\ell+1)(n-\ell+1)) \\ &\geq h\big( \tfrac\ell n\big) n - 2\log n, \end{aligned}

where {h(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)}. This function is increasing on {(0,\frac12)}, so

\displaystyle \begin{aligned} \log\#\mathcal{I}_n &\geq \log{\lfloor \frac{n}{2(\tau+1)}\rfloor \choose m_n} \geq h(\tfrac {2(\tau+1) m_n} n) \frac{n}{2(\tau+1)} - 2\log \frac{n}{2(\tau+1)} \\ &\geq \frac{h(\alpha)}{2(\tau+1)} n - 2\log n. \end{aligned} \ \ \ \ \ (2)
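
The entropy estimate for binomial coefficients is easy to sanity-check numerically; the following sketch (a quick check for a range of moderate {n}, not part of the proof) verifies the bound {\log{n\choose \ell} \geq h(\ell/n)\, n - 2\log n}:

```python
import math

def h(d):
    """Binary entropy -d log d - (1-d) log(1-d), with h(0) = h(1) = 0."""
    if d in (0, 1):
        return 0.0
    return -d * math.log(d) - (1 - d) * math.log(1 - d)

# Check the lower bound log C(n, l) >= h(l/n) n - 2 log n.
for n in (20, 50, 200):
    for l in range(1, n):
        assert math.log(math.comb(n, l)) >= h(l / n) * n - 2 * math.log(n)
```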


Given {k\in \{0, \dots, n-1\}}, let {y_k \in X} be any point with {d(f^k(x),y_k) > 3\varepsilon} (using the assumption on the diameter of {X}). Now for every {\vec{k}\in \mathcal{I}_n}, the specification property guarantees the existence of a point {\pi(\vec{k})\in X} with the property that

\displaystyle \begin{aligned} \pi(\vec{k}) &\in B_{k_1-\tau}(x,\varepsilon), \\ \qquad f^{k_1}(\pi(\vec{k})) &\in B(y_{k_1},\varepsilon), \\ \qquad f^{k_1+\tau+1}(\pi(\vec{k})) &\in B_{k_2 - k_1 - 2\tau - 1}(f^{k_1 + \tau+1}(x),\varepsilon), \end{aligned}

and so on, so that in general for any {0\leq i \leq m_n} we have

\displaystyle \begin{aligned} f^{k_i + \tau + 1}(\pi(\vec{k})) &\in B_{k_{i+1} - k_i - 2\tau - 1}(f^{k_i + \tau + 1}(x),\varepsilon), \\ f^{k_{i+1}}(\pi(\vec{k})) &\in B(y_{k_{i+1}},\varepsilon). \end{aligned} \ \ \ \ \ (3)


Write {j_i = k_{i+1} - k_i - 2\tau - 1}; then the first inclusion in (3), together with the Bowen property, gives

\displaystyle |S_{j_i} \varphi(f^{k_i + \tau + 1}(x)) - S_{j_i} \varphi(f^{k_i + \tau + 1} (\pi (\vec{k})))| \leq V.

Now observe that for any {y\in X} we have

\displaystyle \bigg| S_n \varphi(y) - \sum_{i=0}^{m_n} S_{k_{i+1} - k_i - 2\tau-1}\varphi(f^{k_i + \tau + 1} y) \bigg| \leq (2\tau + 1)(m_n+1) \|\varphi\|.

We conclude that

\displaystyle |S_n\varphi(\pi(\vec{k})) - S_n\varphi(x)| \leq (m_n+1)(V + 2(2\tau+1)\|\varphi\|). \ \ \ \ \ (4)


Consider the set {\pi(\mathcal{I}_n) \subset X}. The second inclusion in (3) guarantees that this set is {(n,\varepsilon)}-separated; indeed, given any {\vec{k} \neq \vec{k}' \in \mathcal{I}_n}, we can take {i} to be minimal such that {k_i \neq k_i'}, assume without loss of generality that {k_i' < k_i}, and let {j=k_i'}; then {f^j(\pi(\vec{k}')) \in B(y_j, \varepsilon)}, while {j} lies in a shadowing window for {\vec{k}}, so {f^j(\pi(\vec{k})) \in B(f^j(x),\varepsilon)}; since {d(y_j,f^j(x)) > 3\varepsilon} this guarantees that {\pi(\vec{k}') \notin B_n(\pi(\vec{k}),\varepsilon)}.

Using this fact and the bounds in (4) and (2), we conclude that

\displaystyle \begin{aligned} \Lambda_n(\varphi,\varepsilon) &\geq \sum_{\vec{k} \in \mathcal{I}_n} e^{S_n\varphi(\pi(\vec{k}))} \\ &\geq (\#\mathcal{I}_n) \exp\big(S_n\varphi(x) - (m_n+1)(V+2(2\tau+1)\|\varphi\|)\big) \\ &\geq n^{-2} \exp\big(S_n \varphi(x) + \tfrac {h(\alpha)}{2(\tau+1)} n - (\tfrac{\alpha}{2(\tau+1)} n + 2)(V+2(2\tau+1)\|\varphi\|)\big). \end{aligned}

Taking logs, dividing by {n}, and sending {n\rightarrow\infty} gives

\displaystyle P(\varphi,\varepsilon) \geq \Big(\limsup_{n\rightarrow\infty} \frac 1n S_n\varphi(x) \Big) + \frac 1{2(\tau+1)} \Big(h(\alpha) - \alpha(V+2(2\tau+1)\|\varphi\|) \Big).

Given any ergodic {\mu}, we can take a generic point {x} for {\mu} and conclude that the lim sup in the above expression is equal to {\int\varphi\,d\mu}. Thus to bound the difference {P(\varphi,\varepsilon) - \int\varphi\,d\mu}, we want to choose the value of {\alpha \in (0,\frac 12]} that maximizes {h(\alpha) - \alpha Q}, where {Q=V+2(2\tau+1)\|\varphi\|}.

A straightforward differentiation gives {\frac d{d\alpha} (h(\alpha) - \alpha Q) = \log\frac{1-\alpha}{\alpha} - Q}, which vanishes when {\alpha = (1+e^Q)^{-1} \in (0,\frac 12]}; routine algebra shows that at this point we have {h(\alpha) - \alpha Q = \log(1+e^{-Q})}, proving Theorem 1.
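
One can double-check this optimization numerically; the sketch below (with arbitrarily chosen values of {Q}) compares the claimed maximizer {\alpha = (1+e^Q)^{-1}} and maximal value {\log(1+e^{-Q})} against a brute-force grid search over {(0,\frac 12]}:

```python
import math

def F(a, Q):
    """The quantity h(alpha) - alpha*Q maximized in the proof."""
    return -a * math.log(a) - (1 - a) * math.log(1 - a) - a * Q

for Q in (0.5, 1.0, 3.0):             # arbitrary test values of Q >= 0
    a_star = 1 / (1 + math.exp(Q))    # claimed maximizer, lies in (0, 1/2]
    # brute-force grid search over alpha in (0, 1/2]
    best = max(F(k / 10000, Q) for k in range(1, 5001))
    assert abs(F(a_star, Q) - math.log(1 + math.exp(-Q))) < 1e-12
    assert F(a_star, Q) >= best - 1e-9
```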


Exponential decay of correlations

Let {(X,\mu)} be a probability space and {f\colon X\rightarrow X} a measure-preserving transformation. Let {B_1,B_2} be Banach spaces of measurable functions on {X}, and {\|\cdot\|_i} the corresponding norms. Given {\phi_i\in B_i} and {n\in {\mathbb N}}, the corresponding {n}th correlation is

\displaystyle C_n(\phi_1,\phi_2) := \int \phi_1(x) \phi_2(f^nx) \,d\mu(x) - \int \phi_1\,d\mu \int\phi_2\,d\mu.

We say that {(X,f,\mu)} has exponential decay of correlations with respect to observables in {B_1,B_2} if for any {\phi_i\in B_i}, we have

\displaystyle \limsup_{n\rightarrow\infty} \frac 1n \log |C_n(\phi_1,\phi_2)| < 0. \ \ \ \ \ (1)


Equivalently, for every {\phi_1\in B_1} and {\phi_2\in B_2}, there are {\lambda=\lambda(\phi_1,\phi_2)>0} and {K=K(\phi_1,\phi_2)>0} such that

\displaystyle |C_n(\phi_1,\phi_2)| \leq K(\phi_1,\phi_2) e^{-\lambda(\phi_1,\phi_2) n} \text{ for all } n. \ \ \ \ \ (2)
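
For a toy example where everything is computable, consider a stationary two-state Markov chain, viewed as a shift-invariant measure on sequence space (the transition probabilities below are made up, and this sits outside the Banach-space framework discussed next). Taking both observables to be the indicator of state 0, the correlations decay exactly like the second eigenvalue of the transition matrix:

```python
# Two-state Markov chain (toy numbers): P[i][j] = Prob(next = j | current = i).
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]           # stationary vector: pi P = pi
lam2 = P[0][0] + P[1][1] - 1  # second eigenvalue of P, here 0.7

def matmul(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# For phi_1 = phi_2 = indicator of state 0, stationarity gives
#   C_n = pi_0 (P^n)_{00} - pi_0^2,
# and a direct calculation shows C_n = pi_0 pi_1 lam2^n exactly.
Pn = [[1.0, 0.0], [0.0, 1.0]]
for n in range(1, 20):
    Pn = matmul(Pn, P)
    C_n = pi[0] * Pn[0][0] - pi[0] ** 2
    assert abs(C_n - pi[0] * pi[1] * lam2 ** n) < 1e-12
```

Here the rate {\lambda} in (1) is {-\log 0.7}, independent of the observables, which is the phenomenon this post is about.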


One sometimes sees the statement of exponential decay given in the following form, which is formally stronger than (2): there are constants {L>0} and {\lambda>0}, independent of {\phi_1,\phi_2}, such that

\displaystyle |C_n(\phi_1,\phi_2)| \leq L\|\phi_1\|_1 \|\phi_2\|_2 e^{-\lambda n} \text{ for all } n,\phi_1,\phi_2. \ \ \ \ \ (3)


In fact, using the Baire category theorem, one can prove that (2) implies (3) under a mild condition on the Banach spaces {B_i}. This is the goal of this post: to show that {\lambda} can be chosen uniformly over all {\phi_i}, and that {K} can be chosen to have the form {K(\phi_1,\phi_2) = L \|\phi_1\|_1\|\phi_2\|_2}. This seems like the sort of thing that is likely known to experts, but I am not aware of a reference in the literature. (I would be happy to learn of one!)

Proposition 1 Let {(X,f,\mu)} be a probability measure-preserving transformation and {B_1,B_2} Banach spaces of measurable functions on {X} with the following properties:

  • given any {\phi_i\in B_i}, we have {\phi_i\in L^1(\mu)} and {\phi_1\phi_2\in L^1(\mu)};
  • the inclusions {(B_i,\|\cdot\|_i) \rightarrow (L^1,\|\cdot\|_{L^1})} are continuous;
  • for every {\phi_1\in B_1}, the map {(B_2,\|\cdot\|_2) \rightarrow (L^1,\|\cdot\|_{L^1})} given by {\phi_2 \mapsto \phi_1\phi_2} is continuous, and similarly for the map {\phi_1\mapsto \phi_1\phi_2} when {\phi_2} is fixed;
  • the map {f_* \colon B_2 \rightarrow B_2} given by {f_* \phi = \phi\circ f} is bounded w.r.t. {\|\cdot\|_2}.

Under these assumptions, if {(X,f,\mu)} has exponential decay of correlations w.r.t. observables in {B_1,B_2} in the sense of (2), then it also satisfies (3).

To prove the proposition, start by fixing {\phi_1\in B_1} and {\lambda>0}, and consider the function {K_\lambda^{\phi_1} \colon B_2 \rightarrow [0,\infty]} given by

\displaystyle K_\lambda^{\phi_1}(\phi_2) = \sup_n |C_n(\phi_1,\phi_2)| e^{\lambda n}.

Notice that for each {n}, the correlation function {C_n} is bilinear in {\phi_1,\phi_2}, and thus for every {\phi_2,\psi_2\in B_2} and {a,b\in {\mathbb R}}, we have

\displaystyle \begin{aligned} K_\lambda^{\phi_1}(a\phi_2 + b\psi_2) &\leq \sup_n \big( |aC_n(\phi_1,\phi_2)| + |bC_n(\phi_1,\psi_2)| \big) e^{\lambda n} \\ &\leq |a| K_\lambda^{\phi_1}(\phi_2) + |b| K_\lambda^{\phi_1}(\psi_2). \end{aligned} \ \ \ \ \ (4)


Consider the following subsets of {B_2}:

\displaystyle V_{\lambda,K}^{\phi_1} := \{\phi_2\in B_2 : K_\lambda^{\phi_1}(\phi_2) \leq K \}.

It follows from (2) that

\displaystyle \bigcup_{\lambda,K>0} V_{\lambda,K}^{\phi_1} = B_2. \ \ \ \ \ (5)


Moreover, the sets {V_{\lambda,K}^{\phi_1}} are nested (smaller {\lambda} gives a bigger set, larger {K} gives a bigger set) and so it suffices to take the union over rational values of {\lambda,K}, meaning that we can treat (5) as a countable union. In particular, by the Baire category theorem there are {\lambda,K>0} such that the closure of {V_{\lambda,K}^{\phi_1}} has non-empty interior. The next step is to show that

  • {V_{\lambda,K}^{\phi_1}} is closed, so it itself has non-empty interior;
  • in fact, {V_{\lambda,K}^{\phi_1}} contains a neighbourhood of the origin.

For the first of these, observe that by the assumptions we placed on the Banach spaces {B_1,B_2}, there is a constant {Q=Q(\phi_1)} such that

\displaystyle \|\phi_2\|_{L^1} \leq Q \|\phi_2\|_2 \text{ and } \|\phi_1 \phi_2\|_{L^1} \leq Q \|\phi_2\|_2

for every {\phi_2\in B_2}. In particular,

\displaystyle \begin{aligned} |C_n(\phi_1,\phi_2)| &\leq Q(\phi_1) \|\phi_2\circ f^n\|_2 + Q\|\phi_1\|_{L^1} \|\phi_2\|_2 \\ &\leq (Q(\phi_1) \|f_*\|^n + Q\|\phi_1\|_{L^1}) \|\phi_2\|_2. \end{aligned}

Given a sequence {(\phi_2^{(k)})_k \subset V_{\lambda,K}^{\phi_1}} such that {\phi_2^{(k)} \rightarrow \phi_2 \in B_2} w.r.t. {\|\cdot\|_2} as {k\rightarrow\infty}, it follows that

\displaystyle |C_n(\phi_1,\phi_2)| \leq \limsup_{k\rightarrow\infty} |C_n(\phi_1,\phi_2^{(k)})| \leq K e^{-\lambda n},

and we conclude that {\phi_2\in V_{\lambda,K}^{\phi_1}}, so this set is closed. In particular, there is {\phi_2\in V_{\lambda,K}^{\phi_1}} and {\varepsilon>0} such that if {\|\psi_2\|_{B_2} \leq \varepsilon}, then {\phi_2 + \psi_2\in V_{\lambda,K}^{\phi_1}}. By the same token {\phi_2 - \psi_2\in V_{\lambda,K}^{\phi_1}}, and now the sublinearity property (4) gives

\displaystyle K_\lambda^{\phi_1}(\psi_2) \leq \tfrac 12 \big(K_\lambda^{\phi_1}(\phi_2 + \psi_2) + K_\lambda^{\phi_1}(\phi_2 - \psi_2)\big) \leq K,

and so {\psi_2\in V_{\lambda,K}^{\phi_1}}. This shows that {V_{\lambda,K}^{\phi_1}} contains a neighbourhood of 0, and writing {L = K/\varepsilon}, we see that for every non-zero {\psi_2\in B_2} we have {\varepsilon\frac{\psi_2}{\|\psi_2\|_2} \in V_{\lambda,K}^{\phi_1}}, and so

\displaystyle K_\lambda^{\phi_1}(\psi_2) = \varepsilon^{-1} \|\psi_2\|_2 K_\lambda^{\phi_1} \Big(\varepsilon\frac{\psi_2}{\|\psi_2\|_2}\Big) \leq \varepsilon^{-1}\|\psi_2\|_2 K = L \|\psi_2\|_2.

Thus we conclude that

\displaystyle |C_n(\phi_1,\phi_2)| \leq L(\phi_1) \|\phi_2\|_2 e^{-\lambda(\phi_1)n} \text{ for all } n.

To complete the proof of the proposition, it suffices to apply the same argument once more. Writing

\displaystyle W_{\lambda,K} = \{ \phi_1 \in B_1 : \lambda(\phi_1) \geq \lambda \text{ and } L(\phi_1) \leq K\}

we see from the conclusion of the previous paragraph that {B_1 = \bigcup_{\lambda,K>0} W_{\lambda,K}}, and so once again there are {\lambda,K>0} such that the closure of {W_{\lambda,K}} has non-empty interior. Given a sequence {\phi_1^{(k)} \in W_{\lambda,K}} with {\|\phi_1^{(k)} - \phi_1\|_1 \rightarrow 0}, we have {C_n(\phi_1^{(k)},\phi_2) \rightarrow C_n(\phi_1,\phi_2)} for all {n} and all {\phi_2}, and so {\phi_1\in W_{\lambda,K}}, demonstrating that this set is closed.

Thus there are {\phi_1\in W_{\lambda,K}} and {\varepsilon>0} such that {\|\psi_1\|_1\leq \varepsilon} implies {\phi_1+\psi_1\in W_{\lambda,K}}. In particular this gives {\phi_1 \pm \psi_1\in W_{\lambda,K}}, and so for every {n} and every {\phi_2} we have

\displaystyle \begin{aligned} |C_n(\psi_1,\phi_2)| &= \tfrac 12 |C_n(\phi_1 + \psi_1,\phi_2) - C_n(\phi_1 - \psi_1,\phi_2)| \\ & \leq \tfrac 12 |C_n(\phi_1 + \psi_1,\phi_2)| + \tfrac 12|C_n(\phi_1 - \psi_1,\phi_2)| \\ &\leq K \|\phi_2\|_2 e^{-\lambda n}. \end{aligned}

But then for every {\phi_1\in B_1} we can consider {\psi_1 = \varepsilon \phi_1 / \|\phi_1\|_1}, which has {\|\psi_1\|_1 = \varepsilon}, and so

\displaystyle |C_n(\phi_1,\phi_2)| = \|\phi_1\|_1 \varepsilon^{-1} |C_n(\psi_1,\phi_2)| \leq \|\phi_1\|_1 \varepsilon^{-1} K \|\phi_2\|_2 e^{-\lambda n},

which proves (3) and the proposition.


Lebesgue probability spaces, part II

This is a continuation of the previous post on the classification of complete probability spaces. Last time we set up the basic terminology and notation, and saw that for a complete probability space {(X,\mathcal{A},\mu)}, the {\sigma}-algebra {\mathcal{A}} is countably generated mod 0 (which we denoted (CG0)) if and only if the pseudo-metric {\rho(A,B) = \mu(A\Delta B)} makes {\hat{\mathcal{A}} = \mathcal{A}/{\sim}} into a separable metric space. We also considered {(\hat{\mathcal{A}},\mu)} as a measured abstract {\sigma}-algebra with no non-trivial null sets; this is the point of view we will continue with in this post.

Our goal is to present the result from [HvN] and [Ro] (see references from last time) that classifies such measured abstract {\sigma}-algebras up to {\sigma}-algebra isomorphism. To this end, let {(\Sigma,\mu)} be a separable measured abstract {\sigma}-algebra with no non-trivial null sets; let {X} be the maximal element of {\Sigma}, and suppose that {\mu(X)=1}. Note that {X'} is the minimal element of {\Sigma}, which would correspond to {\emptyset} if {\Sigma} were a collection of subsets of some ambient space. In the abstract setting, we will write {0=X'} for this minimal element.

An element {E\in \Sigma} is an atom if it has no proper non-trivial subelement; that is, if {F\in \Sigma, F\leq E} implies {F=0} or {F=E}. By the assumption that {\Sigma} has no non-trivial null sets, we have {\mu(E)>0} for every atom. Note that for any two atoms {E\neq F} we have {E\wedge F = 0}, and so {\mu(E\vee F) = \mu(E) + \mu(F)}.

Let {\Sigma_a \subset \Sigma} be the set of atoms; then we have {\sum_{E\in \Sigma_a} \mu(E) \leq 1}, so {\Sigma_a} is (at most) countable. Let

\displaystyle  \Sigma_{na} = \{ E\in \Sigma \mid E \wedge F = 0 \text{ for all } F\in \Sigma_a\},

then {\Sigma_{na}} is non-atomic; it contains no atoms. Thus {\Sigma} can be decomposed as a non-atomic part together with a countable collection of atoms. In particular, to classify {(\Sigma,\mu)} we may assume without loss of generality that {\Sigma} is non-atomic.

Consider the unit interval {[0,1]} with Lebesgue measure; let {\mathop{\mathcal L}} denote the set of equivalence classes (mod 0) of measurable subsets of {[0,1]}, and {m} denote Lebesgue measure, so {(\mathop{\mathcal L},m)} is a measured abstract {\sigma}-algebra. Moreover, {(\mathop{\mathcal L},m)} is separable, non-atomic, has no non-trivial null sets, and has total weight 1.

Theorem 1 Let {(\Sigma,\mu)} be a separable non-atomic measured abstract {\sigma}-algebra with total weight 1 and no non-trivial null sets. Then {(\Sigma,\mu)} is isomorphic to {(\mathop{\mathcal L},m)}.

The meaning of “isomorphism” here is that there is a bijection {T\colon \Sigma \rightarrow \mathop{\mathcal L}} that preserves the Boolean algebra structure and carries {\mu} to {m}. That is: {A\leq B} iff {T(A)\leq T(B)}; {T(A\vee B) = T(A) \vee T(B)}; {T(A\wedge B) = T(A)\wedge T(B)}; {T(A')=T(A)'}; {T(0)=0}; and finally, {m(T(A)) = \mu(A)}. We are most used to obtaining morphisms between {\sigma}-algebras as a byproduct of having a measurable map between the ambient sets; that is, given measurable spaces {(X,\mathcal{A})} and {(Y,\mathcal{B})}, a measurable map {f\colon X\rightarrow Y} gives a morphism {f^{-1} \colon \mathcal{B} \rightarrow \mathcal{A}}. However, in the abstract setting we currently work in, there is no ambient space, so we cannot yet interpret {T} this way. Eventually we will go from {(\Sigma,\mu)} and {(\mathop{\mathcal L},m)} back to the measure spaces that induced them, and then we will give conditions for {T} to be induced by a function between those spaces, but for now we stick with the abstract viewpoint.

We sketch a proof of Theorem 1 that roughly follows [HvN, Theorem 1]; see also [Ro, §1.3]. Given a measurable set {A\subset [0,1]}, we write {[A]\in \mathop{\mathcal L}} for the equivalence class of {A}; in particular, we write {[[a,b]]\in \mathop{\mathcal L}} for the equivalence class of the interval {[a,b]}.

Observe that if {Q\subset [0,1]} is dense, then {\mathop{\mathcal L}} is generated by {\{[[0,q]] : q\in Q\}}; thus we can describe {T} by finding {\{B_q \in \Sigma \mid q\in Q\}} such that

  • {\mu(B_q) = q = m[[0,q]]},
  • {B_q \leq B_{q'}} whenever {q\leq q'},
  • {\{B_q\}_{q\in Q}} generates {\Sigma},

and then putting {T(B_q) = [[0,q]]}. This is where separability comes in. Let {A_1,A_2,\dots \in \Sigma} be a countable collection that generates {\Sigma}. Put {T(A_1) = [[0,\mu(A_1)]]}, so {A_1 = B_q} for {q=\mu(A_1)}. Now what about {A_2}? We can use {A_2} to define two more of the sets {B_q} by noting that

\displaystyle  A_1 \wedge A_2 \leq A_1 \leq A_1 \vee A_2, \ \ \ \ \ (1)

and so we may reasonably set {T(E) = [[0,\mu(E)]]} for {E\in \{A_1 \wedge A_2, A_1, A_1 \vee A_2\}}.

To extend this further it is helpful to rephrase the last step. Consider

\displaystyle \begin{aligned} C_{00} &= A_1 \wedge A_2, \\ C_{01} &= A_1 \wedge A_2', \\ C_{10} &= A_1' \wedge A_2, \\ C_{11} &= A_1' \wedge A_2', \end{aligned}

and observe that (1) can be rewritten as

\displaystyle  C_{00} \leq (C_{00} \vee C_{01}) \leq (C_{00} \vee C_{01} \vee C_{10}).

Writing {\preceq} for the lexicographic order on {\{0,1\}^2}, we observe that each of these three is of the form {D_x = \bigvee_{y\preceq x} C_y} for some {x\in \{0,1\}^2}.

Each {C_x} above is determined as follows: {x_1} tells us whether to use {A_1} or {A_1'}, and {x_2} tells us whether to use {A_2} or {A_2'}. This generalises very naturally to {n\geq 2}: given {x\in \{0,1\}^n}, let

\displaystyle \begin{aligned} A_k^x &= \begin{cases} A_k & x_k = 0, \\ A_k' & x_k = 1. \end{cases}\\ C_x &= \bigwedge_{k=1}^n A_k^x. \end{aligned}

Now put {D_x = \bigvee_{y\preceq x} C_y}; this gives {2^n} elements {D_x} (the last one is {X = \bigvee_{y\in \{0,1\}^n} C_y}, and was omitted from our earlier bookkeeping). These have the property that {D_x \leq D_w} whenever {x\preceq w}. Moreover, writing {\{0,1\}^* = \bigcup_{n\in {\mathbb N}} \{0,1\}^n}, the collection {\{D_x \mid x\in \{0,1\}^*\}} generates {(\Sigma,\mu)} because {A_1,A_2,\dots} does. Let {T(D_x) = [[0,\mu(D_x)]]}; now we have all the pieces of the construction that we asked for earlier.
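
The combinatorics of the {C_x} and {D_x} are easy to play with in a finite toy model, where abstract elements are replaced by honest subsets; the following sketch (with invented generators {A_1,A_2,A_3}) checks that the running lexicographic unions really are nested:

```python
from itertools import product

# Invented generators inside a finite "universe" U (standing in for X).
U = frozenset(range(8))
A = [frozenset({0, 1, 2, 3}), frozenset({0, 1, 4, 5}), frozenset({0, 2, 4, 6})]
n = len(A)

def C(x):
    """Intersect A_k or its complement according to the bits of x."""
    out = U
    for k, bit in enumerate(x):
        out = out & (A[k] if bit == 0 else U - A[k])
    return out

# itertools.product yields {0,1}^n in lexicographic order, so a running
# union over it produces exactly the family D_x = join of C_y for y <= x.
D = []
acc = frozenset()
for x in product((0, 1), repeat=n):
    acc = acc | C(x)
    D.append(acc)

assert all(D[i] <= D[i + 1] for i in range(len(D) - 1))  # the D_x are nested
assert D[-1] == U                                        # the last one is X
```

With these particular generators each {C_x} is a singleton, so all {2^3} cells are non-trivial, as required in the careful choice of the {A_n} mentioned below.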

Putting it all together, let {Q = \{ \mu(D_x) \mid x\in \{0,1\}^* \}}. The following are now relatively straightforward exercises.

  • {Q} is dense in {[0,1]} since {(\Sigma,\mu)} is non-atomic.
  • For each {q\in Q} there is {w\in \{0,1\}^*} with {\mu(D_w) = q}; if {w,x\in \{0,1\}^*} are such that this holds, then either {w=x} or {w = x11\cdots 1} (or vice versa). For this step we actually need to choose the {A_n} a little more carefully to guarantee that each {C_x} is non-trivial.
  • The previous step guarantees that {T} is a bijection between {\Sigma} and {\mathop{\mathcal L}}.
  • Since {T} preserves the order {\leq} on the generating collections, it preserves the {\sigma}-algebra structure as well.
  • Since {T} carries {\mu} to {m} on the generating collections, it carries {\mu} to {m} on the whole {\sigma}-algebra.

Thus we have produced a {\sigma}-algebra isomorphism {T}, proving Theorem 1. Next time we will discuss conditions under which this can be extended to an isomorphism of the measure spaces themselves, and not just their abstract measured {\sigma}-algebras.


Lebesgue probability spaces, part I

In various areas of mathematics, classification theorems give a more or less complete understanding of what kinds of behaviour are possible. For example, in linear algebra we learn that up to isomorphism, {{\mathbb R}^n} is the only real vector space with dimension {n}, and every linear operator on a finite-dimensional vector space can be put into Jordan normal form via a change of coordinates; this means that many questions in linear algebra can be answered by understanding properties of Jordan normal form. A similar classification result is available in measure theory, but the preliminaries are a little more involved. In this and the next post I will describe the classification result for complete probability spaces, which gives conditions under which such a space is equivalent to the unit interval with Lebesgue measure.

The main original references for these results are a 1942 paper by Halmos and von Neumann [“Operator methods in classical mechanics. II”, Ann. of Math. (2), 43 (1942), 332–350], and a 1949 paper by Rokhlin [“On the fundamental ideas of measure theory”, Mat. Sbornik N.S. 25(67) (1949), 107–150; English translation in Amer. Math. Soc. Translation 1952 (1952), no. 71, 55 pp.]. I will refer to these as [HvN] and [Ro], respectively.

1. Equivalence of measure spaces

First we must establish exactly which class of measure spaces we work with, and under what conditions two measure spaces will be thought of as equivalent. Let {I} be the unit interval and {\mathop{\mathcal B},\mathop{\mathcal L}} the Borel and Lebesgue {\sigma}-algebras, respectively; let {m} be Lebesgue measure (on either of these). To avoid having to distinguish between {(I,\mathop{\mathcal B},m)} and {(I,\mathop{\mathcal L},m)}, let us agree to only work with complete measure spaces; this is no great loss, since given an arbitrary measure space {(X,\mathcal{A},\mu)} we can pass to its completion {(X,\overline{\mathcal{A}},\overline\mu)}.

The most obvious notion of isomorphism is that two complete measure spaces {(X,\mathcal{A},\mu)} and {(X',\mathcal{A}',\mu')} are isomorphic if there is a bijection {f\colon X\rightarrow X'} such that {f,f^{-1}} are measurable and {f_*\mu = \mu'}; that is, given {A\subset X} we have {A\in \mathcal{A}} if and only if {f(A) \in \mathcal{A}'}, and in this case {\mu(A) = \mu'(f(A))}.

In the end we want to loosen this definition a little bit. For example, consider the space {X = \{0,1\}^{\mathbb N}} of all infinite binary sequences, equipped with the Borel {\sigma}-algebra {\mathop{\mathcal B}} associated to the product topology (or if you prefer, the metric {d(x,y) = e^{-\min\{n \mid x_n \neq y_n\}}}). Let {\mu} be the {(\frac 12,\frac 12)}-Bernoulli measure on {(X,\mathop{\mathcal B})}; that is, for each {w\in \{0,1\}^n} the cylinder {[w] = \{x\in X \mid x_1 \cdots x_n = w\}} gets weight {\mu[w] = 2^{-n}}. Then there is a natural correspondence between the completion {(X,\overline{\mathop{\mathcal B}},\overline\mu)} and {(I,\mathop{\mathcal L},m)} given by

\displaystyle  \begin{aligned} f\colon X&\rightarrow I \\ x &\mapsto \sum_{n=1}^\infty x_n 2^{-n}. \end{aligned}

By looking at dyadic intervals {[\frac k{2^n}, \frac{k+1}{2^n}] \subset I} one can readily verify that {f_* \overline\mu = m}; however, {f} is not a bijection because for every {w\in \{0,1\}^n} we have {f(w01^\infty) = f(w10^\infty)}.
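
The identification {f(w01^\infty) = f(w10^\infty)} can be checked with exact rational arithmetic; here is a small sketch (the helper f_finite is my own, and handles only sequences with an eventually constant tail, which is all we need):

```python
from fractions import Fraction

def f_finite(word, tail):
    """Exact value of f(x) for x = word followed by the constant tail (0 or 1).

    Positions are 1-indexed as in the post, f(x) = sum_n x_n 2^{-n}; a tail
    of 1s starting at position len(word)+1 contributes 2^{-len(word)}."""
    val = sum(Fraction(b, 2 ** (k + 1)) for k, b in enumerate(word))
    if tail == 1:
        val += Fraction(1, 2 ** len(word))
    return val

# f identifies w 0 1 1 1 ... with w 1 0 0 0 ...
w = (1, 0, 1)
assert f_finite(w + (0,), tail=1) == f_finite(w + (1,), tail=0)
```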

The points at which {f} is non-injective form a {\mu}-null set (since there are only countably many of them), so from the point of view of measure theory, it is natural to disregard them. This motivates the following definition.

Definition 1 Two measure spaces {(X,\mathcal{A},\mu)} and {(X',\mathcal{A}',\mu')} are isomorphic mod 0 if there are measurable sets {E \subset X} and {E'\subset X'} such that {\mu(X\setminus E) = \mu'(X'\setminus E') = 0}, together with a bijection {f\colon E\rightarrow E'} such that {f,f^{-1}} are measurable and {f_*(\mu|_E) = \mu'|_{E'}}.

From now on we will be interested in the question of classifying complete measure spaces up to isomorphism mod 0. The example above suggests that {(I,\mathop{\mathcal L},m)} is a reasonable candidate for a `canonical’ complete measure space that many others are equivalent to, and we will see that this is indeed the case.

Notice that the total measure {\mu(X)} is clearly an invariant of isomorphism mod 0, and hence we restrict our attention to probability spaces, for which {\mu(X)=1}.

2. Separability, etc.

Let {(X,\mathcal{A},\mu)} be a probability space. We describe several related conditions that all give senses in which {\mathcal{A}} can be understood via countable objects.

The {\sigma}-algebra {\mathcal{A}} carries a natural pseudo-metric given by {\rho(A,B) = \mu(A\Delta B)}. Write {A\sim B} if {\rho(A,B)=0}; this is an equivalence relation on {\mathcal{A}}, and we write {\hat{\mathcal{A}}} for the space of equivalence classes. The function {\rho} induces a metric {\hat\rho} on {\hat{\mathcal{A}}} in the natural way, and we say that {(X,\mathcal{A},\mu)} is separable if the metric space {(\hat{\mathcal{A}},\hat\rho)} is separable; that is, if it has a countable dense subset.
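
Here is a small toy computation (with invented weights) illustrating why {\rho} is only a pseudo-metric before we pass to equivalence classes: a null atom produces distinct sets at distance zero, while the triangle inequality always holds:

```python
from itertools import chain, combinations

# A three-point space with invented weights; point 3 is a null atom.
X = (1, 2, 3)
mu = {1: 0.5, 2: 0.5, 3: 0.0}

def measure(A):
    return sum(mu[x] for x in A)

def rho(A, B):
    return measure(set(A) ^ set(B))  # mu of the symmetric difference

subsets = [frozenset(s) for s in chain.from_iterable(
    combinations(X, r) for r in range(len(X) + 1))]

# rho is only a pseudo-metric: distinct sets can be at distance zero ...
assert rho({1}, {1, 3}) == 0.0
# ... although the triangle inequality holds for every triple.
assert all(rho(A, C) <= rho(A, B) + rho(B, C)
           for A in subsets for B in subsets for C in subsets)
```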

Another countability condition is this: call {\mathcal{A}} “countably generated” if there is a countable subset {\Gamma \subset \mathcal{A}} such that {\mathcal{A} = \sigma(\Gamma)} is the smallest {\sigma}-algebra containing {\Gamma}. We write (CG) for this property; for example, the Borel {\sigma}-algebra in {[0,1]} satisfies (CG) because we can take {\Gamma} to be the set of all intervals with rational endpoints. (In [HvN], such an {\mathcal{A}} is called “strictly separable”, but we avoid the word “separable” as we have already used it in connection with the metric space {(\hat{\mathcal{A}},\hat\rho)}.)

In and of itself, (CG) is not quite the right sort of property for our current discussion, because it does not hold when we pass to the completion; the Lebesgue {\sigma}-algebra {\mathop{\mathcal L}} is not countably generated (one can prove this using cardinality estimates). Let us say that {\mathcal{A}} satisfies property (CG0) (for “countably generated mod 0”) if there is a countably generated {\sigma}-algebra {\Sigma \subset \mathcal{A}} with the property that for every {E\in \mathcal{A}}, there is {F\in \Sigma} with {\mu(E\Delta F) = 0}. In other words, we have {\hat{\mathcal{A}} = \hat\Sigma}. Note that {\mathop{\mathcal L}} is countably generated mod 0 by taking {\Sigma = \mathop{\mathcal B}}. (In [HvN], such an {\mathcal{A}} is called “separable”; the same property is used in §2.1 of [Ro] with the label {(L')}, rendered in a font that I will not attempt to duplicate here.)

In fact, the approximation of {\mathop{\mathcal L}} by {\mathop{\mathcal B}} satisfies an extra condition. Let us write (CG0+) for the following condition on {\mathcal{A}}: there is a countably generated {\Sigma \subset \mathcal{A}} such that for every {E\in \mathcal{A}}, there is {F\in \Sigma} with {E\subset F} and {\mu(F\setminus E)=0}. This is satisfied for {\mathop{\mathcal L}} and {\mathop{\mathcal B}}. (In [HvN], such an {\mathcal{A}} is called “properly separable”; the same property is used in §2.1 of [Ro] with the label {(L)}.)

The four properties introduced above are related as follows.

\displaystyle  \textbf{(CG)} \Rightarrow \textbf{(CG0+)} \Rightarrow \textbf{(CG0)} \Leftrightarrow \text{separable}

The first two implications are immediate, and their converses fail in general:

  • The Lebesgue {\sigma}-algebra {\mathop{\mathcal L}} satisfies (CG0+) but not (CG).
  • Let {\mathcal{A} = \{ A \subset [0,1] \mid A\in \mathop{\mathcal L}, \mu(A)=0 \text{ or } 1\}}. Then {\mathcal{A}} satisfies (CG0) but not (CG0+).

Now we prove that (CG0) and separability are equivalent. First note that if {\Gamma \subset \mathcal{A}} is a countable subset, then the algebra {\mathop{\mathcal F}} generated by {\Gamma} is also countable; in particular, {(X,\mathcal{A},\mu)} is separable if and only if there is a countable algebra {\mathop{\mathcal F}\subset \mathcal{A}} that is dense with respect to {\rho}, and similarly in the definition of (CG0) the generating set can be taken to be an algebra. To show equivalence of (CG0) and separability it suffices to show that given an algebra {\mathop{\mathcal F} \subset \mathcal{A}} and {E\in \mathcal{A}}, we have

\displaystyle  (E\in \overline{\mathop{\mathcal F}}) \Leftrightarrow (\text{there is } A\in \sigma(\mathop{\mathcal F}) \text{ with } \rho(E,A) = \mu(E\Delta A) = 0). \ \ \ \ \ (1)

First we prove {(\Leftarrow)} by proving that {\overline{\mathop{\mathcal F}}} is a {\sigma}-algebra, and hence contains {\sigma(\mathop{\mathcal F})}; this will show that (CG0) implies separability.

  • Closure under {{}^c}: if {E\in \overline{\mathop{\mathcal F}}} then there are {A_n\in \mathop{\mathcal F}} such that {\rho(A_n,E) \rightarrow 0}. Since {\rho(A_n^c, E^c) = \rho(A_n, E)} and {A_n^c\in \mathop{\mathcal F}} (since it is an algebra), this gives {E^c\in \overline{\mathop{\mathcal F}}}.
  • Closure under {\cup}: given {E_1,E_2,\dots \in \overline{\mathop{\mathcal F}}}, let {E = \bigcup_n E_n}. To show that {E \in \overline{\mathop{\mathcal F}}}, note that given any {\epsilon>0}, there are {A_n\in \mathop{\mathcal F}} such that {\rho(A_n,E_n) < \epsilon 2^{-n}}. Let {F_N = \bigcup_{n=1}^N E_n} and {B_N = \bigcup_{n=1}^N A_n}; note that

    \displaystyle  \rho(F_N,B_N) \leq \sum_{n=1}^N \rho(E_n,A_n) < \epsilon.

    Moreover by continuity from below we have {\lim \mu(F_N) = \mu(E)}, so {\lim \rho(E,F_N)=0}, and thus for sufficiently large {N} we have {\rho(E,B_N) < \rho(E,F_N) + \rho(F_N,B_N) < 2\epsilon}. This holds for all {\epsilon>0}, so {E\in \overline{\mathop{\mathcal F}}}.

Now we prove {(\Rightarrow)}, thus proving that {\overline{\mathop{\mathcal F}}} is “large enough” that separability implies (CG0). Given any {E\in \overline{\mathop{\mathcal F}}}, there are {A_n\in \mathop{\mathcal F}} such that {\rho(A_n,E)\leq 2^{-n}}. Let { A = \bigcap_{N\in {\mathbb N}} \bigcup_{n\geq N} A_n \in \sigma(\mathop{\mathcal F}). } We get

\displaystyle  \begin{aligned} \mu(A \cap E^c) &= \mu\big(\bigcap_N \bigcup_{n\geq N} (A_n \cap E^c)\big) = \lim_{N\rightarrow\infty} \mu\big( \bigcup_{n\geq N} (A_n \cap E^c) \big) \\ &\leq \lim_{N\rightarrow\infty} \sum_{n\geq N} \mu(A_n \cap E^c) \leq \lim_{N\rightarrow\infty} 2^{1-N} = 0, \end{aligned}

and similarly, {A^c \cap E = \bigcup_N \bigcap_{n\geq N} (A_n^c \cap E)}, which gives

\displaystyle  \mu(A^c \cap E) \leq \sum_N \mu\big( \bigcap_{n\geq N} A_n^c \cap E\big) \leq \sum_N \limsup_{n\rightarrow\infty} \mu(A_n^c \cap E) = 0.

Then {\rho(A,E) = 0}, which completes the proof of {(\Rightarrow)}.

The first half of the argument above (the {\Leftarrow} direction) appears in this MathOverflow answer to a question discussing the relationship between different notions of separability, which ultimately inspired this post. That answer (by Joel David Hamkins) also suggests one further notion of “countably generated”, distinct from all of the above; say that {(X,\mathcal{A},\mu)} satisfies (CCG) (for “completion of countably generated”) if there is a countably generated {\sigma}-algebra {\Sigma \subset \mathcal{A}} such that {\mathcal{A} \subset \overline{\Sigma}}, where {\overline{\Sigma}} is the completion of {\Sigma} with respect to the measure {\mu}. One quickly sees that

\displaystyle  \textbf{(CG)} \Rightarrow \textbf{(CCG)} \Rightarrow \textbf{(CG0)}.

Both reverse implications fail; the Lebesgue {\sigma}-algebra satisfies (CCG) but not {\textbf{(CG)}}, and an example satisfying separability (and hence (CG0)) but not (CCG) was given in that same MathOverflow answer (the example involves ordinal numbers and something called the “club filter”, which I will not go into here).

3. Abstract {\sigma}-algebras

It is worth looking at some of the previous arguments through a different lens, that will also appear next time when we discuss the classification problem.

Recall the space of equivalence classes {\hat{\mathcal{A}}} from earlier, where {A\sim B} means that {\mu(A\Delta B) = 0}. Although elements of {\hat{\mathcal{A}}} are not subsets of {X}, we can still speak of the “union” of two such elements by choosing representatives from the respective equivalence classes; that is, given {\hat A, \hat B\in \hat{\mathcal{A}}}, we choose representatives {A\in \hat A} and {B\in \hat B} (so {A,B\in \mathcal{A}}), and consider the “union” of {\hat A} and {\hat B } to be the equivalence class of {A\cup B}; write this as {\hat A\vee \hat B}. One can easily check that this is well-defined; if {A_1\sim A_2} and {B_1\sim B_2}, then {(A_1\cup B_1) \sim (A_2 \cup B_2)}.

This shows that {\cup} induces a binary operation {\vee} on the space {\hat{\mathcal{A}}}; similarly, {\cap} induces a binary operation {\wedge}, complementation {A\mapsto A^c} induces an operation {\hat A \mapsto \hat A'}, and set inclusion {A\subset B} induces a partial order {\hat A \leq \hat B}. These give {\hat{\mathcal{A}}} the structure of a Boolean algebra; say that {\Sigma} is an abstract Boolean algebra if it has a partial order {\leq}, binary operations {\vee}, {\wedge}, and a unary operation {'}, satisfying the same rules as inclusion, union, intersection, and complementation:

  1. {A \vee B} is the join of {A} and {B} (the minimal element such that {A,B \leq A\vee B}), and {A\wedge B} is the meet of {A} and {B} (the maximal element such that {A,B \geq A\wedge B});
  2. the distributive laws {A\vee (B\wedge C) = (A\vee B) \wedge (A\vee C)} and {A\wedge (B\vee C) = (A\wedge B) \vee (A\wedge C)} hold;
  3. there is a maximal element {X} whose complement {X'} is the minimal element;
  4. {A\wedge A' = X'} and {A\vee A' = X}.

For the form of this list I have followed this blog post by Terry Tao, which gives a good in-depth discussion of some other issues relating to concrete and abstract Boolean algebras and {\sigma}-algebras.

Exercise 1 Using the four axioms above, prove the following properties:

  • {A'} is the unique element satisfying (4) — that is, if {A\vee B =X} and {A\wedge B = X'}, then {B=A'};
  • {(A')' = A};
  • de Morgan’s laws: {(A\vee B)' = A' \wedge B'} and {(A\wedge B)' = A' \vee B'}.

If you get stuck, see Chapter IV, Lemma 1.2 in A Course in Universal Algebra by Burris and Sankappanavar.
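
These identities are also easy to check by brute force in the concrete model where the Boolean algebra is the power set of a finite set, with union, intersection, and complement as the operations. Here is a quick sketch in Python (this verifies the concrete model only, not the abstract derivation the exercise asks for):

```python
from itertools import combinations

# Power-set Boolean algebra of a small finite set X:
# join = union, meet = intersection, ' = complement in X.
X = frozenset(range(4))
subsets = [frozenset(c) for r in range(len(X) + 1)
           for c in combinations(X, r)]
comp = lambda a: X - a

for A in subsets:
    assert comp(comp(A)) == A                    # (A')' = A
    for B in subsets:
        # de Morgan's laws
        assert comp(A | B) == comp(A) & comp(B)
        assert comp(A & B) == comp(A) | comp(B)
        # uniqueness of the complement:
        # A v B = X and A ^ B = X' force B = A'
        if (A | B) == X and (A & B) == frozenset():
            assert B == comp(A)
```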

In fact {\hat{\mathcal{A}}} inherits just a little bit more, since {\vee} (and hence {\wedge}) can be iterated countably many times. We add this as a fifth axiom, and say that an abstract Boolean algebra {\Sigma} is an abstract {\sigma}-algebra if in addition to (1)–(4) it satisfies


  5. any countable family {A_1,A_2,\dots \in \Sigma} has a least upper bound {\bigvee_n A_n} and a greatest lower bound {\bigwedge_n A_n}.

A measured abstract {\sigma}-algebra is a pair {(\Sigma,\mu)}, where {\Sigma} is an abstract {\sigma}-algebra and {\mu\colon \Sigma\rightarrow [0,\infty]} is a function satisfying the usual properties: {\mu(X')=0} and {\mu(\bigvee_n A_n) = \sum_n \mu(A_n)} whenever {A_i \wedge A_j = X'} for all {i\neq j}. (Note that {X'} is playing the role of {\emptyset}, but we avoid the latter notation to remind ourselves that elements of {\Sigma} do not need to be represented as subsets of some ambient space.)

The operations {\vee,\wedge,'} induce a binary operator {\Delta} on {\Sigma} by

\displaystyle  A \Delta B = (A\wedge B') \vee (B \wedge A'),

which is the abstract analogue of symmetric difference, and so a measured abstract {\sigma}-algebra carries a pseudo-metric {\rho} defined by

\displaystyle  \rho(A,B) = \mu(A \Delta B).

If {(\Sigma,\mu)} has the property that {\mu(A)>0} for all {A\neq X'}, then this becomes a genuine metric.
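
For a concrete finite example, one can model this on the power set of a small set with strictly positive weights and verify the metric axioms for {\rho} exhaustively (a sketch; the weights below are arbitrary illustrative values):

```python
from itertools import combinations

# Finite probability space with strictly positive (arbitrary) weights,
# so rho(A,B) = mu(A Δ B) is a genuine metric on subsets.
weights = {0: 0.1, 1: 0.2, 2: 0.05, 3: 0.25, 4: 0.4}

def mu(A):
    return sum(weights[i] for i in A)

def rho(A, B):
    return mu(A ^ B)          # ^ is symmetric difference for sets

pts = sorted(weights)
sets = [frozenset(c) for r in range(len(pts) + 1)
        for c in combinations(pts, r)]
for A in sets:
    for B in sets:
        assert (rho(A, B) == 0) == (A == B)    # separates points
        assert rho(A, B) == rho(B, A)          # symmetric
        for C in sets:                         # triangle inequality
            assert rho(A, C) <= rho(A, B) + rho(B, C) + 1e-12
```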

In particular, if {(X,\mathcal{A},\mu)} is a measure space and {\Sigma = \hat{\mathcal{A}}} is the space of equivalence classes modulo {\sim} (equivalence mod 0), then {\mu} induces a function {\hat{\mathcal{A}} \rightarrow [0,\infty]}, which we continue to denote by {\mu}, such that {(\hat{\mathcal{A}},\mu)} is a measured abstract {\sigma}-algebra; this has the property that {\mu(A)>0} for all non-trivial {A\in \hat{\mathcal{A}}}, and so it defines a metric {\rho} as above.

Given an abstract {\sigma}-algebra {\Sigma} and a subset {\Gamma \subset \Sigma}, the algebra ({\sigma}-algebra) generated by {\Gamma\subset \Sigma} is the smallest algebra ({\sigma}-algebra) in {\Sigma} that contains {\Gamma}. Now we can interpret the equivalence (1) from the previous section (which drove the correspondence between (CG0) and separability) in terms of the measured abstract {\sigma}-algebra {\hat{\mathcal{A}}}.

Proposition 2 Let {(\Sigma,\mu)} be a measured abstract {\sigma}-algebra with no non-trivial null sets. Then for any algebra {\mathop{\mathcal F} \subset \Sigma}, we have {\overline{\mathop{\mathcal F}} = \sigma(\mathop{\mathcal F})}; that is, the {\rho}-closure of {\mathop{\mathcal F}} is equal to the {\sigma}-algebra generated by {\mathop{\mathcal F}}.

Next time we will see how separability (or equivalently, (CG0)) can be used to give a classification result for abstract measured {\sigma}-algebras, which at first requires us to take the abstract point of view introduced in this section. Finally, we will see what is needed to go from there to a similar result for probability spaces.

Posted in Uncategorized | 5 Comments

Unique MMEs with specification – an alternate proof

The variational principle for topological entropy says that if {X} is a compact metric space and {f\colon X\rightarrow X} is a continuous map, then {h_{\mathrm{top}}(f) = \sup_\mu h_\mu(f)}, where {h_{\mathrm{top}}} is the topological entropy, and the supremum is taken over all {f}-invariant Borel probability measures. A measure achieving this supremum is called a measure of maximal entropy (MME for short), and it is interesting to understand when a system has a unique MME.

Let’s look at this question in the setting of symbolic dynamics. Let {A} be a finite set, which we call an alphabet, and let {A^{\mathbb N}} be the set of all infinite sequences of symbols in {A}. This is a compact metric space in a standard way, and the shift map {\sigma} defined by {(\sigma(x))_n = x_{n+1}} is continuous. We consider a compact {\sigma}-invariant set {X\subset A^{\mathbb N}} and ask whether or not {(X,\sigma)} has a unique MME.

When {X} is a mixing subshift of finite type (SFT), this was answered in the 1960’s by Parry; there is a unique MME, and it can be obtained by considering the transition matrix for {X} and using some tools from linear algebra. A different proof was given by Bowen in 1974 using the specification property; this holds for all mixing SFTs, but also for a more general class of shift spaces.

The purpose of this note is to describe a variant of Bowen’s proof; roughly speaking, we follow Bowen’s proof for the first half of the argument, then give a shorter version of the second half of the argument, which follows comments made in conversation with Dan Thompson and Mike Hochman.

1. The strategy

The language of the shift space {X} is the set of all finite words that appear in some element of {X}. That is, given {n\in {\mathbb N}} we write {\mathcal{L}_n = \{w\in A^n \mid [w]\neq\emptyset\}}, where {[w] = \{x\in X \mid x_1\cdots x_n = w\}}. Then we write {\mathcal{L} = \bigcup_n \mathcal{L}_n}. Write {|w|} for the length of {w}.

In the setting of symbolic dynamics, the topological transitivity property takes the following form: {X} is transitive if and only if for every {u,v\in \mathcal{L}} there is {w\in \mathcal{L}} such that {uwv\in \mathcal{L}}. Transitivity by itself gives no control over the length of {w}. We say that a shift space {X} has specification if there is {\tau\in {\mathbb N}} such that for every {u,v\in \mathcal{L}} there is {w\in \mathcal{L}} with {|w|=\tau} such that {uwv\in \mathcal{L}}; thus specification can be thought of as a uniform transitivity property.
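
For a concrete example (an illustration, not part of the argument): in the golden mean shift, the SFT on {\{0,1\}} that forbids the word {11}, specification holds with {\tau=1}, since any two words in the language can be glued with the single symbol {0}. A brute-force check in Python:

```python
from itertools import product

def legal(w):
    """Membership in the language of the golden mean shift:
    binary words with no two consecutive 1s."""
    return '11' not in w

# All legal words of length 1..6.
words = [''.join(p) for n in range(1, 7)
         for p in product('01', repeat=n) if legal(''.join(p))]

# Specification with tau = 1: gluing with '0' always works, since the
# inserted 0 breaks any potential '11' across either junction.
assert all(legal(u + '0' + v) for u in words for v in words)
```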

Let {h = h_{\mathrm{top}}(\sigma) = \lim_{n\rightarrow\infty} \frac 1n \log \#\mathcal{L}_n} be the topological entropy of {(X,\sigma)}. Bowen’s proof of uniqueness has the following structure:

  1. Show that there is {C>0} such that {e^{nh} \leq \#\mathcal{L}_n \leq C e^{nh}} for every {n}.
  2. Prove that for every {\alpha>0} there is {\beta>0} such that if {\mu} is an MME and {\mathcal{D}_n \subset \mathcal{L}_n} is a collection of words with {\mu(\mathcal{D}_n) \geq \alpha}, then we have {\#\mathcal{D}_n \geq \beta e^{nh}}. This can be thought of as a uniform version of the Katok entropy formula.
  3. Follow the proof of the variational principle to explicitly construct an MME {\mu}; then use the specification property and the counting estimates on {\#\mathcal{L}_n} to show that {\mu} has the Gibbs property and is ergodic.
  4. Show that if {\nu} is another ergodic MME, then the uniform Katok entropy formula for {\nu} and the Gibbs property for {\mu} cannot hold simultaneously; this contradiction proves uniqueness.

The proof we give here follows the above structure exactly for steps 1 and 2. Instead of steps 3 and 4, though, we give the following argument; if {\nu\neq \mu} are two distinct ergodic MMEs, then we can use the uniform Katok entropy formula for {\nu,\mu} together with the specification property to create more entropy in the system, a contradiction. In the next section we make all of this precise.

The advantage of this proof is that it allows us to replace step 3 of Bowen’s proof with a different argument that may be easier and less technical (depending on one’s taste). The disadvantage is that it does not include a proof of the Gibbs property for {\mu}, which is useful to know about in various settings.

2. The proof

2.1. Counting estimates

Write {\mathcal{L}_{m} \mathcal{L}_n} for the set of all words of the form {vw}, where {v\in \mathcal{L}_m} and {w\in \mathcal{L}_n}. Then it is clear that {\mathcal{L}_{m+n} \subset \mathcal{L}_m \mathcal{L}_n}, so {\#\mathcal{L}_{m+n} \leq (\#\mathcal{L}_m)(\#\mathcal{L}_n)}. In particular, {\log \#\mathcal{L}_n} is subadditive, so by Fekete’s lemma {h = \lim \frac 1n \log \#\mathcal{L}_n = \inf_n \frac 1n \log \#\mathcal{L}_n}, and we get {\#\mathcal{L}_n \geq e^{nh}} for every {n}.

The upper bound requires the specification property. Define a map {(\mathcal{L}_n)^k \rightarrow \mathcal{L}_{k(n+\tau)}} by sending {w_1,\dots,w_k} to {w_1 u_1 w_2 u_2 \cdots w_k u_k}, where {u_i \in \mathcal{L}_\tau} are provided by specification. This map is 1-1 so {\#\mathcal{L}_{k(n+\tau)} \geq (\#\mathcal{L}_n)^k}. Taking logs gives

\displaystyle  \frac 1{k(n+\tau)} \log\#\mathcal{L}_{k(n+\tau)} \geq \frac 1{n+\tau} \log \#\mathcal{L}_n,

and sending {k\rightarrow\infty} takes the left-hand side to {h}, so {\#\mathcal{L}_n \leq e^{(n+\tau)h}}; this gives the counting bounds we claimed earlier, with {C= e^{\tau h}}.

The gluing construction in the previous paragraph will be used later on when we need to derive a contradiction by producing extra entropy.
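
These bounds are easy to test numerically on the golden mean shift (binary sequences with no {11}), where {\#\mathcal{L}_n} is a Fibonacci-type number, {h = \log\varphi} for the golden ratio {\varphi}, and {\tau=1}. A sketch in Python:

```python
import math
from itertools import product

phi = (1 + math.sqrt(5)) / 2   # e^h for the golden mean shift
h = math.log(phi)
tau = 1                        # specification constant for this SFT

def count_words(n):
    """#L_n for the golden mean shift: binary words with no '11'."""
    return sum('11' not in ''.join(p) for p in product('01', repeat=n))

# e^{nh} <= #L_n <= e^{(n+tau)h}, i.e. the bounds with C = e^{tau h}.
for n in range(1, 15):
    assert math.exp(n * h) <= count_words(n) <= math.exp((n + tau) * h)
```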

2.2. Uniform Katok formula

Now suppose {\mu} is any MME, and given {w\in \mathcal{L}} write {\mu(w) = \mu([w])}. Similarly write {\mu(\mathcal{D}_n) = \mu(\bigcup_{w\in \mathcal{D}_n} [w])}. Applying Fekete’s lemma to the subadditive sequence {H_n(\mu) = \sum_{w\in \mathcal{L}_n} -\mu(w) \log \mu(w)} gives {\lim_n \frac 1n H_n(\mu) = \inf_n \frac 1n H_n(\mu) = h_\mu(\sigma) = h}, where the last equality uses the assumption that {\mu} is an MME; thus {nh \leq H_n(\mu)} for every {n}.

Given {\mathcal{D}_n} with {\mu(\mathcal{D}_n)\geq \alpha}, we have

\displaystyle  \begin{aligned} nh &\leq \sum_{w\in \mathcal{D}_n} - \mu(w)\log \mu(w) + \sum_{w\in \mathcal{D}_n^c} -\mu(w)\log\mu(w) \\ &= \mu(\mathcal{D}_n)\sum_{w\in \mathcal{D}_n} - \frac{\mu(w)}{\mu(\mathcal{D}_n)} \log \frac{\mu(w)}{\mu(\mathcal{D}_n)} + \mu(\mathcal{D}_n^c)\sum_{w\in \mathcal{D}_n^c} - \frac{\mu(w)}{\mu(\mathcal{D}_n^c)} \log \frac{\mu(w)}{\mu(\mathcal{D}_n^c)} \\ &\qquad - \mu(\mathcal{D}_n)\log\mu(\mathcal{D}_n) - \mu(\mathcal{D}_n^c)\log\mu(\mathcal{D}_n^c) \\ &\leq \mu(\mathcal{D}_n) \log\#\mathcal{D}_n + \mu(\mathcal{D}_n^c) \log \#\mathcal{D}_n^c + \log 2 \\ &\leq \mu(\mathcal{D}_n) \log\#\mathcal{D}_n + (1-\mu(\mathcal{D}_n)) (nh + \log C) + \log 2 \\ &= \mu(\mathcal{D}_n) (\log \#\mathcal{D}_n - nh - \log C) + nh + \log (2C). \end{aligned}

Solving for {\#\mathcal{D}_n} gives

\displaystyle  \log\#\mathcal{D}_n \geq nh + \log C - \frac{\log(2C)}{\mu(\mathcal{D}_n)} \geq nh + \log C - \frac{\log(2C)}{\alpha},

so taking {\beta} with {\log\beta = \log C - \log(2C)/\alpha} gives {\#\mathcal{D}_n \geq \beta e^{nh}}, verifying the uniform Katok formula.

Note that the argument here does not rely directly on the specification property; it only uses the fact that {\mu} is an MME and that we have the uniform upper bound {\#\mathcal{L}_n \leq Ce^{nh}}. In fact, if one is a little more careful with the computations it is possible to show that we can take {\beta = \beta(\alpha) = C^{1-\frac 1\alpha} e^{-\frac{h(\alpha)}\alpha}}, where {h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)}. This has the pleasing property that {\beta\rightarrow 1} as {\alpha\rightarrow 1}, so the lower bound for {\#\mathcal{D}_n} converges to the lower bound for {\#\mathcal{L}_n}.
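
One can tabulate this refined constant to confirm the limiting behavior; the sketch below uses an arbitrary illustrative value of {C} (in the text, {C = e^{\tau h}}):

```python
import math

def beta(alpha, C):
    """beta(alpha) = C^(1 - 1/alpha) * exp(-H(alpha)/alpha), where
    H(alpha) is the binary entropy function (natural logarithm)."""
    H = -alpha * math.log(alpha) - (1 - alpha) * math.log(1 - alpha)
    return C ** (1 - 1 / alpha) * math.exp(-H / alpha)

C = 3.0                        # arbitrary illustrative value of C >= 1
vals = [beta(a, C) for a in (0.5, 0.9, 0.99, 0.9999)]
assert all(x < y for x, y in zip(vals, vals[1:]))   # increasing in alpha
assert abs(vals[-1] - 1) < 0.01                     # beta -> 1 as alpha -> 1
```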

2.3. Producing more entropy

Now suppose {\mu,\nu} are two distinct ergodic MMEs. Then {\mu\perp \nu} so there are disjoint sets {Y,Z\subset X} with {\mu(Y) = 1} and {\nu(Z)=1}. Fix {\delta>0} and let {Y'\subset Y} and {Z'\subset Z} be compact sets such that {\mu(Y')> 1-\delta} and {\nu(Z')>1-\delta}. Then the distance between {Y',Z'} is positive, so there is {N\in {\mathbb N}} such that for every {n\geq N-\tau}, no {n}-cylinder intersects both {Y',Z'}. In particular, putting {\mathcal{Y}_n = \{w\in \mathcal{L}_n \mid [w] \cap Y' \neq\emptyset\}}, and similarly for {\mathcal{Z}_n} with {Z'}, we have

  • {\mathcal{Y}_n \cap \mathcal{Z}_n = \emptyset} for every {n\geq N-\tau}, and
  • {\mu(\mathcal{Y}_n) \geq 1-\delta} and {\nu(\mathcal{Z}_n) \geq 1-\delta}, so by the uniform Katok formula we have {\#\mathcal{Y}_n,\#\mathcal{Z}_n \geq \beta e^{nh}}, where {\beta = \beta(1-\delta) > 0}.

Up to now all of the arguments that we have made appear in Bowen’s proof; the approximation argument just given appears in step 4 of his proof, where one uses the Gibbs property of the constructed MME to derive a contradiction with the uniform Katok formula. It is at this point that our argument diverges from Bowen’s, since we have not proved a Gibbs property. Instead, we apply the specification property to the collections of words {\mathcal{Y},\mathcal{Z} \subset \mathcal{L}} to get a lower bound on {\#\mathcal{L}_n} that grows more quickly than {e^{nh}}, a contradiction.

Before giving the correct argument, let me describe three incorrect arguments that illustrate the main ideas and also show why certain technical points arise in the final argument; in particular this will describe the process of arriving at the proof, which I think is worthwhile.

First wrong argument

Here is a natural way to proceed. With {N,\mathcal{Y},\mathcal{Z}} as above, consider for each {k\in {\mathbb N}} and each {\omega\in \{0,1\}^k} the set of all words

\displaystyle  w^1 u^1 w^2 u^2 \cdots w^k,

where {w^i \in \mathcal{Y}_{N-\tau}} if {\omega_i = 0} and {\mathcal{Z}_{N-\tau}} if {\omega_i=1}, and {u^i \in \mathcal{L}_\tau} are the gluing words provided by specification. Note that each choice of {\omega} produces at least {(\beta e^{(N-\tau)h})^k} words in {\mathcal{L}_{kN}}. Moreover, the collections of words produced by different choices of {\omega} are disjoint, because {\mathcal{Y}_{N-\tau}} and {\mathcal{Z}_{N-\tau}} are disjoint. We conclude that

\displaystyle  \#\mathcal{L}_{kN} \geq 2^k \beta^k e^{k(N-\tau)h},

and hence

\displaystyle  \tfrac 1{kN}\log \#\mathcal{L}_{kN} \geq h + \tfrac 1N(\log(2\beta) - \tau h).

If {\log(2\beta) > \tau h} then this would be enough to show that {h_\mathrm{top} > h}, a contradiction. Unfortunately, since {\beta< 1} this argument cannot work if {\tau h \geq \log 2}, so we must try again…

Second wrong argument

Looking at the above attempted proof, one may describe the problem as follows: each of the words {u^i} makes us lose a factor of {e^{-\tau h}} from our estimate on {\#\mathcal{L}_{kN}}. If {\omega_i = \omega_{i+1}}, then instead of letting {w^i} and {w^{i+1}} range over {\mathcal{Y}_{N-\tau}} and then gluing them, we could replace the words {w^i u^i w^{i+1}} with the words {v\in \mathcal{Y}_{2N-\tau}}. In particular, this would replace the estimate {\beta^2 e^{2(N-\tau)h}} with the estimate {\beta e^{(2N - \tau)h}}.

This suggests that given {\omega\in \{0,1\}^k}, we should only keep track of the set {J \subset \{1,\dots, k\}} of indices {j} for which {\omega_{j+1} \neq \omega_j}, since if {\omega_{i+1} = \omega_i} we can avoid losing the factor of {e^{-\tau h}} by avoiding the gluing process.

So, let’s try it. Given {J = \{j_1 < \cdots < j_\ell \} \subset \{1, \dots, k\}}, let {m_i = j_{i+1} - j_i} (with {m_0=j_1} and {m_\ell = k - j_\ell}), and consider the map

\displaystyle  \pi_J \colon \mathcal{Y}_{m_0 N - \tau} \times \mathcal{Z}_{m_1 N - \tau} \times \mathcal{Y}_{m_2 N - \tau} \times \cdots \times \mathcal{Y}_{m_{\ell-1}N - \tau} \times \mathcal{Z}_{m_\ell N} \rightarrow \mathcal{L}_{kN}

given by specification (whether the product ends with {\mathcal{Y}} or {\mathcal{Z}} depends on the parity of {\ell}).

Let {\mathcal{D}^J_{kN}} be the image of {\pi_J}, and note that

\displaystyle  \#\mathcal{D}_{kN}^J \geq \left(\prod_{i=0}^{\ell-1} \beta e^{(m_iN - \tau) h}\right) \beta e^{m_\ell N h} = \beta^{\ell + 1} e^{(kN - \ell\tau)h}. \ \ \ \ \ (1)

If the collections {\mathcal{D}_{kN}^J}, {\mathcal{D}_{kN}^{J'}} were disjoint for different choices of {J,J'}, then we could fix {\gamma>0} and sum (1) over all {J} with {\#J \leq \gamma k} to get

\displaystyle  \#\mathcal{L}_{kN} \geq \left(\sum_{\ell = 0}^{\gamma k} {k \choose \ell} \right) \beta^{\gamma k} e^{(kN - \gamma k \tau)h} \geq e^{(-\gamma \log \gamma)k} e^{k(\gamma\log\beta + Nh - \gamma\tau h)},

where we use Stirling’s approximation for the last inequality. Taking logs and dividing by {kN} gives

\displaystyle  \tfrac 1{kN} \log \#\mathcal{L}_{kN} \geq h + \tfrac \gamma N ( -\log \gamma + \log\beta - \tau h).

For {\gamma>0} sufficiently small this is {> h}, which gives {h_\mathrm{top} > h}, a contradiction.
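
The binomial-sum estimate used above can be sanity-checked numerically; note that for very small {\gamma} the floor in the summation limit matters at modest {k}, so the sketch below tests moderate values of {\gamma} only:

```python
import math

def binom_sum(k, gamma):
    """Left-hand side: sum of C(k,l) for l = 0, ..., floor(gamma*k)."""
    return sum(math.comb(k, l) for l in range(int(gamma * k) + 1))

def stirling_bound(k, gamma):
    """Right-hand side: e^{(-gamma log gamma) k}."""
    return math.exp(-gamma * math.log(gamma) * k)

for k in (50, 100, 200):
    for gamma in (0.1, 0.15, 0.2):
        assert binom_sum(k, gamma) >= stirling_bound(k, gamma)
```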

The problem with this argument is that the collections {\mathcal{D}_{kN}^J} need not be disjoint for different choices of {J}; this is because {w\in \mathcal{Y}_{mN}} may have {w_{[jN, (j+1)N)} \in \mathcal{Z}} for some value of {j}, so that we cannot necessarily recover {J} uniquely from knowing {w\in \mathcal{D}_{kN}^J}.

Third wrong argument

Let’s try to address the issue just raised, that we cannot recover {J} uniquely from {w\in \mathcal{D}_{kN}^J} because {w\in \mathcal{Y}_{mN}} may have subwords in {\mathcal{Z}}. We address this by only using words {w} where there are `not many’ such subwords. More precisely, given {w\in \mathcal{Y}_{n}} for {n\geq N-\tau}, let

\displaystyle  B(w) = \{ 0 \leq j < \tfrac{n+\tau}N \mid w_{[jN, (j+1)N-\tau)}\in \mathcal{Z} \}

be the set of `bad’ times, and similarly with the roles of {\mathcal{Y}} and {\mathcal{Z}} reversed. Let {b(w) = \#B(w)}, and observe that since every cylinder {[v]} with {v\in \mathcal{Z}_{N-\tau}} is disjoint from {Y'}, we have {\mu(\mathcal{Z}_{N-\tau}) \leq 1 - \mu(Y') < \delta}, so invariance of {\mu} gives

\displaystyle  \mu\{w\in \mathcal{Y}_{n} \mid j\in B(w)\} \leq \delta

for every {0\leq j < \frac{n+\tau}N}. We conclude that

\displaystyle  \sum_{w\in \mathcal{Y}_{n}} b(w)\mu(w) \leq \sum_{j=0}^{\lfloor\frac{n+\tau}N\rfloor-1} \mu\{w\in \mathcal{Y}_{n} \mid j\in B(w)\} \leq \delta \tfrac{n+\tau}N.

Let {\mathcal{Y}'_{n} = \{w\in \mathcal{Y}_{n} \mid b(w) \leq 2\delta \frac{n+\tau}N\}}, and note that

\displaystyle  \delta \tfrac{n+\tau}N \geq \sum_{w\in \mathcal{Y}_{n} \setminus \mathcal{Y}_{n}'} \mu(w) 2\delta \tfrac{n+\tau}N,

and hence

\displaystyle  \tfrac 12 \geq \mu(\mathcal{Y}_n) - \mu(\mathcal{Y}_n') \geq 1-\delta - \mu(\mathcal{Y}_n').

We conclude that

\displaystyle  \mu(\mathcal{Y}_{n}') \geq \tfrac 12 - \delta,

so taking {\beta = \beta(\frac 12 - \delta)}, the uniform Katok estimate gives {\#\mathcal{Y}_{n}' \geq \beta e^{nh}}. A similar definition and argument gives {\mathcal{Z}_{n}'}.

Now we repeat the argument from the previous section (the second wrong argument) using {\mathcal{Y}',\mathcal{Z}'} in place of {\mathcal{Y},\mathcal{Z}}. Given {J\subset \{1,\dots, k\}}, let {\mathcal{D}_{kN}^J} be the image of the map {\pi_J} with the corresponding restricted domain, and note that the estimate (1) still holds (with the new value of {\beta}).

The final piece of the puzzle is to take {v \in \mathcal{D}_{kN}^J} and estimate how many other collections {\mathcal{D}_{kN}^{J'}} can contain it; that is, how many possibilities there are for {J} once we know {v}. We would like to do the following: write {v = w^1 u^1 w^2 u^2 \cdots w^\ell} and then argue that {J'} must be contained in the union of the sets of times corresponding to the {B(w^i)}. The problem is that this only works when there is a disagreement between which of the sets {\mathcal{Y}',\mathcal{Z}'} the maps {\pi_J,\pi_{J'}} are trying to use, and so I cannot quite make this give us the bounds we want.

The correct argument

To fix this final issue we change the construction slightly; instead of letting {J} mark the times where we transition between {\mathcal{Y}',\mathcal{Z}'}, we do a construction where at each {j\in J} we transition from {\mathcal{Y}'} to {\mathcal{Z}} and then immediately back to {\mathcal{Y}'}. Then {v \in \mathcal{D}_{kN}^J \cap \mathcal{D}_{kN}^{J'}} will impose strong conditions on the set {J'}.

Let’s make everything precise and give the complete argument. As in the very beginning of this section we fix {\delta>0} and let {Y',Z'} be disjoint compact sets with {\mu(Y'),\nu(Z')>1-\delta}, where {\mu,\nu} are distinct ergodic MMEs. Let {N} be such that for every {n\geq N-\tau}, no {n}-cylinder intersects both {Y',Z'}.

Let {\mathcal{Y}_n = \{w\in \mathcal{L}_n \mid [w] \cap Y' \neq \emptyset\}}, and similarly for {\mathcal{Z}_n}. We repeat verbatim part of the argument from the last section; given {w\in \mathcal{Y}_{n}} for {n\geq N-\tau}, let

\displaystyle  B(w) = \{ 0 \leq j < \tfrac{n+\tau}N \mid w_{[jN, (j+1)N-\tau)}\in \mathcal{Z} \}

be the set of `bad’ times. Let {b(w) = \#B(w)}, and observe that since every cylinder {[v]} with {v\in \mathcal{Z}_{N-\tau}} is disjoint from {Y'}, we have {\mu(\mathcal{Z}_{N-\tau}) \leq 1 - \mu(Y') < \delta}, so invariance of {\mu} gives

\displaystyle  \mu\{w\in \mathcal{Y}_{n} \mid j\in B(w)\} \leq \delta

for every {0\leq j < \frac{n+\tau}N}. We conclude that

\displaystyle  \sum_{w\in \mathcal{Y}_{n}} b(w)\mu(w) \leq \sum_{j=0}^{\lfloor\frac{n+\tau}N\rfloor-1} \mu\{w\in \mathcal{Y}_{n} \mid j\in B(w)\} \leq \delta \tfrac{n+\tau}N.

Let {\mathcal{Y}'_{n} = \{w\in \mathcal{Y}_{n} \mid b(w) \leq 2\delta \frac{n+\tau}N\}}, and note that

\displaystyle  \delta \tfrac{n+\tau}N \geq \sum_{w\in \mathcal{Y}_{n} \setminus \mathcal{Y}_{n}'} \mu(w) 2\delta \tfrac{n+\tau}N,

and hence

\displaystyle  \tfrac 12 \geq \mu(\mathcal{Y}_n) - \mu(\mathcal{Y}_n') \geq 1-\delta - \mu(\mathcal{Y}_n').

We conclude that

\displaystyle  \mu(\mathcal{Y}_{n}') \geq \tfrac 12 - \delta,

so taking {\beta = \beta(\frac 12 - \delta)}, the uniform Katok estimate gives {\#\mathcal{Y}_{n}' \geq \beta e^{nh}}.

Now given {k\in {\mathbb N}} and {J = \{j_1 < j_2< \cdots < j_\ell\} \subset \{1,\dots,k\}}, let {m_i = j_{i+1} - j_i} for {0\leq i\leq \ell} (putting {j_0=0} and {j_{\ell+1} = k}) and define a map

\displaystyle  \pi_J \colon \prod_{i=0}^{\ell} (\mathcal{Y}_{(m_i-1)N - \tau}' \times \mathcal{Z}_{N-\tau}) \rightarrow \mathcal{L}_{kN}

by the specification property, so that for {v^i \in \mathcal{Y}'_{(m_i-1)N-\tau}} and {w^i\in \mathcal{Z}_{N-\tau}} we have

\displaystyle  \pi_J(v^0,w^0,\dots,v^\ell,w^\ell) = v^0 u^0 w^0 \hat u^0 v^1 \cdots v^\ell u^\ell w^\ell,

where {u^i,\hat u^i} have length {\tau} and are provided by specification. Let {\mathcal{D}_{kN}^J} be the image of {\pi_J} and note that

\displaystyle  \#\mathcal{D}_{kN}^J \geq \beta^{2\ell+2} e^{(kN - (2\ell +2)\tau)h}. \ \ \ \ \ (2)

Given {J} and {x\in \mathcal{D}_{kN}^J}, let {\mathcal{J}(x,J)} denote the set of {J' \subset \{1,\dots, k\}} such that {x\in \mathcal{D}_{kN}^{J'}}. Writing {x = v^0 u^0 w^0 \hat u^0 \cdots v^\ell u^\ell w^\ell}, note that {x\in \mathcal{D}_{kN}^{J'}} implies that for each {j\in J'}, either {j\in J} or there are consecutive elements {j_i < j_{i+1}} of {J} such that {j_i < j < j_{i+1}}, and in this latter case we have that {v^i_{[(j - j_i)N, (j-j_i + 1)N-\tau)} \in \mathcal{Z}_{N-\tau}}, so {j-j_i \in B(v^i)}. We conclude that

\displaystyle  J' \subset J \cup \left(\bigcup_{i=0}^\ell j_i + B(v^i)\right).

By our choice of {\mathcal{Y}'}, for each {x} the set on the right has at most {\ell + 2\delta k} elements. In particular, we have

\displaystyle  \#\mathcal{J}(x,J) \leq 2^{\ell + 2\delta k}.

This bounds the number of {\mathcal{D}_{kN}^J} that can contain a given {x\in \mathcal{L}_{kN}}, and since there are {\geq e^{(-\gamma\log\gamma)k}} distinct choices of {J} with {\#J \leq \gamma k}, the bound in (2) gives

\displaystyle  \#\mathcal{L}_{kN} \geq 2^{-(\gamma + 2\delta)k} e^{(-\gamma\log\gamma)k}\beta^{2\gamma k+2} e^{(kN - (2\gamma k+2)\tau)h}.

Taking logs gives

\displaystyle  \log \#\mathcal{L}_{kN} \geq kNh -(\gamma + 2\delta)k\log 2 - (\gamma\log\gamma)k + (2\gamma k + 2)\log\beta - (2\gamma k + 2)\tau h.

Dividing by {kN} and sending {k\rightarrow\infty} gives

\displaystyle  h_\mathrm{top} \geq h + \tfrac 1N(-\gamma\log\gamma - (\gamma + 2\delta)\log 2 + 2\gamma\log\beta - 2\gamma\tau h).

Putting {\gamma = \delta} this gives

\displaystyle  h_\mathrm{top} \geq h + \tfrac \delta N(-\log\delta - \log 8 + 2\log\beta - 2\tau h).

Thus it suffices to make the appropriate choice of {\delta} at the beginning of the proof. More precisely, let {\beta' = \beta(\frac 14)} be as in the uniform Katok lemma, and let {\delta\in (0,\frac 14)} be small enough that {-\log\delta > \log 8 - 2\log\beta' + 2\tau h}. Then {\beta(\frac 12-\delta) \geq \beta'} and so the estimate above gives

\displaystyle  h_\mathrm{top} \geq h + \tfrac\delta N(-\log\delta - \log 8 + 2\log\beta' - 2\tau h) > h,

which contradicts our original assumption that {h} was the topological entropy. This contradiction shows that there is a unique measure of maximal entropy.


De Bruijn graphs and entropy at finite scales

Let {\mathcal{A}} be a finite set, which we call an alphabet, and let {x \in \mathcal{A}^{\mathbb N}} be an infinite sequence of letters from {\mathcal{A}}. It is natural to ask how complex the sequence {x} is: for example, if the alphabet is {\{H,T\}}, then we expect a typical sequence produced by flipping a coin to be in some sense more complex than the sequence {x=HHHHH\dots}.

One important way of making this notion precise is the entropy of the shift space generated by {x}, a notion coming from symbolic dynamics. Let {a_k(x)} be the number of words of length {k} (that is, elements of {\mathcal{A}^k}) that appear as subwords of {x}. Clearly we have {1\leq a_k(x) \leq (\#\mathcal{A})^k}. Roughly speaking, the entropy of {x} is the exponential growth rate of {a_k(x)}. More precisely, we write

\displaystyle  h(x) := \varlimsup_{k\rightarrow\infty} \tfrac 1k \log a_k(x).

Of course, in practice it is often the case that one does not have an infinite sequence, but merely a very long one. For example, it has been suggested that entropy (and a related quantity, the topological pressure) can play a role in the analysis of DNA sequences; see [D. Koslicki, “Topological entropy of DNA sequences”, Bioinformatics 27(8), 2011, p. 1061–1067] and [D. Koslicki and D.J. Thompson, “Coding sequence density estimation via topological pressure”, J. Math. Biol. 70, 2015, p. 45–69]. In this case we have {\mathcal{A} = \{A,C,G,T\}} and are dealing with sequences {x} whose length is large, but finite.

Given a sequence {x} with length {n}, one can try to get some reasonable `entropy-like’ quantity by fixing some {k} and putting {h(x) = \frac 1k\log a_k(x)}. But what should we take {k} to be? If we take {k} too small we will get an overestimate (with {k=1} we will probably just find out that {x} contains every letter of {\mathcal{A}}), but if we take {k} too large we get an underestimate (with {k=n} we have {a_k(x)=1} so {h=0}).
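
The two failure modes are easy to see on a toy example; here is a sketch in Python with an arbitrary periodic sequence over the DNA alphabet:

```python
import math

def subword_count(x, k):
    """a_k(x): the number of distinct length-k subwords of x."""
    return len({x[i:i + k] for i in range(len(x) - k + 1)})

x = 'ACGT' * 10    # an arbitrary period-4 sequence, n = 40

# k = 1 gives the maximal value log 4 = log #A (an overestimate):
assert subword_count(x, 1) == 4
assert math.isclose(math.log(subword_count(x, 1)) / 1, math.log(4))

# k = n gives a_k = 1, so the estimate collapses to 0 (an underestimate):
assert subword_count(x, len(x)) == 1
assert math.log(subword_count(x, len(x))) / len(x) == 0.0
```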

The convention proposed by Koslicki in the first paper above is to let {k} be the largest number such that there is some word of length {n} that contains every word of length {k}. If every word of length {k} actually appears in {x}, then {h(x)} achieves its maximum value {\log \#\mathcal{A}}; if some words do not appear, then {h(x)<\log \#\mathcal{A}}.

What is the relationship between {n} and {k} that guarantees existence of a word of length {n} containing every word of length {k}? Let {p=\#\mathcal{A}} and note that there are {p^k} words of length {k}; if {w\in \mathcal{A}^n} contains every such word then we must have {n\geq p^k + k-1}, since the length-{k} subwords of {w} are precisely {(w_1\cdots w_k)}, {(w_2\cdots w_{k+1})}, \dots, {(w_{n-k+1} \cdots w_n)}, so we must have {n-k+1 \geq p^k}.

The converse implication is a little harder, though. Given {k}, let {n=p^k + k-1}. Is it necessarily true that there is a word {w\in \mathcal{A}^n} that contains every word of length {k} as a subword? After all, once {w_1\cdots w_k} is determined, there are not many possibilities for the word {w_2\cdots w_{k+1}}; can we navigate these restrictions successfully?

It is useful to rephrase the problem in the language of graph theory (what follows can be found in the proof of Lemma 1 in Koslicki’s paper). Let {G_k} be the directed graph defined as follows:

  • the vertex set is {\mathcal{A}^k}, so each vertex corresponds to a word of length {k};
  • there is an edge from {u\in \mathcal{A}^k} to {v\in \mathcal{A}^k} if and only if {u_1 v = uv_k}, that is, if {u_2\cdots u_k = v_1\cdots v_{k-1}}.

The graph {G_k} is the {k}-dimensional De Bruijn graph of {p} symbols. Recall that a Hamiltonian path in a graph is a path that visits each vertex exactly once. Thus the question above, regarding existence of a word in {\mathcal{A}^n} that contains every word in {\mathcal{A}^k}, where {n=p^k + k-1}, is equivalent to asking for the existence of a Hamiltonian path in the De Bruijn graph.

There is a correspondence between {G_k} and {G_{k-1}}; vertices in {G_k} correspond to edges in {G_{k-1}} (since both are labeled by elements of {\mathcal{A}^k}). Thus a Hamiltonian path in {G_k} corresponds to an Eulerian path in {G_{k-1}}; that is, a path that visits every edge exactly once.

This correspondence is very useful, since in general the problem of determining whether a Hamiltonian path exists is hard (NP-complete), while it is easy to check existence of an Eulerian path in a directed graph: a connected directed graph has an Eulerian circuit if and only if every vertex has in-degree equal to its out-degree (and a slight weakening of this condition is necessary and sufficient for an Eulerian path). This is the case for De Bruijn graphs, which are strongly connected and in which every vertex has {p=\#\mathcal{A}} edges coming in and going out. Thus {G_{k-1}} has an Eulerian path, which corresponds to a Hamiltonian path for {G_k}. This answers the original question, demonstrating that for every {k}, there is a word {w} of length {n=p^k+k-1} such that {w} contains every word of length {k} as a subword.
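
The construction is easy to carry out by computer: build an Eulerian circuit in {G_{k-1}} with Hierholzer’s algorithm and read off the resulting word. Here is a sketch in Python (the choice of algorithm and starting vertex are mine, for illustration):

```python
from itertools import product

def de_bruijn_word(alphabet, k):
    """Return a word of length p^k + k - 1 containing every length-k
    word over `alphabet` exactly once, by following an Eulerian circuit
    in the De Bruijn graph G_{k-1} (Hierholzer's algorithm). Needs k >= 2."""
    # Unused out-edges of each vertex u in A^{k-1}: one edge per symbol a,
    # pointing to the vertex u[1:] + a.
    edges = {''.join(v): list(alphabet)
             for v in product(alphabet, repeat=k - 1)}
    start = alphabet[0] * (k - 1)
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if edges[u]:                    # follow an unused edge
            stack.append(u[1:] + edges[u].pop())
        else:                           # backtrack, recording the circuit
            circuit.append(stack.pop())
    circuit.reverse()
    # Consecutive vertices overlap in k-2 symbols; read off the word.
    return circuit[0] + ''.join(v[-1] for v in circuit[1:])

w = de_bruijn_word('01', 3)
assert len(w) == 2**3 + 3 - 1
assert {w[i:i + 3] for i in range(len(w) - 2)} == \
       {''.join(t) for t in product('01', repeat=3)}
```

The same function works for any alphabet and any {k\geq 2}; for instance, with {\mathcal{A} = \{A,C,G,T\}} and {k=3} it produces a word of length {4^3+2 = 66} containing all {64} three-letter words.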
