## Entropy bounds for equilibrium states

[Update 6/15/17: The original version of this post had a small error in it, which has been corrected in the present version; the definition of ${\mathcal{I}_n}$ in the proof of the main theorem needed to be modified so that each ${k_i}$ is a multiple of ${2(\tau+1)}$.  Thanks to Leonard Carapezza for pointing this out to me.]

Let ${X}$ be a compact metric space and ${f\colon X\rightarrow X}$ a homeomorphism. Recall that an equilibrium state for a continuous potential function ${\varphi\colon X\rightarrow {\mathbb R}}$ is an ${f}$-invariant Borel probability measure on ${X}$ maximizing the quantity ${h_\mu(f) + \int\varphi\,d\mu}$ over all invariant probabilities; the topological pressure ${P(\varphi)}$ is the value of this maximum.

A classical result on existence and uniqueness of equilibrium states is due to Bowen, who proved that if ${f}$ is expansive and has specification, and ${\varphi}$ has a bounded distortion property (the “Bowen property”), then there is a unique equilibrium state ${\mu_\varphi}$. In particular, this applies when ${f}$ is Anosov and ${\varphi}$ is Hölder.

It seems to be well-known among experts that under Bowen’s hypotheses, ${\mu_\varphi}$ must have positive entropy (equivalently, ${P(\varphi) > \sup_\mu \int\varphi\,d\mu}$), but I do not know of an explicit reference. In this post I provide a proof of this fact, which also gives reasonably concrete bounds on the entropy of ${\mu_\varphi}$; equivalently, a bound on the size of the gap ${P(\varphi) - \sup_\mu \int\varphi\,d\mu}$.

1. Definitions and result

First, let’s recall the definitions in the form that I will need them. Given ${x\in X}$, ${n\in {\mathbb N}}$, and ${\varepsilon>0}$, the Bowen ball around ${x}$ of order ${n}$ and radius ${\varepsilon}$ is the set

$\displaystyle B_n(x,\varepsilon) := \{y\in X : d(f^kx, f^ky) < \varepsilon \text{ for all } 0\leq k < n\}.$

The map ${f}$ has specification if for every ${\varepsilon>0}$ there is ${\tau=\tau(\varepsilon)\in {\mathbb N}}$ such that for every ${x_1,\dots, x_k\in X}$ and ${n_1,\dots, n_k\in {\mathbb N}}$, there is ${x\in X}$ such that

$\displaystyle x\in B_{n_1}(x_1,\varepsilon),\qquad f^{n_1 + \tau}(x)\in B_{n_2}(x_2,\varepsilon),$

and in general

$\displaystyle f^{n_1 + \tau + \cdots + n_{i-1} + \tau}(x) \in B_{n_i}(x_i,\varepsilon)$

for every ${1\leq i\leq k}$. We refer to ${\tau}$ as the “gluing time”. One could also consider a weaker property where the gluing times are allowed to vary but must be bounded above by ${\tau}$; this makes the estimates below more complicated, so for simplicity we will stick with the stronger version.

A function ${\varphi\colon X\rightarrow {\mathbb R}}$ has the Bowen property at scale ${\varepsilon}$ with distortion constant ${V}$ if ${V\in {\mathbb R}}$ is such that

$\displaystyle |S_n\varphi(x) - S_n\varphi(y)| \leq V \text{ for all } x\in X\text{ and } y\in B_n(x,\varepsilon),$

where ${S_n\varphi(x) := \sum_{k=0}^{n-1} \varphi(f^k x)}$. We write

$\displaystyle \Lambda_n(\varphi,\varepsilon) := \sup_{E\in \mathcal{E}_{n,\varepsilon}} \sum_{x\in E} e^{S_n\varphi(x)},$

where ${\mathcal{E}_{n,\varepsilon}}$ is the collection of ${(n,\varepsilon)}$-separated subsets of ${X}$ (those sets ${E\subset X}$ for which ${y\notin B_n(x,\varepsilon)}$ whenever ${x,y\in E}$, ${x\neq y}$). The topological pressure is ${P(\varphi) = \lim_{\varepsilon\rightarrow 0} P(\varphi,\varepsilon)}$, where

$\displaystyle P(\varphi,\varepsilon) = \limsup_{n\rightarrow\infty} \frac 1n \log \Lambda_n(\varphi,\varepsilon).$

Theorem 1 Let ${X}$ be a compact metric space with diameter ${>6\varepsilon}$, ${f\colon X\rightarrow X}$ a homeomorphism with specification at scale ${\varepsilon}$ with gluing time ${\tau}$, and ${\varphi\colon X\rightarrow {\mathbb R}}$ a potential with the Bowen property at scale ${\varepsilon}$ with distortion constant ${V}$. Let

$\displaystyle \Delta = \frac{\log(1+e^{-(V+2(2\tau+1)\|\varphi\|)})}{2(\tau+1)}$

where ${\|\varphi\| = \sup_{x\in X} |\varphi(x)|}$. Then we have

$\displaystyle P(\varphi) \geq P(\varphi,\varepsilon) \geq \Big( \sup_\mu \int\varphi\,d\mu\Big) + \Delta. \ \ \ \ \ (1)$

In particular, if ${\mu}$ is an equilibrium state for ${\varphi}$, then we have ${h_\mu(f) \geq \Delta > 0}$.
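To get a feel for the size of ${\Delta}$, here is a minimal Python sketch that evaluates the formula from Theorem 1. The function name and the sample values of ${V}$, ${\tau}$, and ${\|\varphi\|}$ are illustrative assumptions, not quantities taken from any particular system.

```python
import math

def entropy_gap_lower_bound(V, tau, phi_norm):
    """Evaluate Delta = log(1 + exp(-Q)) / (2 (tau + 1)),
    where Q = V + 2 (2 tau + 1) ||phi|| as in Theorem 1."""
    Q = V + 2 * (2 * tau + 1) * phi_norm
    # log1p keeps precision when exp(-Q) is very small
    return math.log1p(math.exp(-Q)) / (2 * (tau + 1))

# With V = 0 and phi = 0 the bound reduces to log(2) / (2 (tau + 1)).
print(entropy_gap_lower_bound(V=0.0, tau=0, phi_norm=0.0))

# Hypothetical constants: the bound shrinks quickly as the gluing time grows.
for tau in (1, 5, 10):
    print(tau, entropy_gap_lower_bound(V=1.0, tau=tau, phi_norm=0.5))
```

The bound is always positive, but it decays exponentially in ${\tau\|\varphi\|}$, which is why the dependence of ${\tau}$ on the dynamics matters in the next section.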

2. Consequence for Anosov diffeomorphisms

Before proving the theorem we point out a useful corollary. If ${M}$ is a compact manifold and ${f\colon M\rightarrow M}$ is a topologically mixing ${C^1}$ Anosov diffeomorphism, then ${f}$ has specification at every scale (similar results apply in the Axiom A case). Moreover, every Hölder continuous potential has the Bowen property, and thus Theorem 1 applies.

For an Anosov diffeo, the constants ${V}$ and ${\tau}$ in (1) can be controlled by the following factors (here we fix a small ${\varepsilon>0}$):

1. the rate of expansion and contraction along the stable and unstable directions, given in terms of ${C,\lambda>0}$ such that ${\|Df^n_x(v^s)\| \leq C e^{-\lambda n}\|v^s\|}$ for all ${n\geq 0}$ and ${v^s\in E^s}$, and similarly for ${v^u\in E^u}$ and ${n\leq 0}$;
2. how quickly unstable manifolds become dense, in other words, the value of ${R>0}$ such that ${W_R^u(x)}$ is ${\varepsilon}$-dense for every choice of ${x}$;
3. the angle between stable and unstable directions, which controls the local product structure, in particular via a constant ${K>0}$ such that ${d(x,y) < \varepsilon}$ implies that ${W_{K\varepsilon}^s(x)}$ intersects ${W_{K\varepsilon}^u(y)}$ in a unique point ${z}$, and the leafwise distances from ${x,y}$ to ${z}$ are at most ${K d(x,y)}$;
4. the Hölder exponent (${\beta}$) and constant (${|\varphi|_\beta}$) for the potential ${\varphi}$.

For the specification property for an Anosov diffeo, ${\tau =\tau(\varepsilon)}$ is determined by the condition that ${C^{-1}e^{\lambda\tau}(\varepsilon/K) > R}$, so that small pieces of unstable manifold expand to become ${\varepsilon}$-dense within ${\tau}$ iterates; thus we have

$\displaystyle \tau(\varepsilon) \approx \lambda^{-1} \log(R(\varepsilon) KC\varepsilon^{-1}).$

For the Bowen property, one compares ${S_n\varphi(x)}$ and ${S_n\varphi(y)}$ by comparing each to ${S_n\varphi(z)}$, where ${z}$ is the (Smale bracket) intersection point coming from the local product structure. Standard estimates give ${d(f^j x, f^jz) \leq CK\varepsilon e^{-\lambda j}}$, so the Hölder property gives

$\displaystyle \begin{aligned} |S_n\varphi(x) - S_n\varphi(z)| &\leq \sum_{j=0}^{n-1} |\varphi(f^j x) - \varphi(f^j z)| \leq \sum_{j=0}^{n-1} |\varphi|_\beta d(f^jx,f^jz)^\beta \\ &\leq |\varphi|_\beta \sum_{j=0}^\infty (CK\varepsilon)^\beta e^{-\lambda\beta j} = |\varphi|_\beta (CK\varepsilon)^\beta (1-e^{-\lambda\beta})^{-1}. \end{aligned}$

A similar estimate for ${|S_n\varphi(y) - S_n\varphi(z)|}$ gives

$\displaystyle V = 2(CK\varepsilon)^\beta(1- e^{-\lambda\beta})^{-1} |\varphi|_\beta.$

Thus Theorem 1 has the following consequence for Anosov diffeomorphisms.

Corollary 2 Let ${f}$ be a topologically mixing Anosov diffeomorphism on ${M}$ and ${C,\lambda,\varepsilon,R,K}$ the quantities above. Let

$\displaystyle \delta = \frac{\lambda}{2\log(RKC\varepsilon^{-1})}.$

Given a ${\beta}$-Hölder potential ${\varphi\colon M\rightarrow {\mathbb R}}$, consider the quantity

$\displaystyle Q(\varphi) := 2(CK\varepsilon)^\beta (1-e^{-\lambda\beta})^{-1} |\varphi|_\beta + 5\lambda^{-1} \log(RKC\varepsilon^{-1})\|\varphi\|.$

Then we have

$\displaystyle P(\varphi) \geq P(\varphi,\varepsilon) \geq \Big(\sup_\mu \int\varphi\,d\mu\Big) + \delta \log(1+e^{-Q(\varphi)})$

so that in particular, if ${\mu}$ is an equilibrium state for ${\varphi}$, then

$\displaystyle h_\mu(f) \geq \delta \log(1+e^{-Q(\varphi)}) > 0.$
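The quantities in Corollary 2 are all explicit, so the bound can be evaluated directly. The following Python sketch does this; the function name and every numerical value in the usage example are hypothetical, chosen only to show how the pieces fit together.

```python
import math

def corollary2_bound(C, lam, eps, R, K, beta, holder_const, phi_norm):
    """Evaluate delta * log(1 + exp(-Q(phi))) from Corollary 2.

    holder_const plays the role of |phi|_beta and phi_norm of ||phi||.
    """
    L = math.log(R * K * C / eps)  # log(R K C / eps), roughly lambda * tau
    if L <= 0:
        raise ValueError("need R*K*C/eps > 1 for the estimate on tau to make sense")
    delta = lam / (2 * L)
    # Distortion constant V from the Bowen property estimate
    V = 2 * (C * K * eps) ** beta / (1 - math.exp(-lam * beta)) * holder_const
    Q = V + 5 * L / lam * phi_norm
    return delta * math.log1p(math.exp(-Q))

# Hypothetical constants for an Anosov diffeomorphism and a Lipschitz potential.
print(corollary2_bound(C=1.0, lam=0.9, eps=0.01, R=2.0, K=2.0,
                       beta=1.0, holder_const=1.0, phi_norm=1.0))
```

For the zero potential the result is ${\delta\log 2}$, and the bound decreases as either ${|\varphi|_\beta}$ or ${\|\varphi\|}$ grows.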

Finally, note that since shifting the value of ${\varphi}$ by a constant does not change its equilibrium states, we can assume without loss of generality that ${\|\varphi\| \leq (\mathrm{diam}\, M)^\beta |\varphi|_\beta}$ and write the following consequence of the above, which is somewhat simpler in appearance.

Corollary 3 Let ${M}$ be a compact manifold and ${f\colon M\rightarrow M}$ a topologically mixing Anosov diffeomorphism. For every ${\beta>0}$ there are constants ${\delta = \delta(M,f)>0}$ and ${R = R(M,f,\beta)}$ such that for every ${\beta}$-Hölder potential ${\varphi}$, we have

$\displaystyle P(\varphi) \geq \Big(\sup_\mu \int\varphi\,d\mu\Big) + \delta e^{-R|\varphi|_\beta}$

so that as before, if ${\mu}$ is an equilibrium state for ${\varphi}$, we have

$\displaystyle h_\mu(f) \geq \delta e^{-R|\varphi|_\beta} > 0.$

This corollary gives a precise bound on how the entropy of a family of equilibrium states can decay as the Hölder semi-norms ${|\varphi|_\beta}$ of the corresponding potentials become large. To put it another way, given any threshold ${h_0>0}$, this gives an estimate on how large ${|\varphi|_\beta}$ must be before ${\varphi}$ can have an equilibrium state with entropy below ${h_0}$.
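To make the threshold statement concrete: if ${h_\mu(f) < h_0}$, then Corollary 3 forces ${\delta e^{-R|\varphi|_\beta} < h_0}$, that is, ${|\varphi|_\beta > R^{-1}\log(\delta/h_0)}$. A small Python sketch of this inversion follows; the function name and the values of ${\delta}$ and ${R}$ in the example are assumptions for illustration only.

```python
import math

def seminorm_threshold(delta, R, h0):
    """Smallest Holder semi-norm compatible with entropy below h0,
    obtained by solving delta * exp(-R * s) = h0 for s."""
    if h0 >= delta:
        # The bound delta * exp(-R * s) <= delta is already below h0,
        # so Corollary 3 gives no constraint in this case.
        return 0.0
    return math.log(delta / h0) / R

# Hypothetical constants delta = 0.05 and R = 3.
for h0 in (1e-2, 1e-4, 1e-8):
    print(h0, seminorm_threshold(delta=0.05, R=3.0, h0=h0))
```

The threshold grows only logarithmically as ${h_0\rightarrow 0}$, reflecting the exponential form of the bound.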

3. Proof of the theorem

We spend the rest of the post proving Theorem 1. Fix ${x\in X}$ and consider for each ${n\in {\mathbb N}}$ the orbit segment ${x, f(x), \dots, f^{n-1}(x)}$. Fix ${\alpha\in (0,\frac 12]}$. Let ${m_n = \lceil \frac{\alpha n}{2(\tau+1)} \rceil}$, and let

$\displaystyle \mathcal{I}_n = \{ 0 < k_1 < k_2 < \cdots < k_{m_n} < n : k_i \in 2(\tau+1){\mathbb N} \ \forall i\}.$

Write ${k_0 = 0}$ and ${k_{m_n + 1} = n}$. The idea is that for each ${\vec k\in \mathcal{I}_n}$, we will use the specification property to construct a point ${\pi(\vec k) \in X}$ whose orbit shadows the orbit of ${x}$ from time ${0}$ to time ${n}$, except for the times ${k_i}$, at which it deviates briefly; thus the points ${\pi(\vec k)}$ will be ${(n,\varepsilon)}$-separated on the one hand, and on the other hand will have ergodic averages close to that of ${x}$.
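For small parameters the set ${\mathcal{I}_n}$ can be listed explicitly, which may help in following the counting argument. The sketch below enumerates it with `itertools.combinations`; the function name and the sample values of ${n}$, ${\tau}$, ${\alpha}$ are illustrative choices.

```python
from itertools import combinations
from math import ceil, comb

def index_set(n, tau, alpha):
    """Return (m_n, list of tuples k_1 < ... < k_{m_n}) where every k_i is a
    multiple of 2 (tau + 1) lying strictly between 0 and n."""
    step = 2 * (tau + 1)
    m = ceil(alpha * n / step)
    multiples = range(step, n, step)  # admissible deviation times
    # combinations() of an increasing sequence yields increasing tuples
    return m, list(combinations(multiples, m))

m, tuples = index_set(n=20, tau=1, alpha=0.5)
print(m)        # number of deviation times per tuple
print(tuples)   # every admissible choice of deviation times
# The count is a binomial coefficient in the number of admissible times.
print(len(tuples) == comb(len(range(4, 20, 4)), m))
```

Since consecutive entries of each tuple differ by at least ${2(\tau+1)}$, there is always room for a gluing gap of length ${\tau}$ on each side of a deviation.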

First we estimate ${\#\mathcal{I}_n}$ from below; this requires a lower bound on ${{n\choose \ell}}$. Integrating ${\log t}$ over ${[1,k]}$ and ${[1,k+1]}$ gives

$\displaystyle k\log k - k + 1 \leq \log(k!) \leq k\log k - k + 1 + \log(k+1),$

and thus we have

$\displaystyle \begin{aligned} \log{n\choose \ell} &= \log(n!) - \log(\ell!) - \log((n-\ell)!) \\ &\geq n\log n - 1 - \ell\log\ell - (n-\ell)\log(n-\ell) - \log((\ell+1)(n-\ell+1)) \\ &\geq h\big( \tfrac\ell n\big) n - 2\log n, \end{aligned}$

where ${h(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)}$; the last inequality uses ${e(\ell+1)(n-\ell+1) \leq n^2}$, which holds once ${n}$ is sufficiently large (${n\geq 10}$ suffices). The function ${h}$ is increasing on ${(0,\frac12)}$, so

$\displaystyle \begin{aligned} \log\#\mathcal{I}_n &\geq \log{\lfloor \frac{n}{2(\tau+1)}\rfloor \choose m_n} \geq h(\tfrac {2(\tau+1) m_n} n) \frac{n}{2(\tau+1)} - 2\log \frac{n}{2(\tau+1)} \\ &\geq \frac{h(\alpha)}{2(\tau+1)} n - 2\log n. \end{aligned} \ \ \ \ \ (2)$
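As a sanity check on the binomial estimate ${\log{n\choose \ell} \geq h(\ell/n)\,n - 2\log n}$ used above, one can compare both sides numerically. This is only a spot check over a finite range, not a substitute for the argument; the helper names are my own.

```python
import math

def h(p):
    """Binary entropy with natural logarithm, extended by h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def binomial_bound_holds(n, tol=1e-9):
    """Check log C(n, l) >= n h(l/n) - 2 log n for every 0 <= l <= n."""
    return all(
        math.log(math.comb(n, l)) >= n * h(l / n) - 2 * math.log(n) - tol
        for l in range(n + 1)
    )

# Spot check for a range of n.
print(all(binomial_bound_holds(n) for n in range(1, 300)))
```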

Given ${k\in \{0, \dots, n-1\}}$, let ${y_k \in X}$ be any point with ${d(f^k(x),y_k) > 3\varepsilon}$ (using the assumption on the diameter of ${X}$). Now for every ${\vec{k}\in \mathcal{I}_n}$, the specification property guarantees the existence of a point ${\pi(\vec{k})\in X}$ with the property that

$\displaystyle \begin{aligned} \pi(\vec{k}) &\in B_{k_1-\tau}(x,\varepsilon), \\ f^{k_1}(\pi(\vec{k})) &\in B(y_{k_1},\varepsilon), \\ f^{k_1+\tau+1}(\pi(\vec{k})) &\in B_{k_2 - k_1 - 2\tau - 1}(f^{k_1 + \tau+1}(x),\varepsilon), \end{aligned}$

and so on, so that in general for any ${0\leq i \leq m_n}$ we have

$\displaystyle \begin{aligned} f^{k_i + \tau + 1}(\pi(\vec{k})) &\in B_{k_{i+1} - k_i - 2\tau - 1}(f^{k_i + \tau + 1}(x),\varepsilon), \\ f^{k_{i+1}}(\pi(\vec{k})) &\in B(y_{k_{i+1}},\varepsilon). \end{aligned} \ \ \ \ \ (3)$

Write ${j_i = k_{i+1} - k_i - 2\tau - 1}$; then the first inclusion in (3), together with the Bowen property, gives

$\displaystyle |S_{j_i} \varphi(f^{k_i + \tau + 1}(x)) - S_{j_i} \varphi(f^{k_i + \tau + 1} (\pi (\vec{k})))| \leq V.$

Now observe that for any ${y\in X}$ we have

$\displaystyle \bigg| S_n \varphi(y) - \sum_{i=0}^{m_n} S_{k_{i+1} - k_i - 2\tau-1}\varphi(f^{k_i + \tau + 1} y) \bigg| \leq (2\tau + 1)m_n \|\varphi\|.$

We conclude that

$\displaystyle |S_n\varphi(\pi(\vec{k})) - S_n\varphi(x)| \leq m_n(V + 2(2\tau+1)\|\varphi\|). \ \ \ \ \ (4)$

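Consider the set ${\pi(\mathcal{I}_n) \subset X}$. The inclusions in (3) guarantee that this set is ${(n,\varepsilon)}$-separated. Indeed, given any ${\vec{k} \neq \vec{k}' \in \mathcal{I}_n}$, take ${i}$ to be minimal such that ${k_i \neq k_i'}$, assume without loss of generality that ${k_i < k_i'}$, and let ${j=k_i}$. Then ${f^j(\pi(\vec{k})) \in B(y_j, \varepsilon)}$ by the second inclusion. On the other hand, since ${k_{i-1}' = k_{i-1}}$ and all entries are multiples of ${2(\tau+1)}$, we have ${k_{i-1}' + \tau + 1 \leq j \leq k_i' - \tau - 1}$, so the first inclusion (applied to ${\vec{k}'}$) gives ${f^j(\pi(\vec{k}')) \in B(f^j(x),\varepsilon)}$. Since ${d(y_j,f^j(x)) > 3\varepsilon}$, the triangle inequality gives ${d(f^j(\pi(\vec{k})), f^j(\pi(\vec{k}'))) > \varepsilon}$, and hence ${\pi(\vec{k}') \notin B_n(\pi(\vec{k}),\varepsilon)}$.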

Using this fact and the bounds in (4) and (2), we conclude that

$\displaystyle \begin{aligned} \Lambda_n(\varphi,\varepsilon) &\geq \sum_{\vec{k} \in \mathcal{I}_n} e^{S_n\varphi(\pi(\vec{k}))} \\ &\geq (\#\mathcal{I}_n) \exp\big(S_n\varphi(x) - m_n(V+2(2\tau+1)\|\varphi\|)\big) \\ &\geq n^{-2} \exp\big(S_n \varphi(x) + \tfrac {h(\alpha)}{2(\tau+1)} n - (\tfrac{\alpha}{2(\tau+1)} n + 1)(V+2(2\tau+1)\|\varphi\|)\big). \end{aligned}$

Taking logs, dividing by ${n}$, and sending ${n\rightarrow\infty}$ gives

$\displaystyle P(\varphi,\varepsilon) \geq \Big(\limsup_{n\rightarrow\infty} \frac 1n S_n\varphi(x) \Big) + \frac 1{2(\tau+1)} \Big(h(\alpha) - \alpha(V+2(2\tau+1)\|\varphi\|) \Big).$

Given any ergodic ${\mu}$, we can take a generic point ${x}$ for ${\mu}$ and conclude that the lim sup in the above expression is equal to ${\int\varphi\,d\mu}$. Thus to bound the difference ${P(\varphi,\varepsilon) - \int\varphi\,d\mu}$, we want to choose the value of ${\alpha \in (0,\frac 12]}$ that maximizes ${h(\alpha) - \alpha Q}$, where ${Q=V+2(2\tau+1)\|\varphi\|}$.

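A straightforward differentiation and some routine algebra show that ${\frac d{d\alpha} (h(\alpha) - \alpha Q) = 0}$ occurs when ${\alpha = (1+e^Q)^{-1}}$, which lies in ${(0,\frac 12]}$ because ${Q\geq 0}$. At this point we have ${h(\alpha) - \alpha Q = \log(1+e^{-Q})}$, so the second term in the bound above is exactly ${\Delta}$; taking the supremum over ergodic ${\mu}$ (which equals the supremum over all invariant ${\mu}$ by ergodic decomposition) proves Theorem 1.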
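The optimization over ${\alpha}$ is easy to confirm numerically. The sketch below compares the closed-form maximizer ${\alpha = (1+e^Q)^{-1}}$ and maximum ${\log(1+e^{-Q})}$ against a brute-force search over a grid in ${(0,\frac12]}$; the sample values of ${Q}$ and the grid size are arbitrary choices.

```python
import math

def h(a):
    """Binary entropy with natural logarithm, for 0 < a < 1."""
    return -a * math.log(a) - (1 - a) * math.log(1 - a)

def objective(a, Q):
    """The quantity h(alpha) - alpha * Q to be maximized."""
    return h(a) - a * Q

def optimal_alpha(Q):
    """Critical point of the objective, from h'(alpha) = log((1 - alpha) / alpha) = Q."""
    return 1.0 / (1.0 + math.exp(Q))

for Q in (0.0, 1.0, 3.0):
    a_star = optimal_alpha(Q)
    closed_form = math.log1p(math.exp(-Q))
    # Brute-force maximum over the grid alpha = k / 10000 in (0, 1/2]
    grid_max = max(objective(k / 10000, Q) for k in range(1, 5001))
    print(Q, a_star, objective(a_star, Q), closed_form, grid_max)
```

In each row the value at the critical point agrees with the closed form, and the grid maximum never exceeds it.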