In the last post, we introduced basic properties of convex cones and the Hilbert metric. In this post, we look at how these tools can be used to obtain an explicit estimate on the rate of convergence in the Perron–Frobenius theorem.

**1. Perron–Frobenius theorem**

We start by stating a version of the Perron–Frobenius theorem. Let $A$ be a $d \times d$ stochastic matrix, where here we use this to mean that the entries of $A$ are non-negative and every column sums to 1: $A_{ij} \ge 0$ for all $i,j$, and $\sum_{i=1}^d A_{ij} = 1$ for all $j$. Thus the columns of $A$ are probability vectors.

Such a matrix describes a weighted random walk on $d$ sites: if the walker is presently at site $j$, then $A_{ij}$ gives the probability that he will move to site $i$ at the next step. Thus if we interpret a probability vector $v \in \mathbb{R}^d$ (so $v_i \ge 0$ and $\sum_i v_i = 1$) as saying that the walker is at site $i$ with probability $v_i$, then $Av$ gives the evolution of this probability under one step of the random walk.

Now one version of the Perron–Frobenius theorem is as follows: If $A$ is a stochastic matrix with $A > 0$ (that is, $A_{ij} > 0$ for all $i,j$), then there is exactly one probability vector $\pi$ that is an eigenvector for $A$. Moreover, the eigenvalue associated to this eigenvector is 1, the eigenvalue 1 is simple, and all other eigenvalues have modulus less than 1. In particular, given any probability vector $v$ we have $A^n v \to \pi$ exponentially quickly.

The eigenvector $\pi$ is the stationary distribution for the random walk (Markov chain) given by $A$, and the convergence result states that any initial distribution converges to the stationary distribution under iteration of the process.
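This convergence is easy to watch numerically. The sketch below uses a made-up $2 \times 2$ positive stochastic matrix (not one from the text) and iterates the walk until the distribution settles:

```python
# Numerical sketch of the convergence A^n v -> pi (illustrative 2x2 matrix,
# not one from the text): columns of A sum to 1 and are strictly positive.

def mat_vec(A, v):
    """Apply the column-stochastic matrix A (list of rows) to the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[0.5, 0.3],
     [0.5, 0.7]]

v = [1.0, 0.0]            # start with all mass on site 1
for _ in range(50):
    v = mat_vec(A, v)

# The stationary distribution solves A pi = pi with entries summing to 1;
# for this A it is pi = (3/8, 5/8).
pi = [3 / 8, 5 / 8]
print(max(abs(v[i] - pi[i]) for i in range(2)))  # essentially zero
```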

The assumption that $A > 0$ is quite strong: for the random walk, this says that the walker can get from any site to any other site in a single step. A more general condition is that $A$ is *primitive*: that is, there exists $n \in \mathbb{N}$ such that $A^n > 0$. This says that there is a time $n$ such that by taking $n$ steps, the walker can get from any site to any other site. The same result as above holds in this case too.

In fact, the result holds in the even more general case when $A$ is *irreducible*: for every $i,j$ there exists $n = n(i,j)$ such that $(A^n)_{ij} > 0$. This says that the walker can get from every site to every other site, but removes the assumption that there is a single time $n$ that works for all sites. For example, consider a random walk on a chessboard, where the walker is allowed to move one square horizontally or vertically at each step. Then for a sufficiently large even value of $n$, the walker can get from any white square to any other white square, but to get to a black square requires an odd value of $n$.
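The distinction can be tested by checking whether some power of the matrix is entrywise positive. Both matrices below are illustrative choices, not from the text: a period-2 walk that alternates between two sites, and a "lazy" walk with one forbidden transition.

```python
# Illustrative contrast between primitive and merely irreducible matrices:
# check whether some power A^n is entrywise positive. Matrices are made up.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(A, n_max):
    """Least n <= n_max with A^n > 0 entrywise, or None if there is none."""
    P = A
    for n in range(1, n_max + 1):
        if all(x > 0 for row in P for x in row):
            return n
        P = mat_mul(P, A)
    return None

flip = [[0.0, 1.0],
        [1.0, 0.0]]          # walker alternates sites: irreducible, never primitive
lazy = [[0.5, 1.0],
        [0.5, 0.0]]          # one forbidden transition, but A^2 > 0: primitive

print(first_positive_power(flip, 20))  # None
print(first_positive_power(lazy, 20))  # 2
```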

**2. Rate of decay using convex cones**

As stated above, the Perron–Frobenius theorem does not give any result on the rate with which $A^n v$ converges to $\pi$. One way to give an estimate on this rate is to use convex cones and the Hilbert metric, which were discussed in the last post.

**2.1. A cone and a metric**

Let $C$ be the convex cone $C = \{ v \in \mathbb{R}^d : v_i \ge 0 \text{ for all } i \}$. We want an estimate on the diameter of $A(C)$ in the Hilbert metric $d_C$. Recall that this metric is given by $d_C(v,w) = \log \frac{\beta(v,w)}{\alpha(v,w)}$, where

$$\alpha(v,w) = \sup \{ t > 0 : w - tv \in C \}, \qquad \beta(v,w) = \inf \{ s > 0 : sv - w \in C \}.$$

Another way of interpreting the cone $C$ is in terms of the partial order it places on $\mathbb{R}^d$, which is given by $v \preceq w \Leftrightarrow w - v \in C$. We see that $\alpha$ and $\beta$ can be characterised as

$$\alpha(v,w) = \sup \{ t > 0 : tv \preceq w \}, \qquad \beta(v,w) = \inf \{ s > 0 : w \preceq sv \}.$$

In our present example, we see that the cone $C$ induces the partial order $v \preceq w \Leftrightarrow v_i \le w_i$ for all $i$. Thus

$$\alpha(v,w) = \min_i \frac{w_i}{v_i}, \qquad \beta(v,w) = \max_i \frac{w_i}{v_i},$$

and hence

$$d_C(v,w) = \log \max_{1 \le i,j \le d} \frac{w_i v_j}{v_i w_j}. \qquad (1)$$
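For concreteness, the expression (1) for $d_C$ on the positive orthant can be implemented directly. This is a sketch that assumes strictly positive vectors:

```python
# Direct implementation of (1) for the positive orthant (sketch; assumes the
# vectors are strictly positive).
from math import log

def hilbert_metric(v, w):
    ratios = [w[i] / v[i] for i in range(len(v))]
    return log(max(ratios) / min(ratios))

print(hilbert_metric([1, 1], [3, 1]))        # log 3
print(hilbert_metric([2, 2], [6, 2]))        # scaling v changes nothing: log 3
print(hilbert_metric([1, 2, 3], [2, 4, 6]))  # 0: projectively the same vector
```

The second and third calls illustrate that $d_C$ is projective: it sees only the ray through each vector, not its length.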

**2.2. Diameter of $A(C)$**

Now we need to determine the diameter of $A(C)$ in the Hilbert metric $d_C$. If $D = \operatorname{diam}_{d_C} A(C) < \infty$, then the theorem of Birkhoff from the previous post will imply that $A$ contracts $d_C$-distances by a factor of $\tanh(D/4)$.

Let $e_1, \dots, e_d$ be the standard basis vectors in $\mathbb{R}^d$. Because $d_C$ is projective we can compute $\operatorname{diam}_{d_C} A(C)$ by considering $d_C(Av, Aw)$ where $v = \sum_i v_i e_i$ and $w = \sum_j w_j e_j$ are probability vectors. Using the triangle inequality, we have

$$d_C(Av, Aw) \le \max_{1 \le i,j \le d} d_C(Ae_i, Ae_j),$$

so it suffices to consider $d_C(Ae_i, Ae_j)$ for $i \neq j$. But $Ae_j$ is just the $j$th column of the matrix $A$, so writing $A = [\, a_1 \mid a_2 \mid \cdots \mid a_d \,]$, where $a_j$ is the $j$th column vector, we see that

$$\operatorname{diam}_{d_C} A(C) \le \max_{1 \le i,j \le d} d_C(a_i, a_j). \qquad (2)$$

**2.3. Contraction under multiplication by $A$**

Now we have a very concrete procedure for estimating the amount of contraction in the metric $d_C$ under multiplication by $A$:

- estimate $D = \operatorname{diam}_{d_C} A(C)$ using (2) and the expression for $d_C$ in (1) and the discussion preceding it;
- get a contraction rate of $\tanh(D/4)$.
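The two steps above can be sketched in code. The positive stochastic matrix here is a made-up example:

```python
# Sketch of the recipe: compute the diameter bound from the columns of A as
# in (1) and (2), then the contraction rate tanh(D/4). The matrix is a
# made-up positive stochastic example.
from math import log, tanh

def diameter_bound(A):
    """max over i, j, k, l of log( A[k][i] A[l][j] / (A[l][i] A[k][j]) )."""
    d = len(A)
    return max(log(A[k][i] * A[l][j] / (A[l][i] * A[k][j]))
               for i in range(d) for j in range(d)
               for k in range(d) for l in range(d))

def contraction_rate(A):
    return tanh(diameter_bound(A) / 4)

A = [[0.5, 0.3],
     [0.5, 0.7]]
print(diameter_bound(A))    # log(7/3), about 0.847
print(contraction_rate(A))  # about 0.209
```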

From (1) and the discussion preceding it, the distance $d_C(a_i, a_j)$ is given as

$$d_C(a_i, a_j) = \log \max_{1 \le k,l \le d} \frac{A_{ki} A_{lj}}{A_{li} A_{kj}}. \qquad (3)$$

Let $D = \operatorname{diam}_{d_C} A(C)$. To write an explicit estimate for the contraction rate $\tanh(D/4)$, we use

$$\tanh(D/4) \le \tanh(\bar{D}/4), \qquad (4)$$

where $\bar{D}$ is any estimate we can obtain satisfying $D \le \bar{D}$. From (3) and (2), we have

$$D \le \bar{D} := \max_{1 \le i,j \le d} \log \max_{1 \le k,l \le d} \frac{A_{ki} A_{lj}}{A_{li} A_{kj}}. \qquad (5)$$

This allows us to obtain estimates on $d_C(A^n v, A^n w)$. However, we want to estimate the distance between $A^n v$ and $A^n w$ in a more familiar metric, such as one coming from a norm. We can relate the two by observing that if $v, w \in C$ satisfy $\sum_i v_i = \sum_i w_i = 1$, then $\alpha(v,w) \le 1 \le \beta(v,w)$, and so

$$\|v - w\|_1 \le \beta(v,w) - \alpha(v,w) \le e^{d_C(v,w)} - 1 \le e^{\bar D} \, d_C(v,w),$$

where the last inequality uses the fact that $t \mapsto e^t - 1$ has derivative at most $e^{\bar D}$ on $[0, \bar D]$. Since $A$ maps the unit simplex to itself (because $A$ is stochastic), we see that for probability vectors $v, w$,

$$\|A^n v - A^n w\|_1 \le e^{\bar D} \, \bar{D} \, \Lambda^{n-1}, \qquad (6)$$

where $\Lambda = \tanh(\bar{D}/4)$ is given by (4) and (5), and where on the right-hand side we can take either $e^{\bar D} \bar{D} \Lambda^{n-1}$ or $e^{\bar D} - 1$ (since $d_C(A^n v, A^n w) \le \bar{D}$ for all $n \ge 1$), whichever gives the better bound. Since all norms on $\mathbb{R}^d$ are equivalent, we have a similar bound in any norm.
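As a numerical sanity check of an estimate of this form, the sketch below (with a made-up $2 \times 2$ stochastic matrix) verifies that the $\ell^1$ distance between iterates stays below $e^{\bar D} \bar{D} \Lambda^{n-1}$:

```python
# Sanity check of the norm estimate (6) on a made-up 2x2 stochastic matrix:
# the l^1 distance between iterates stays below e^{Dbar} * Dbar * Lam^(n-1).
from math import exp, log, tanh

A = [[0.5, 0.3],
     [0.5, 0.7]]
Dbar = log(7 / 3)         # the bound (5) computed for this particular A
Lam = tanh(Dbar / 4)

def step(v):
    return [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]

v, w = [1.0, 0.0], [0.0, 1.0]
for n in range(1, 11):
    v, w = step(v), step(w)
    gap = abs(v[0] - w[0]) + abs(v[1] - w[1])
    assert gap <= exp(Dbar) * Dbar * Lam ** (n - 1)
print(gap)  # decays geometrically
```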

**3. Nonnegative matrices**

The analysis in the previous section required $A$ to be positive ($A_{ij} > 0$ for all $i,j$). A more general condition is that $A$ is nonnegative and primitive: that is, $A_{ij} \ge 0$ for all $i,j$, and moreover there exists $k \in \mathbb{N}$ such that $A^k > 0$.

If $A_{ij} = 0$ for some $i,j$, then it is easy to see from the calculations in the previous section that $A(C)$ has infinite diameter in the Hilbert metric, so the above arguments do not apply directly. However, they do apply to $A^k$ when $A^k > 0$, and so we fix $k$ for which this is true, and we obtain $\Lambda < 1$ such that $d_C(A^{nk} v, A^{nk} w) \le \bar{D} \, \Lambda^{n-1}$ for all $n$.

Moreover, let $M$ be such that $\|A^j v - A^j w\| \le M \|v - w\|$ for all $0 \le j < k$ and all $v, w$ (for a stochastic matrix and the $\ell^1$ norm we may take $M = 1$). Then for any $n$ we can write $n = mk + j$ for some $0 \le j < k$, so that

$$\|A^n v - A^n w\| = \|A^j (A^{mk} v - A^{mk} w)\| \le M \, \|A^{mk} v - A^{mk} w\| \le M \, e^{\bar D} \, \bar{D} \, \Lambda^{m-1},$$

where $\Lambda$ is as in (6), computed for the matrix $A^k$. Thus we conclude that asymptotically, $A^n v$ approaches the eigenvector $\pi$ with contraction rate $\Lambda^{1/k}$ per step.

To see this in action, consider a Markov chain with transition matrix

$$A = \begin{pmatrix} 1/2 & 1 \\ 1/2 & 0 \end{pmatrix}.$$

That is, from the first state the walker transitions to either state with probability 1/2, while from the second state the walker always returns to the first state. Since the transition from the second state to itself is forbidden ($A_{22} = 0$), $A(C)$ has infinite diameter. However, the two-step transition matrix is

$$A^2 = \begin{pmatrix} 3/4 & 1/2 \\ 1/4 & 1/2 \end{pmatrix},$$

for which we can compute

$$\bar{D} = \log \frac{(3/4)(1/2)}{(1/4)(1/2)} = \log 3, \qquad \Lambda = \tanh\left( \tfrac{1}{4} \log 3 \right) \approx 0.268.$$

Thus the estimate on $A^2$ gives us a definite rate of contraction, which the estimate from $A$ does not.
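This worked example is easy to verify in code (the helper `dbar` below is an ad hoc $2 \times 2$ specialisation of the estimate (5), written in terms of column ratios):

```python
# Checking the worked example: A has a zero entry, so the one-step image cone
# has infinite diameter, but A^2 is entrywise positive and gives log 3.
from math import log, tanh

A = [[0.5, 1.0],
     [0.5, 0.0]]
A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]   # equals [[0.75, 0.5], [0.25, 0.5]]

def dbar(B):
    """The estimate (5) specialised to a positive 2x2 matrix via column ratios."""
    r = [B[0][j] / B[1][j] for j in range(2)]
    return log(max(r) / min(r))

# dbar(A) would divide by zero (its second column is (1, 0)): infinite diameter.
print(dbar(A2))            # log 3
print(tanh(dbar(A2) / 4))  # about 0.268
```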

It can be useful to use the estimate on $A^2$ even when $A > 0$: for some positive transition matrices, the per-step rate of contraction obtained by applying (4) and (5) to $A^2$ and taking a square root is strictly smaller than the rate obtained by applying them to $A$ directly, so that considering $A^2$ gives a better estimate than we obtained from considering $A$ itself.
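This phenomenon can be checked numerically. The strictly positive matrix below is a made-up illustration, not an example from the text:

```python
# Illustration with a made-up strictly positive matrix: the per-step rate
# obtained from A^2 (take a square root) beats the one obtained from A.
from math import log, sqrt, tanh

A = [[0.5, 0.3],
     [0.5, 0.7]]
A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

def rate(B):
    """tanh(Dbar/4) for a positive 2x2 column-stochastic matrix B."""
    r = [B[0][j] / B[1][j] for j in range(2)]
    return tanh(log(max(r) / min(r)) / 4)

per_step_from_A = rate(A)            # about 0.2087
per_step_from_A2 = sqrt(rate(A2))    # about 0.2060 -- a slightly better bound
print(per_step_from_A, per_step_from_A2)
```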