In the last post, we introduced basic properties of convex cones and the Hilbert metric. In this post, we look at how these tools can be used to obtain an explicit estimate on the rate of convergence in the Perron–Frobenius theorem.
1. Perron–Frobenius theorem
We start by stating a version of the Perron–Frobenius theorem. Let be a stochastic matrix, where here we use this to mean that the entries of are non-negative, and every column sums to 1: for all , and for all . Thus the columns of are probability vectors.
Such a matrix describes a weighted random walk on sites: if the walker is presently at site , then gives the probability that he will move to site at the next step. Thus if we interpret a probability vector as giving the probability of the walker being at site with probability , then gives the evolution of this probability under one step of the random walk.
Now one version of the Perron–Frobenius theorem is as follows: If is a stochastic matrix with (that is, for all ), then there is exactly one probability vector that is an eigenvector for . Moreover, the eigenvalue associated to this eigenvector is 1, the eigenvalue 1 is simple, and all other eigenvalues have modulus . In particular, given any we have exponentially quickly.
The eigenvector is the stationary distribution for the random walk (Markov chain) given by , and the convergence result states that any initial distribution converges to the stationary distribution under iteration of the process.
The assumption that is quite strong: for the random walk, this says that the walker can get from any site to any other site in a single step. A more general condition is that is primitive: that is, there exists such that . This says that there is a time such that by taking steps, the walker can get from any site to any other site. The same result as above holds in this case too.
In fact, the result holds in the even more general case when is irreducible: for every there exists such that . This says that the walker can get from every site to every other site, but removes the assumption that there is a single time that works for all site. For example, consider a random walk on a chessboard, where the walker is allowed to move one square horizontally or vertically at each step. Then for a sufficiently large even value of , the walker can get from any white square to any other white square, but to get to a black square requires an odd value of .
2. Rate of decay using convex cones
As stated above, the Perron–Frobenius theorem does not give any result on the rate with which converges to . One way to give an estimate on this rate is to use convex cones and the Hilbert metric, which were discussed in the last post.
2.1. A cone and a metric
Let be the convex cone . We want an estimate on the diameter of in the Hilbert metric . Recall that this metric is given by , where
Another way of interpreting the cone is in terms of the partial order it places on , which is given by . We see that and can be characterised as
and similarly .
2.2. Diameter of
Now we need to determine the diameter of in the Hilbert metric . If , then the theorem of Birkhoff from the previous post will imply that contracts distances by a factor of .
Let be the standard basis vectors in . Because is projective we can compute by considering where . Using the triangle inequality, we have
2.3. Contraction under multiplication by
Now we have a very concrete procedure for estimating the amount of contraction in the metric under multiplication by :
- estimate using (2) and the expression for in (1) and the discussion preceding it;
- get a contraction rate of .
From (1) and the discussion preceding it, the distance is given as
This allows us to obtain estimates on . However, we want to estimate in a more familiar metric, such as one coming from a norm. We can relate the two by observing that if , then
3. Nonnegative matrices
The analysis in the previous section required to be positive ( for all ). A more general condition is that is nonnegative and primitive: that is, for all , and moreover there exists such that .
If for some , then it is easy to see from the calculations in the previous section that has infinite diameter in the Hilbert metric, so the above arguments do not apply directly. However, they do apply to when , and so we fix for which this is true, and we obtain such that for all .
Moreover, let be such that for all . Then for any we can write for some , so that
where is as in (6). Thus we conclude that asymptotically, approaches the eigenvector with contraction rate .
To see this in action, consider a Markov chain with transition matrix
That is, from the first state the walker transitions to either state with probability 1/2, while from the second state the walker always returns to the first state. Since the transition from the second state to itself is forbidden, has infinite diameter. However, the two-step transition matrix is
for which we can compute
Thus the estimate on gives us a definite rate of contraction, which the estimate from does not.
It can be useful to use the estimate on even when . For example, if we consider the Markov chain with transition matrix
then we have
as the rate of contraction, while considering
a better estimate than we obtained from considering itself.