On Unbiased MCMC with couplings

Pierre Jacob, John O’Leary and Yves Atchadé’s excellent paper on Unbiased MCMC with couplings will be read at the Royal Statistical Society tomorrow; Pierre has already presented the paper on the Statisfaction blog.
Although we won’t be present tomorrow, we have discussed the paper at length in our local reading group with Xian Robert and PhD students Grégoire Clarté, Adrien Hairault and Caroline Lawless, and we have submitted the following discussion.

We congratulate the authors for this excellent paper.

In “traditional” MCMC, it is standard to check that stationarity has been attained by running a small number of parallel chains, initiated at different starting points, to verify that the final distribution is independent of the initialization — even though the single- versus multiple-chain(s) debate erupted from the start, with Gelman and Rubin (1992) versus Geyer (1992).

As noted by the authors, a bad choice of the initial distribution p_0 can lead to poor properties. In essence, such a choice can remain undetected under the current proposal because the chains may couple long before either of them has reached stationarity. We would like to make two suggestions to alleviate this issue, and thereby add a stationarity check as a byproduct of the run.
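For reference, and as we understand the paper, the debiasing mechanism works as follows: with Y lagged one step behind X and meeting time \tau = \inf\{t \geq 1 : X_t = Y_{t-1}\}, the estimator of \mathbb E_\pi[h] takes the form

```latex
% Unbiased estimator for E_pi[h], with Y lagged one step behind X
% and meeting time tau = inf{ t >= 1 : X_t = Y_{t-1} }:
H_k(X, Y) = h(X_k) + \sum_{t=k+1}^{\tau - 1} \bigl( h(X_t) - h(Y_{t-1}) \bigr)
```

The bias-correction sum is empty whenever the chains have met by time k; an early coupling thus gives no hint that the chains might still be far from stationarity.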

  1. The chains X and Y need to have the same initial distribution, but different pairs of chains on different parallel cores can afford different initial distributions. The resulting estimator remains unbiased. We would therefore suggest that parallel chains be initiated from distributions which put weight on different parts of the parameter space. Ideas from the Quasi-Monte Carlo literature (see Gerber & Chopin 2015) could be used here.
  2. We also note that, although the marginal distributions of X and Y need to be identical, any joint distribution on (X,Y) produces an unbiased algorithm. It is therefore preferable to choose this joint distribution so that X and Y meet (shortly) after the chains have reached stationarity. Here is one possible strategy to this end: let p and p' be two distributions which put weight on different parts of the space, and let Z\sim Bernoulli(1/2). If Z=0, take X_0\sim p and Y_0\sim p'; otherwise take X_0\sim p' and Y_0\sim p. The marginal distribution of both X_0 and Y_0 is \frac12(p+p'), but the two chains start in different parts of the parameter space and are likely to meet only after both have reached stationarity.
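As an illustration of the second suggestion (our own sketch, not from the paper), here is a minimal Python implementation of the Bernoulli-swap initialization; the distributions p and p' below are hypothetical Gaussians standing in for distributions concentrated on different parts of the space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices for p and p': two Gaussians putting weight
# on well-separated parts of the parameter space.
def draw_p(rng):
    return rng.normal(loc=-10.0, scale=1.0)

def draw_p_prime(rng):
    return rng.normal(loc=+10.0, scale=1.0)

def coupled_initialization(rng):
    """Draw (X_0, Y_0) such that both marginals equal (p + p')/2,
    while the joint distribution forces the two chains to start
    in different parts of the space."""
    if rng.random() < 0.5:   # Z = 0
        return draw_p(rng), draw_p_prime(rng)
    else:                    # Z = 1
        return draw_p_prime(rng), draw_p(rng)

x0, y0 = coupled_initialization(rng)
```

By construction, swapping the roles of p and p' with probability 1/2 leaves each marginal equal to the mixture \frac12(p+p'), so the unbiasedness argument is unaffected, while every pair of chains starts far apart.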

The ideal algorithm is one which gives a correct answer when it has converged, and a warning or an error when it has not. The failure of MCMC chains to reach stationarity (for example because they have not found all the modes of a multimodal distribution) can be hard to detect. Here, this issue is more likely to be detected, since it would show up as a failure of the chains to couple: \mathbb E[\tau] is large, and this is a feature, since it warns the practitioner that their kernel is ill-fitted to the target density.
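To illustrate how the meeting time can serve as such a warning, here is a sketch (ours, not the authors' code, and omitting the lag-one structure of the paper for simplicity) of a coupled random-walk Metropolis–Hastings kernel on a toy Gaussian target, with a maximal coupling of the proposals and a common uniform in the accept/reject step:

```python
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 1.0  # random-walk proposal standard deviation

def log_target(x):
    # Toy target: standard normal (up to a constant)
    return -0.5 * x * x

def q_pdf(z, mean):
    # Density of the Gaussian proposal N(mean, SIGMA^2) at z
    return np.exp(-0.5 * ((z - mean) / SIGMA) ** 2) / (SIGMA * np.sqrt(2 * np.pi))

def maximal_coupling(x, y, rng):
    """Sample (X*, Y*) from a maximal coupling of N(x, SIGMA^2) and
    N(y, SIGMA^2), so that P(X* = Y*) is as large as the overlap allows."""
    xs = rng.normal(x, SIGMA)
    if rng.uniform(0.0, q_pdf(xs, x)) <= q_pdf(xs, y):
        return xs, xs                    # proposals coincide
    while True:                          # otherwise, rejection-sample Y*
        ys = rng.normal(y, SIGMA)
        if rng.uniform(0.0, q_pdf(ys, y)) > q_pdf(ys, x):
            return xs, ys

def coupled_mh_meeting_time(x, y, rng, max_iter=10_000):
    """Run the coupled MH kernel until the chains meet; return tau."""
    for t in range(1, max_iter + 1):
        xs, ys = maximal_coupling(x, y, rng)
        u = rng.random()  # common uniform: once met, the chains stay equal
        if np.log(u) < log_target(xs) - log_target(x):
            x = xs
        if np.log(u) < log_target(ys) - log_target(y):
            y = ys
        if x == y:
            return t
    return max_iter

tau = coupled_mh_meeting_time(-5.0, 5.0, rng)
```

Repeating such runs gives an empirical picture of \mathbb E[\tau]: a kernel poorly matched to the target (e.g. a proposal scale far too small for a multimodal density) manifests itself as meeting times hitting the iteration budget, which is precisely the warning discussed above.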