Accepted: Wang-Landau Flat Histogram

Excellent news to start the weekend: our paper with Pierre Jacob, “The Wang-Landau Algorithm Reaches the Flat Histogram in Finite Time”, has been accepted for publication in Annals of Applied Probability.

For details, see this blog post or read the paper on arXiv.

The Wang-Landau algorithm reaches the flat histogram in finite time.

MCMC practitioners may be familiar with the Wang-Landau algorithm, which is widely used in physics. This algorithm divides the sample space into “boxes”. Given a target distribution, the algorithm then samples proportionally to the target within each box, while aiming to spend a predefined proportion of the sample in each box. (Usually these predefined proportions are uniform.)

This strategy can help the sampler move faster between the modes of a distribution, by forcing it to visit the space between modes often.

The most sophisticated versions of this algorithm combine a decreasing stochastic schedule and the so-called flat histogram criterion: whenever the proportions of the sample in each box are close enough to the desired frequencies, the stochastic schedule decreases. A decreasing schedule is necessary for diminishing adaptation to hold.
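To make the mechanics concrete, here is a minimal sketch of the idea in Python. It is a toy illustration, not the exact algorithm analysed in the paper. The two-mode target, the three boxes, the random-walk proposal, the tolerance `tol` and the schedule `gamma = 1 / (1 + n_flat)` are all assumptions chosen for the example.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of a toy two-mode target (modes at -3 and 3)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def box_of(x):
    """Three hypothetical boxes: left mode, valley between the modes, right mode."""
    if x < -1.0:
        return 0
    return 1 if x <= 1.0 else 2


def wang_landau(n_iter, n_boxes=3, step=1.0, tol=0.2, seed=0):
    rng = random.Random(seed)
    log_theta = [0.0] * n_boxes  # log-penalty attached to each box
    counts = [0] * n_boxes       # visits per box since the last flat histogram
    gamma, n_flat = 1.0, 0       # schedule value, number of flat histograms so far
    x = 0.0
    box = box_of(x)
    for _ in range(n_iter):
        # Metropolis step on the penalised target pi(x) / theta[box(x)].
        y = x + rng.gauss(0.0, step)
        box_y = box_of(y)
        log_ratio = (log_target(y) - log_theta[box_y]) - (
            log_target(x) - log_theta[box]
        )
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, box = y, box_y
        # Penalise the box currently occupied, which pushes the chain elsewhere.
        log_theta[box] += math.log1p(gamma)
        counts[box] += 1
        # Flat histogram criterion, with uniform desired frequencies 1 / n_boxes.
        total = sum(counts)
        if all(abs(c / total - 1.0 / n_boxes) < tol / n_boxes for c in counts):
            n_flat += 1
            gamma = 1.0 / (1 + n_flat)  # assumed schedule: decrease at each flat histogram
            counts = [0] * n_boxes
    return log_theta, n_flat, gamma
```

Calling `wang_landau(20000)` returns the log-penalties, the number of times the flat histogram criterion was met, and the final value of the schedule. The schedule only moves when the criterion is met, which is why it matters whether that ever happens.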

Until now, it was unknown whether the flat histogram is necessarily reached in finite time, and hence whether the schedule ever starts decreasing.

Pierre Jacob and I just submitted and arXived a proof that the flat histogram is reached in finite time under some conditions, and may never be reached in other cases.

Pour la Science article on Indo-European expansion

Phylogenetic models of language diversification seem to be popular these days in French popular science magazines. Of the leading publications, La Recherche will feature an interview with yours truly in March, and Pour la Science has an 8-page cover story on the subject in the current issue.

Popularizer Ruth Berger looks at the expansion of the Indo-Europeans from genetic and linguistic points of view, trying to reconcile them and to decide between the Kurgan (horsemen) and Anatolian (farmers) possible origins of Indo-European expansion. For the linguistics half, she looks at phylogenetic models to infer genealogies and dates, but skips the methodology and directly reproduces trees from Gray & Atkinson (2003) and Atkinson et al. (2005).

It is a shame that the method is presented as a black box. Given the length of the article, it would have been possible to give a general idea of how dates are inferred: the ages of some parts of the tree are known, and this information is used to estimate rates of change and other ages. Instead, the author suggests that the rates are already known [whence?] and are fed to the black box, which magically outputs a tree and dates. There is barely anything about the uncertainty of the estimates, and nothing about validation. I also have trouble understanding the points made at the end about Linear A and the attempt to merge the Anatolian and Kurgan hypotheses.
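The general idea fits in a few lines. The sketch below is a deliberately crude illustration with made-up numbers: it assumes a single constant rate of change and ignores uncertainty altogether. Real phylogenetic analyses are far more elaborate, so the function names and figures here are purely hypothetical.

```python
def estimate_rate(n_changes, known_age_years):
    """Rate of change per year, from a branch whose age is known."""
    return n_changes / known_age_years


def estimate_age(n_changes, rate):
    """Age in years of another branch, assuming it changed at the same rate."""
    return n_changes / rate


# Hypothetical calibration: a subgroup known to be 2000 years old shows 20 changes.
rate = estimate_rate(n_changes=20, known_age_years=2000)

# Hypothetical undated branch with 45 changes: its age comes out at about 4500 years.
age = estimate_age(n_changes=45, rate=rate)
```

The point is the direction of the inference: known ages give the rates, and the rates then give the unknown ages, rather than the rates being supplied from outside.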

This issue is number 400 of Pour la Science. According to the editor-in-chief, they decided to celebrate with an issue on “theories and models”. Indo-European expansion is one of their examples, along with pieces on Grand Unification and on gene transfers. Uncertainty and validation are major parts of any decent modelling endeavour, and it is a shame that they did not seize the opportunity to educate their readership about these issues.

I suppose it is hardly surprising that I am disappointed with a popular science paper on a topic related to my PhD…

Data on shared bicycles in Lyon

Pablo Jensen et al. recently posted on arXiv a short analysis of a fantastic data set: 11 million trips made with the shared bicycles Vélo’v in Lyon. For every trip, the start station, final station, and trip time, duration and distance are available.

Among other things, they show that the average Vélo’v is faster than the average car at peak time, that cyclists are faster on Wednesdays and slower during the weekends, and that winter speeds are higher (presumably because casual – hence slower – cyclists only cycle during the summer). My guess would be that cyclists who ride their own bikes, rather than use Vélo’v, are even faster, since they are probably more used to cycling and definitely have better and lighter bikes.

Given the start and end points of a trip, they can also calculate the length of the shortest path that obeys all one-way streets, and they show that a whopping 61% of cyclists take a shortcut, which must involve cycling the wrong way or on pavements. (It goes without saying that Parisian cyclists would never do such a thing.)
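A check of this kind can be sketched as follows. This is not the authors' code: it assumes the street network is stored as a directed graph (a one-way street is an edge in one direction only) and uses a standard Dijkstra search. The toy network and the distances are invented.

```python
import heapq


def shortest_legal_distance(streets, start, end):
    """Length of the shortest path from start to end that follows edge directions."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == end:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbour, length in streets.get(node, {}).items():
            new_dist = dist + length
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return float("inf")  # no legal route exists


def took_shortcut(trip_distance, streets, start, end):
    """True if the recorded distance is shorter than any legal route."""
    return trip_distance < shortest_legal_distance(streets, start, end)


# Invented toy network, distances in metres; C -> A is a one-way street.
toy_streets = {
    "A": {"B": 300.0},
    "B": {"C": 300.0},
    "C": {"A": 200.0},
}
```

Here the legal route from A to C is 600 metres, so `took_shortcut(200.0, toy_streets, "A", "C")` flags a trip that must have gone the wrong way up the one-way street.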

This is a fantastic data set, and I cannot wait to see more analyses, or some visualizations à la Pedro M. Cruz.

Obama’s skin seems darker to his opponents and lighter to his proponents

This is a couple of months old, but still worth reposting:

In three studies, participants rated the representativeness of photographs of a hypothetical (Study 1) or real (Barack Obama; Studies 2 and 3) biracial political candidate. Unbeknownst to participants, some of the photographs had been altered to make the candidate’s skin tone either lighter or darker than it was in the original photograph. Participants whose partisanship matched that of the candidate they were evaluating consistently rated the lightened photographs as more representative of the candidate than the darkened photographs, whereas participants whose partisanship did not match that of the candidate showed the opposite pattern.