The Wang-Landau algorithm reaches the flat histogram in finite time.

19/10/2011

MCMC practitioners may be familiar with the Wang-Landau algorithm, which is widely used in Physics. This algorithm divides the sample space into “boxes”. Given a target distribution, the algorithm then samples proportionally to the target in each box, while aiming at spending a pre-defined proportion of the sample in each box. (Usually these predefined proportions are uniform.)

This strategy can help move faster between modes of a distribution, by forcing the sample to visit often the space between modes.

The most sophisticated versions of this algorithm combine a decreasing stochastic schedule and the so-called flat histogram criterion: whenever the proportions of the sample in each box are close enough to the desired frequencies, the stochastic schedule decreases. A decreasing schedule is necessary for diminishing adaptation to hold.

Until now, it was unknown whether the flat histogram is necessarily reached in finite time, and hence whether the schedule ever starts decreasing.

Pierre Jacob and I just submitted and arXived a proof that the flat histogram is reached in finite time under some conditions, and may never be reached in other cases.

New position

29/09/2011

As of this month, I am no longer a postdoc, but an assistant professor (maître de conférences) at Université Paris Dauphine.

My new contact details are:

Robin Ryder
Bureau C608
CEREMADE − Université Paris Dauphine
Place du Maréchal de Lattre de Tassigny
75116 Paris
Phone: (+33) 1 44 05 46 71

My e-mail is unchanged.

Greek Stochastics γ

08/06/2011

The Greek Stochastics γ conference was held last week in Crete and focused this year on MCMC methods (last year was of statistics for biology). The schedule allowed little time for presentations from participants, but there were several “short courses”. In particular, Gareth Roberts gave a rather illuminating explanation on convergence of Gibbs samplers, in which each step of the sampler is viewed as a projection in a functional space.

In a Gibbs sampler with two steps (1. update X given \theta; 2. update \theta given X), step 1 is a projection (recall that P is a projection iff P^2=P, and it is clear than performing step 1 twice in a row is the same as performing it only once); the same is true of step 2. The spaces we are projecting on are ugly, but if you view them as lines, it is easy to see that the successive iterations converge to the intersection of the two lines, which corresponds to a fixed point, as shown in this very ugly figure. The convergence is faster if the two lines are  close to orthogonal, which corresponds to correlation close to 0 in the Gibbs sampler. Apparently, this line of thought was initiated by Yael Amit. I find it very helpful.

Another fascinating talk was given by David Spiegelhalter (the “general interest” talk), on communicating in Statistics. David studies the question “Why do people find probability unintuitive and difficult?”; he suggests that the answer in “because probability in unintuitive and difficult”. He gave a startling example: in a phone survey, if you ask which is the greater risk between 1/100, 1/10 and 1/1000, about 25-30% of people give the wrong answer. Media reports often use such notation, with the numerator fixed to 1. The message would be better understood if probabilities were given with a fixed denominator (1/100, 10/100, 0.1/100). David had many other suggestions on explaining risk and uncertainty to non-statisticians, all very useful.

Savage award honourable mention!

01/04/2011

The results for the 2010 Savage award, bestowed by the International Society for Bayesian Analysis for PhD theses in Bayesian statistics, just came in. I am very pleased and honoured to have won the honourable mention in the Applied Methodology category! I shall be presenting the work (Phylogenetic Models of Language Diversification) at JSM in Miami in August.

I am very grateful to the award committee, and am as always thankful for the fantastic supervision I received from Geoff Nicholls.

This is a good year for French statistics, since Julien Cornebise won the award in the Theory category! Congratulations to the other winners, Ricardo Lemos and Daniel Williamson.

Interview in La Recherche

06/03/2011

La Recherche, one of the leading French popular science magazines, has a monthly piece in which they present a Mathematician’s work in two pages. I had the honour and great pleasure of being featured in this month’s issue. Philippe Pajot interviewed me for over an hour about my work on models of language diversification and did a pretty good job at summarizing it in a few hundred words, with a presentation of the linguistic question, the data, and a mention of the issues of validation and error bars. There is also a brief attempt at explaining MCMC.

A scan is below, for French-speaking readers.


Pour la Science article on Indo-European expansion

10/02/2011

Phylogenetic models of language diversification seem to be popular these days in French popular science magazines. Of the leading publications, La Recherche will feature an interview with yours truly in March, and Pour la Science has an 8-page cover story on the subject in the current issue.

Popularizer Ruth Berger looks at the expansion of the Indo-Europeans from genetic and linguistic points of view, trying to reconcile them and to decide between the Kurgan (horsemen) and Anatolian (farmers) possible origins of Indo-European expansion. For the linguistics half, she looks at phylogenetic models to infer genealogies and dates, but skips the methodology and reproduces directly trees by Gray & Atkinson (2003) and Atkinson et al. (2005).

It is a shame that the method is presented as a black box. Given the length of the article, it would have been possible to give a general idea of how dates are inferred: the ages of parts of the tree are known, and this information is used to estimate rates of change and other ages. Instead, the author suggests that the rates are already known [whence?] and are fed to the black box, which magically outputs a tree and dates.  There is barely anything about the uncertainty of the estimates, and nothing about validation. I also have trouble understanding the points made at the end about Linear A and the attempt to merge the Anatolian and Kurgan hypotheses.

This issue is number 400 of Pour la Science. According to the editor-in-chief, they decided to celebrate with an issue on “theories and models”. Indo-European expansion is one of their examples, along with pieces on Grand Unification and on gene transfers. Uncertainty and validation are major parts of any decent modelling endeavour, and it is a shame that they did not seize the opportunity to educate their readership about these issues.

I suppose it is hardly surprising that I am disappointed with a popular science paper on a topic related to my PhD…

A Million Random Digits with 100,000 Normal Deviates

20/01/2011

Random digitsI had heard before of the 1955 book A Million Random Digits with 100,000 Normal Deviates (which is exactly what you think it is), but until Greg Ross pointed them out, I had never looked at the comments on Amazon.  They are hilarious. A short selection:

  • “Such a terrific reference work! But with so many terrific random digits, it’s a shame they didn’t sort them, to make it easier to find the one you’re looking for.”
  • “If you like this book, I highly recommend that you read it in the original binary.”
  • “While the printed version is good, I would have expected the publisher to have an audiobook version as well.”
  • “Once you get about halfway in, the rest of the story is pretty predictable.”
  • “What other author could give us such a masterpiece? Infinite monkeys typing on infinite typewriters might be able to produce Shakespeare, but they could never produce something like this!”
  • “My only regret is that there isn’t a sequel, because the author left it at a cliffhanger. ”
  • “It starts with an innocent 10097, rapidly succeded by 32533. The reader has no idea if 10097 will ever appear again, and that’s the thrilling part. I won’t spoil the story, so if you want to know whether 10097 is repeated, buy the book.”

(David Pogue has a list of other Amazon products which attract funny “reviews”.)

A Million Random Digits with 100,000 Normal DeviatesA

Ngrams: when Statistics overtook Mathematics

20/12/2010

The Internet is abuzz (1 2 3)with Google’s Ngrams Viewer which allows to track frequency of usage of words and short phrases in “lots of books” from 1500 to 2008, in six languages.

It would be easy to spend hours playing at these data. Here is a quick look at when Statistics (rightfully) overtook Mathematics, in English and in French books:

Statistics overtook Mathematics half-a-century earlier in English than it did in French. I wonder what caused the bump in French around 1960.

Data on shared bicycles in Lyon

14/12/2010

Pablo Jensen et al. recently posted on arXiv a short analysis of a fantastic data set: 11 million trips made with the shared bicycles Vélo’v in Lyon. For every trip, the start station, final station, and trip time, duration and distance are available.

Among other things, they show that the average Vélo’v is faster than the average car at peak time, that cyclists are faster on Wednesdays and slower during the week-ends, and that winter speeds are higher (presumably because casual – hence slower – cyclists only cycle during the summer). My guess would be that cyclists who ride their own bikes, rather than use Vélo’v, are even faster, since they are probably more used to cycling and definitely have better and lighter bikes.

Given the start and final point of a trip, they can also calculate the length of the shortest path which obeys all one-way streets, and they show that a whopping 61% of cyclists take a shortcut, which must involve cycling the wrong way or on pavements. (It goes without saying that Parisian cyclists would never do such a thing.)

This is a fantastic data set, and I cannot wait to see more analyses, or some visualizations à la Pedro M. Cruz.

FFJM competition underway

21/10/2010

The 2010-2011 FFJM competition has started. It is one of the main Mathematical Games competitions in France (the other being the Concours Kangourou, which gets hundreds of thousands of contestants every year and is mostly for school pupils). The FFJM competition claims 100,000 contestants from ten countries.

The first round of questions (in French) is online. For this round, there is no time limit and you can send in your answers by post.

One of the most interesting characteristics of this contest is that all age groups get the same questions: for example, 2nd-graders only need to answer the first 5 questions, 8th-graders need to answer the first 14 questions, and professionals must answer all 18 questions (the last questions being the hardest). In the semi-finals and finals, this is very nice: all contestants can talk about the questions, even if they are competing in different categories.

This is the first time in almost 10 years that I will be taking part. For people who feel like taking part, answers must be sent in by 1 January.


Follow

Get every new post delivered to your Inbox.