MCMSki 4


I am attending the MCMSki 4 conference for the next 3 days; I guess I’ll see many of you there!

I am organizing a session on Wednesday morning on Advances in Monte Carlo motivated by applications; I’m looking forward to hearing the talks of Alexis Muir-Watt, Simon Barthelmé, Lawrence Murray and Rémi Bardenet during that session, as well as the rest of the very strong programme.

I’ll also be part of the jury for the best poster prize; there are many promising abstracts.

Corcoran medal

A few weeks ago, I had the great honour of receiving the 2012 Corcoran memorial medal and prize for my doctoral dissertation. It is awarded by Oxford’s Department of Statistics in memory of Stephen Corcoran, a student who died in 1996 before having time to complete his DPhil. Being a Statistics prize, there is smoothing in the award process: it is awarded every two years, to a DPhil which was completed in the last four years (i.e. between October 2008 and October 2012 in my case). The ceremony was part of the Department’s 25th anniversary celebrations.

Nils Lid Hjort gave a lecture on his “confidence distributions”, a way to represent uncertainty in the non-Bayesian framework. Although he gave examples where his representation seems to work best, I wondered how this could extend to cases where the parameter is not unidimensional.

Chris Yau received the 2010 Corcoran prize and gave a short talk on applications of HMMs to genetic data; he was unlucky to have his 15-minute talk interrupted by a fire alarm (but that allowed me to marvel at how calmly efficient the British are at evacuating in such situations). Luckily, my own talk suffered no such interruption.

Peter Donnelly demonstrated once again his amazing lecturing skills, with a highly informative talk on statistical inference of the history of the UK using genetic data.

All in all, a very enjoyable afternoon, which was followed by a lovely dinner at Somerville College, with several speeches on the past, present and future of Statistics at Oxford.

Thanks again to the Corcoran committee, especially Steffen Lauritzen, for selecting me as the prize winner!

Expect some blog posts in French


This is just a warning that from now on, a small proportion of my blog posts will be in French. I’ll use French for posts which I think will appeal primarily to French speakers: either posts for students of courses that I teach in French, or posts on the French higher education system which would be of little interest to people outside France. I guess this is similar to what Arthur Charpentier does.

In particular, I’ll keep posting in English for anything related to my research topics.

PhD acknowledgement


Acknowledgement from an anonymous doctoral dissertation in the University Microforms International database:

If I had a dime for every time my wife threatened to divorce me during the past three years, I would be wealthy and not have to take a postdoctoral position which will only make me a little less poor and will keep me away from home and in the lab even more than graduate school and all because my committee read this manuscript and said that the only alternative to signing the approval to this dissertation was to give me a job mowing the grass on campus but the Physical Plant would not hire me on account of they said I was over-educated and needed to improve my dexterity skills like picking my nose while driving a tractor-mower over poor defenseless squirrels that were eating the nuts they stole from the medical students’ lunches on Tuesday afternoon following the Biochemistry quiz which they all did not pass and blamed on me because they said a tutor was supposed to come with a 30-day money-back guarantee and I am supposed to thank someone for all this?!!

(From a UMI press release, quoted in The Whole Library Handbook 2, 1995)

Source: Futility Closet via Arthur Charpentier.

Unicode in LaTeX


The way I type LaTeX has changed significantly in the past couple of months. I now type most of my math formulae in unicode, which makes the source code much more readable.

A few months ago, I might have written

$\lambda/\mu=\kappa/\nu \Rightarrow \exists \Theta,\forall i, \sum_{j\in\mathbb{N}} E[D_{i,j}]=\Theta$

to display

\lambda/\mu=\kappa/\nu \Rightarrow \exists \Theta,\forall i, \sum_{j\in\mathbb{N}} E[D_{i,j}]=\Theta.

Now, to type the same equation, my LaTeX source code looks like this:

$λ/μ=κ/ν ⇒ ∃Θ,∀i, ∑_{j∈ℕ} E[D_{i,j}]=Θ$

which produces exactly the same output. The source code is much easier to read; it is also slightly easier to type. Here is how the magic works:

  • In the preamble, add
  • A number of special characters (including all Greek letters) were already easily available to me because I use a bépo keyboard (if you are a French speaker, you should try it out); otherwise, on a Unix-like OS, all these characters can be typed on any keyboard thanks to this great .XCompose file. For example, to get ℕ, use the keys Compose+|+N (pretty intuitive, and faster than typing \mathbb{N}); to get ∃, use Compose+E+E; to get ∈, use Compose+i+n, and so on.
  • There are two issues with this solution: first, the unicode symbol α maps to \textalpha instead of \alpha; second, the blackboard letters map to \mathbbm instead of \mathbb. This can lead to errors, but I wrote this file, which solves the issue with a few redefinitions in the preamble.
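To illustrate (this is a sketch of the kind of redefinitions meant above, not the exact contents of that file), the mappings can be fixed in the preamble with the newunicodechar package:

```latex
\usepackage{newunicodechar}
\newunicodechar{α}{\ensuremath{\alpha}}      % α → \alpha, not \textalpha
\newunicodechar{ℕ}{\ensuremath{\mathbb{N}}}  % ℕ → \mathbb, not \mathbbm
\newunicodechar{∃}{\ensuremath{\exists}}
```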

This is useful for LaTeX, but also everywhere else you might want to type math: thanks to this .XCompose file, typing math in a blog post, tweet or e-mail becomes easy (for example, this is the last blog post where I will use WordPress’ $latex syntax). And if there is ever a symbol that you cannot access from your keyboard, you can use an online converter that turns LaTeX source code into unicode.

I originally heard about this on Christopher Olah‘s blog.


Late homework


My Bayesian Case Studies students had to hand in their final report last week; as I expected, some handed it in after the deadline.

I know several colleagues whose official position is that if the report is handed in late, the student fails the course. This is fine if the report is very late, but it seems a bit drastic for a small delay (e.g. a few minutes!). The problem of course is to define very and small in the previous sentence. And even if you decide that the course is failed for students who are more than (say) 24 hours late, it seems a bit unfair to have such a big difference between being 23 hours 59 minutes late and 24 hours 1 minute…

This time, I am using a continuous decreasing function as penalty: I told the students that

“Your report should be sent by e-mail by 2 April 2013 at noon. In case of a late submission, your grade will be multiplied by exp(-0.001 ⋅ t³ /86400³ ) where t is the number of seconds between the deadline and your submission.”

This basically means that they can be 2 days late for free. If a student has a legitimate reason for being late, they will usually manage to fit in that window, so they do not need to explain their lateness. Being 4 days late knocks off 6% of the grade; being a week late knocks off 30%. A student more than 8.8 days late loses 50% of their grade, meaning the course is necessarily failed, even if the report is perfect.
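As a sketch, the rule quoted above can be implemented in a few lines of R (the function name is mine, not from the course):

```r
# Multiplicative grade penalty for a submission t seconds past the deadline
penalty <- function(t) {
  d <- t / 86400            # lateness in days
  exp(-0.001 * d^3)
}

penalty(2 * 86400)  # 2 days late: ~0.992, essentially free
penalty(7 * 86400)  # 1 week late: ~0.71, about 30% off

# Lateness (in days) at which half the grade is lost
uniroot(function(t) penalty(t) - 0.5, c(0, 30 * 86400))$root / 86400  # ~8.8 days
```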

In this case, the last student to hand in his work was about 3.5 days late, meaning the mark will be multiplied by 0.953.

I quite like this system, although I think it was a bit too generous with this exact implementation. I cannot claim to have invented it; I read about it somewhere else, with a different penalty function, but cannot find where.

Edit: Thanks to Mahendra for the original post: here, on the Messy Matters blog.

MCMSki 4 session accepted


I just received news from the organizers of MCMSki 4 that my proposed session on “Advances in Monte Carlo motivated by applications” has been accepted; it will be the first session I chair at a conference.

Three young and talented speakers have accepted to take part in this session, and I am looking forward to hearing what they have to say:

Alexis Muir-Watt will talk about PMCMC advances for the doubly-intractable problem of estimating a partial order, with application to the social order between 12th century bishops.

Lawrence Murray will also discuss PMCMC, for applications in the environmental sciences including marine biogeochemistry, soil carbon modelling and hurricane tracking.

Simon Barthelmé will discuss using quasi-Kronecker matrices to speed up MCMC for functional ANOVA problems, with an application in neurosciences.

All these speakers have been dealing with challenging data sets and models. These applications have led to methodological advances, and the session will showcase both the variety of applications and the way they drive progress in BayesComp methodology.

MCMSki 4 will be held in Chamonix, January 6-8 2014.

Visiting UCLA


I shall be spending next week (4th-9th March) at the UCLA Department of Linguistics, where I shall give five hours of talks on Statistics for Historical Linguistics: a two-hour review of the field (Thursday morning), a two-hour presentation of my own DPhil work (pdf) (Friday afternoon), and a one-hour tutorial on TraitLab (Friday afternoon?). Abstracts:

1. Statistical methods in Historical Linguistics (Thursday morning)
Recent advances in our understanding of language change, in statistical methodology, and in computational power, along with an increasing wealth of available data, have allowed significant progress in statistical modelling of language change, and quantitative methods are gaining traction in Historical Linguistics. Models have been developed for the change through time of vocabulary, morpho-syntactic and phonetic traits. I shall present a review of these models (from a statistician’s point of view), starting with Morris Swadesh’s failed attempts at glottochronology, then looking at some models developed in the last decade. In parallel, I shall provide brief insights into statistical tools such as Bayesian statistics and Markov Chain Monte Carlo, in order to show how to use these effectively for linguistic applications.
2. A phylogenetic model of language diversification (Friday afternoon)
Language diversification is a random process similar in many ways to biological evolution. We model the diversification of so-called “core” lexical data by a stochastic process on a phylogenetic tree. We initially focus on the Indo-European language family. The age of the most recent common ancestor of these languages is of particular interest and issues of dating ancient languages have been subject to controversy. We use Markov Chain Monte Carlo to estimate the tree topology, internal node ages and model parameters. Our model includes several aspects specific to language diversification, such as rate heterogeneity and the data registration process, and we show that lexical borrowing does not bias our estimates. We show the robustness of our model through extensive validation and analyse two independent data sets to estimates the age of Proto-Indo-European. We then analyse a data set of Semitic languages, and show an extension of our model to explore whether languages evolve in “punctuational bursts”. Finally, we revisit an analysis of several small data sets by Bergsland & Vogt (1962).
Joint work with Geoff Nicholls

3. Tutorial and practical: TraitLab, a package for phylogenies of linguistic and cultural traits
In this tutorial, I shall present how to use the TraitLab package, which was initially developed specifically for the modelling of core vocabulary change through time, and guide interested attendants through an analysis of a simple data set. TraitLab is a software package for simulating, fitting and analysing tree-like binary data under a stochastic Dollo model of evolution. It handles “catastrophic” rate heterogeneity and missing data. The core of the package is a Markov chain Monte Carlo (MCMC) sampling algorithm that enables the user to sample from the Bayesian joint posterior distributions for tree topologies, clade and root ages, and the trait loss and catastrophe rates for a given data set. Data can be simulated according to the fitted Dollo model or according to a number of generalized models that allow for borrowing (horizontal transfer) of traits, heterogeneity in the trait loss rate and biases in the data collection  process. Both the raw data and the output of MCMC runs can be inspected using a number of useful graphical and analytical tools  provided within the package. TraitLab is freely available and runs within the Matlab computing environment.

Attendees who wish to use TraitLab during the practical should have a computer with Matlab installed.

TraitLab was developed jointly with Geoff Nicholls and David Welch.

Bayesian model choice for the Poisson model


Following Arthur Charpentier‘s example, I am going to try to post occasionally on material covered during my courses, in the hope that it might be useful to my students, but also to others.

In the second practical of the Bayesian Case Studies course, we looked at Bayesian model choice and basic Monte Carlo methods, using data on the number of oral questions asked by French deputies (Members of Parliament). We wanted to know whether male and female deputies behave differently in this matter. If we model the number of questions Y_i asked by a deputy by a Poisson random variable and denote Z_i their gender (1 for men, 2 for women), we thus need to choose between the following two models. (See the course website for the question sheet and other material; the slides are below.)

Slides2 from Robin Ryder

Model 1: Y_i\sim Poisson(\lambda); \quad \lambda \sim \Gamma(a,b)

Model 2: Y_i|Z_i=k\sim Poisson(\lambda_k); \quad \lambda_1\sim \Gamma(a,b); \quad \lambda_2\sim \Gamma(a,b)

First download the data and set some variables:

> deputes=read.csv2('')
> attach(deputes)#Q1
> summary(questions_orales)
> hist(questions_orales,breaks=(0:6-.5))
> (n=length(questions_orales))
> (nh=sum(sexe=="H"))
> (nf=sum(sexe=="F"))
> (qtot=sum(questions_orales))
> (qh=sum(questions_orales[sexe=="H"]))
> (qf=sum(questions_orales[sexe=="F"]))

Since the Gamma prior is a conjugate prior for the Poisson model, we get a Gamma distribution as our posterior. Using a \Gamma(2,2) prior, here is a comparison of the prior and the posteriors for the two models (in the bottom plot, \lambda_1 is in black and \lambda_2 in red).
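The comparison can be reproduced along these lines (a sketch; the helper names are mine, and qtot, n, qh, nh, qf, nf come from the data snippet above):

```r
# Conjugacy: with y_1..y_n ~ Poisson(lambda) and lambda ~ Gamma(a, b),
# the posterior is lambda | y ~ Gamma(a + sum(y), b + n).
posterior_params <- function(a, b, ysum, n) c(shape = a + ysum, rate = b + n)

# Overlay prior (dotted) and posterior (solid)
plot_gamma_update <- function(a, b, ysum, n, to = 2, col = "black") {
  p <- posterior_params(a, b, ysum, n)
  curve(dgamma(x, a, b), from = 0, to = to, lty = 3, ylab = "density")
  curve(dgamma(x, p["shape"], p["rate"]), add = TRUE, col = col)
}

# e.g. plot_gamma_update(2, 2, qtot, n) for model 1;
#      plot_gamma_update(2, 2, qh, nh) for the men in model 2, then
#      curve(dgamma(x, 2 + qf, 2 + nf), add = TRUE, col = "red") for the women
```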


Two observations here: first, the posterior is much more peaked than the prior, which is expected since the number of data points is large. Second, in model 2, it seems that there may be a difference between men and women, with women asking slightly fewer questions in Parliament. To verify this, we compute a Bayes factor to compare the two models. In this case, the Bayes factor is available in closed form, but a naive computation fails because n is too large:
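For reference, the closed-form marginal likelihoods implemented in the code below are, dropping the \prod_i y_i! factors which cancel in the ratio,

m_1(y) = \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma(a+s)}{(b+n)^{a+s}}, \qquad s = \sum_i y_i

m_2(y) = \frac{b^{2a}}{\Gamma(a)^2} \cdot \frac{\Gamma(a+s_1)}{(b+n_1)^{a+s_1}} \cdot \frac{\Gamma(a+s_2)}{(b+n_2)^{a+s_2}}

with the Bayes factor B_{21} = m_2(y)/m_1(y).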

> BF_analytical=function(a,b,y1,n1,y2,n2){
+   m1=b^a/gamma(a)*gamma(a+y1+y2)/(b+n1+n2)^(a+y1+y2)
+   m2=b^(2*a)/gamma(a)^2*gamma(a+y1)/(b+n1)^(a+y1)*gamma(a+y2)/(b+n2)^(a+y2)
+   return(m2/m1)
+ }
> BF_analytical(2,2,qh,nh,qf,nf)
[1] NaN
Warning messages:
1: In BF_analytical(2, 2, qh, nh, qf, nf) :
  value out of range in 'gammafn'
2: In BF_analytical(2, 2, qh, nh, qf, nf) :
  value out of range in 'gammafn'

However, we are able to compute the Bayes factor analytically by using the log-scale instead.

> BF_analytical2=function(a,b,y1,n1,y2,n2){
+   m1=a*log(b)-lgamma(a)+lgamma(a+y1+y2)-(a+y1+y2)*log(b+n1+n2)
+   m2=2*a*log(b)-2*lgamma(a)+lgamma(a+y1)-(a+y1)*log(b+n1)+lgamma(a+y2)-(a+y2)*log(b+n2)
+   return(exp(m2-m1))
+ }
> BF_analytical2(2,2,qh,nh,qf,nf)
[1] 0.1875569
> log10(BF_analytical2(2,2,qh,nh,qf,nf))
[1] -0.726867

Using Jeffreys’ scale of evidence, this corresponds to “substantial” evidence in favour of model 1 (i.e. no difference between men and women).

Even if we had not been able to calculate the Bayes factor exactly, we could have approximated it using Monte Carlo. We first create two functions to calculate the likelihood under model 1 and model 2, and start with vanilla Monte Carlo.
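Vanilla Monte Carlo approximates each marginal likelihood by averaging the likelihood over draws from the prior:

\hat{m}(y) = \frac{1}{M} \sum_{i=1}^M p(y \mid \lambda^{(i)}), \qquad \lambda^{(i)} \sim \Gamma(a,b)

and the Bayes factor estimate is the ratio of the two averages.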

> lkd.model1=function(y,n,lambda){  return(exp(-n*lambda+y*log(lambda))) }
> lkd.model2=function(y1,n1,y2,n2,lambda1,lambda2){  return(exp(-n1*lambda1+y1*log(lambda1)-n2*lambda2+y2*log(lambda2))) }

> BF_MC=function(a,b,y1,n1,y2,n2,M){
+   lambda1=rgamma(M,a,b)
+   m1=cumsum(lkd.model1(y1+y2,n1+n2,lambda1))/(1:M)
+   lambda2.1=rgamma(M,a,b)
+   lambda2.2=rgamma(M,a,b)
+   m2=cumsum(lkd.model2(y1,n1,y2,n2,lambda2.1,lambda2.2))/(1:M)
+   return(m2/m1)
+ }

> M=100000
> BF_MC(2,2,qh,nh,qf,nf,M)[M]
[1] 0.1776931
> plot(100:M,BF_MC(2,2,qh,nh,qf,nf,M)[100:M], type="l")

The last line creates a plot of how our Monte Carlo estimate of the Bayes factor changes as the number of iterations M increases; this is useful for checking convergence.

Here, it is clear that the method converges after a few tens of thousands of iterations. Our estimate for M=10^5 is accurate to two decimal places. Next, we try the harmonic mean method:
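The harmonic mean method instead draws from the posterior and uses the identity E[1/p(y \mid \lambda) \mid y] = 1/m(y), giving the estimator

\hat{m}(y) = M \Big/ \sum_{i=1}^M \frac{1}{p(y \mid \lambda^{(i)})}, \qquad \lambda^{(i)} \sim p(\lambda \mid y).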

> BF_HM=function(a,b,y1,n1,y2,n2,M){
+   lambda1=rgamma(M,a+y1+y2,b+n1+n2)
+   m1=(1:M)/cumsum(1/lkd.model1(y1+y2,n1+n2,lambda1))
+   lambda2.1=rgamma(M,a+y1,b+n1)
+   lambda2.2=rgamma(M,a+y2,b+n2)
+   m2=(1:M)/cumsum(1/lkd.model2(y1,n1,y2,n2,lambda2.1,lambda2.2))
+   return(m2/m1)
+ }

> BF_HM(2,2,qh,nh,qf,nf,M)[M]
[1] 0.2451013
> plot(100:M,BF_HM(2,2,qh,nh,qf,nf,M)[100:M], type="l")

We observe that this does not appear to converge. It is known that in general, the harmonic mean method is unreliable.

We finally move on to importance sampling, where we estimate E[h(X)], with X\sim f, using simulations from an alternative density g. This is most efficient when g is close to h\times f. Here, we therefore need g to be similar to the posterior distribution, so we take a normal density whose mean and variance match those of the posterior:
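Concretely, since E_f[h(X)] = E_g[h(X) f(X)/g(X)], the importance sampling estimate is

\hat{E}[h(X)] = \frac{1}{M}\sum_{i=1}^M h(x^{(i)}) \frac{f(x^{(i)})}{g(x^{(i)})}, \qquad x^{(i)} \sim g,

which is what the code below computes, with h the likelihood, f the Gamma prior and g the normal approximation.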

> BF_IS=function(a,b,y1,n1,y2,n2,M){
+   mean1=(a+y1+y2)/(b+n1+n2)
+   mean2.1=(a+y1)/(b+n1)
+   mean2.2=(a+y2)/(b+n2)
+   sigma1=sqrt((a+y1+y2)/(b+n1+n2)^2)
+   sigma2.1=sqrt((a+y1)/(b+n1)^2)
+   sigma2.2=sqrt((a+y2)/(b+n2)^2)
+   lambda1=rnorm(M,mean1,sigma1)
+   m1=cumsum(lkd.model1(y1+y2,n1+n2,lambda1)*dgamma(lambda1,a,b)/dnorm(lambda1,mean1,sigma1))/(1:M)
+   lambda2.1=rnorm(M,mean2.1,sigma2.1)
+   lambda2.2=rnorm(M,mean2.2,sigma2.2)
+   m2=cumsum(lkd.model2(y1,n1,y2,n2,lambda2.1,lambda2.2)*dgamma(lambda2.1,a,b)*dgamma(lambda2.2,a,b)/
+     (dnorm(lambda2.1,mean2.1,sigma2.1)*dnorm(lambda2.2,mean2.2,sigma2.2)))/(1:M)
+   return(m2/m1)
+ }

> BF_IS(2,2,qh,nh,qf,nf,M)[M]
[1] 0.1875875
> plot(100:M,BF_IS(2,2,qh,nh,qf,nf,M)[100:M], type="l")

This method converges. Note that the y-scale is much tighter than for vanilla Monte Carlo: convergence is in fact much faster, and we get a more precise estimate. This is more obvious if we look at both methods on the same graph.

> plot(100:M,BF_MC(2,2,qh,nh,qf,nf,M)[100:M], type="l") 
> lines(100:M,BF_IS(2,2,qh,nh,qf,nf,M)[100:M], col="red")

It is much more visible now that importance sampling is a clear winner.

With this computation of the Bayes factor, we choose model 1: a single parameter can be used to model the behaviour of men and women.

Beeminder will help you keep your resolutions


I make many resolutions, and keep hardly any of them in the long term, which I think is very common. For about two years, I have been using Beeminder, a tool developed by the recently mentioned Dan Reeves, which is fantastic for number-inclined people (which presumably includes most readers of this blog).

For example, I have a stack of academic papers to read which keeps growing. I have set the (modest) goal to read at least 2 of those papers per week on average, i.e. that I read 104 papers in a year; others might want to lose 10 kg by July, or get their number of unread e-mails back to 0 by next month. Usually, such resolutions have the following effect: during the first couple of weeks, I will indeed read 2 papers/week; in week 3, I’ll read only 1, but think that I’ll catch up the next week, which I never do; by week 4, I’ve got so much catching up to do that I might as well give up on the resolution.

Beeminder helps check that I stay on track constantly. On any given day, I must be on the “yellow brick road” which leads to my goal, and which allows some leeway based on the variance of data entered up until now. A big goal in the far future is thus transformed into a small goal in the near future. In the graph below, the number at the top left is the number of days I have before I must read another paper to avoid derailing.

If you need an extra incentive, you can pledge money, which is only charged if you fail your goal (you are allowed to fail once for free; if you fail and want to start again, you must promise to pay should you fail a second time).

This is all explained in further detail on Beeminder’s blog. Apart from papers to read, I am tracking how much I swim, boring admin tasks, and several other goals I would not keep otherwise, and I have found Beeminder to be a great way to achieve them. Give it a try!

