MCMC

Introduction

Monte Carlo Markov Chains are popular in cosmology. Although it may look like an evolved technique, it is in fact mathematically a very simple one (the Metropolis-Hastings algorithm, father of them all, takes about 10 lines to code; you should try). But anyone who has practiced them in real-life conditions (i.e. with real data) knows that a lot of cooking (CPU time, proposal matrix, convergence, priors) is generally forgotten or hidden. So let's see in practice how it works.

  • Monte-Carlo: means that it is a procedure to draw samples according to a distribution ("draw" has the same meaning as when you draw samples from a Gaussian with randn in python)
  • Markov Chain: an ordered set of samples (some would say an n-tuple) where each sample (a vector) depends only on the previous one.

MCMC is therefore only a method to draw samples from some (complicated) multi-dimensional function. It is not especially "Bayesian" (it can also be used to compute integrals) and is quite inefficient. However, with some practice you can manage to sample almost anything.

Here is how Metropolis-Hastings (MH) works in practice to sample a function F:

  1. start with some values for (x1,x2,...,xN)
  2. shoot a random increment (d1,d2,...,dN), generally according to a multivariate Gaussian distribution (the so-called "proposal")
  3. is F(x1+d1,x2+d2,...,xN+dN) > F(x1,x2,...,xN)?
    1. yes: "go" there, i.e. x1=x1+d1, x2=x2+d2,...,xN=xN+dN
    2. no: you may still go there, but not for sure (do another random draw u from a uniform distribution on [0,1] and go there, i.e. x1=x1+d1, x2=x2+d2,...,xN=xN+dN, only if u < F(x1+d1,x2+d2,...,xN+dN)/F(x1,x2,...,xN))
  4. print x1,x2,...,xN and F into a file (even if it repeats itself)
  5. goto 2

Run this for "some" time, then read the file and do histograms.
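To make this concrete, here is a minimal Metropolis sketch in python on a toy 2D Gaussian target; the target F, the proposal widths, the chain length and the file name are all illustrative choices, not anything prescribed by CAMEL.

    import numpy as np

    rng = np.random.default_rng(42)
    icov = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

    def F(x):
        # un-normalised target: a correlated 2D Gaussian
        return np.exp(-0.5 * x @ icov @ x)

    x = np.array([5.0, -5.0])          # step 1: arbitrary starting point
    step = np.array([0.5, 0.5])        # widths of the (diagonal) Gaussian proposal

    with open("chain.txt", "w") as out:
        for _ in range(50000):
            d = step * rng.standard_normal(2)      # step 2: random increment
            x_new = x + d
            # steps 3.1 and 3.2 in a single test: if F increases, the ratio is > 1
            # and the uniform draw always passes; otherwise it passes with
            # probability F(x_new)/F(x)
            if rng.uniform() < F(x_new) / F(x):
                x = x_new
            out.write(f"{x[0]} {x[1]} {F(x)}\n")   # step 4: write the point even if repeated

    # step 5: read the file back and histogram each column, e.g.
    # samples = np.loadtxt("chain.txt"); np.histogram(samples[:, 0], bins=50)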

Isn't that simple? But what does this have to do with cosmological parameter estimation?

For parameter inference the sampled function F is the data likelihood. According to Bayes' theorem:

f(params|data) ∝ f(data|params) * prior(params)

(there is also a normalising denominator, but it is unimportant for parameter estimation)

  • f(params|data) is the parameters' posterior distribution
  • f(data|params)=likelihood(params)
  • prior(params) is your a priori degree of belief about the parameters

So if you sample your likelihood × prior you get the posterior distribution (i.e. a pdf) of the parameters you are trying to estimate. Unfortunately most people forget about the priors and more or less implicitly set them to 1, arguing this is the most uninformative choice (that's wrong, see Jeffreys priors). So in most cases, elaborate statistical terms end up justifying the laziness of pretending that the posterior distribution is equal to the likelihood (which is again wrong: think about whether something "likely" is the same as something "probable"). A serious Bayesian approach is to figure out how your result varies depending on your choice of priors. An even simpler one is to compare the output of this method to the frequentist profile-likelihood one, which does not have any built-in notion of prior. This is the main motivation for CAMEL.
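In practice one works with logarithms (to avoid numerical under/overflows), so the function handed to the sampler is simply log-likelihood + log-prior. A minimal sketch, with purely illustrative Gaussian choices for both terms:

    import numpy as np

    def log_likelihood(theta, data):
        # e.g. independent Gaussian measurements of one parameter with unit error bars
        return -0.5 * np.sum((data - theta) ** 2)

    def log_prior(theta):
        # a (subjective) Gaussian prior of width 10 centred on 0;
        # a flat prior would simply return 0 and reproduce the "lazy" choice above
        return -0.5 * (theta / 10.0) ** 2

    def log_posterior(theta, data):
        # up to the constant denominator, this is the function the MCMC samples
        return log_likelihood(theta, data) + log_prior(theta)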

If you are new to MCMC techniques you should definitely code the Metropolis-Hastings algorithm once in your life (on a very simple case) to understand how all this works (local python work): Main.Metropolis

Although the method is sometimes presented as magic, it is not, and it actually involves a lot of cooking and care (in particular with multiple parameters and complicated likelihoods). Indeed:

  • what does running for "some time" mean?
  • what to use for the proposal? If the increment is too large, test 3.2 will almost always fail once you are close to the maximum, so that you copy the same value into the file again and again. If it is too small, the test will mostly succeed and the sampling wanders a little bit everywhere (so why not scan a grid of all values instead?): it will take a very long time to explore all dimensions... What correlations to put in the "proposal" is another important question (see the short sketch after this list).
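A quick way to see the effect of the proposal width is to monitor the acceptance rate. The sketch below (toy 1D Gaussian target, arbitrary step values) shows the two regimes: a huge step is rejected almost always, a tiny step is accepted almost always but barely moves.

    import numpy as np

    def acceptance_rate(step, n=20000, seed=0):
        # 1D Metropolis on a standard normal target; returns the fraction of accepted moves
        rng = np.random.default_rng(seed)
        logf = lambda x: -0.5 * x**2
        x, accepted = 0.0, 0
        for _ in range(n):
            x_new = x + step * rng.standard_normal()
            if np.log(rng.uniform()) < logf(x_new) - logf(x):
                x, accepted = x_new, accepted + 1
        return accepted / n

    for step in (0.1, 2.4, 50.0):
        print(f"step={step:5.1f}  acceptance rate={acceptance_rate(step):.2f}")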

These two points are intricately linked: if you have a "good" proposal you can run the chain for a shorter time. In fact theory says the best proposal is the "real" covariance of your parameters: you can determine it from the samples themselves, but only if those samples were produced with a good proposal! In MH, people typically make some "guess" for this covariance matrix, perform a (very long) run, estimate the covariances empirically from these samples, input this matrix as a new proposal, and redo some run(s). In cosmology, this can take more than one week.
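The "estimate the covariance from a first run and feed it back" step is mechanically simple; here is a sketch, assuming a first chain was written to a text file with the parameter values in the first columns (the file name and burn-in fraction are arbitrary choices):

    import numpy as np

    chain = np.loadtxt("chain_run1.txt")[:, :-1]    # drop the last column (the F values)
    chain = chain[len(chain) // 2:]                 # discard the first half as burn-in

    cov = np.cov(chain, rowvar=False)               # empirical parameter covariance
    ndim = cov.shape[0]
    proposal_cov = (2.38**2 / ndim) * cov           # classical optimal-scaling factor

    # the next run then draws its increments as
    # d = np.random.default_rng().multivariate_normal(np.zeros(ndim), proposal_cov)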

In CAMEL we implemented an adaptive strategy that reconstructs the covariance matrix on the fly. It is not completely trivial, and one must take care not to destroy the Markovian nature of the chain: samples then depend not only on the previous one but also on the previous proposal (in practice the "adaptation" of the proposal is gradually decreased until it gets "frozen"). We will learn how to use that in this standard-level module, but on fake data (docker or mardec).
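To give an idea of what "gradually decreasing the adaptation until it freezes" can look like, here is a toy vanishing-adaptation Metropolis in python. It is a sketch of the general idea, not CAMEL's actual algorithm: the running sample covariance is updated with a weight that decays like 1/t, so the proposal eventually stops changing; the offset and the small jitter are arbitrary choices to keep the early covariance well conditioned.

    import numpy as np

    def adaptive_metropolis(logf, x0, cov0, nsteps, seed=1):
        rng = np.random.default_rng(seed)
        ndim = len(x0)
        x = np.asarray(x0, dtype=float)
        mean, cov = x.copy(), np.asarray(cov0, dtype=float)
        samples = []
        for t in range(1, nsteps + 1):
            d = rng.multivariate_normal(np.zeros(ndim), (2.38**2 / ndim) * cov)
            x_new = x + d
            if np.log(rng.uniform()) < logf(x_new) - logf(x):
                x = x_new
            gamma = 1.0 / (t + 100)              # decreasing adaptation weight
            dx = x - mean
            mean = mean + gamma * dx             # running mean of the samples
            cov = cov + gamma * (np.outer(dx, dx) + 1e-6 * np.eye(ndim) - cov)
            samples.append(x.copy())
        return np.array(samples)

    # example: correlated 2D Gaussian target, started from a poor diagonal proposal
    icov = np.linalg.inv(np.array([[1.0, 0.9], [0.9, 1.0]]))
    chain = adaptive_metropolis(lambda x: -0.5 * x @ icov @ x, [3.0, -3.0], np.eye(2), 20000)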

If you already know about CAMEL you might be more interested in revisiting the sigma8 measurement on real data (mardec): Main.sigma8

back