
MCMC

1. Introduction

Monte Carlo Markov Chains are popular in cosmology. Although it may look like a sophisticated technique, it is mathematically a very simple one (the Metropolis-Hastings algorithm, father of them all, takes about 10 lines to code; you should try). But anyone who has practiced it in real-life conditions (i.e. with data) knows that a lot of cooking (on CPU, proposal matrix, convergence, priors) is generally forgotten/hidden. So let's see in practice how it works.
The MCMC is therefore only a method to draw samples from some (complicated) multidimensional function. It is not especially "Bayesian" (it can also be used to compute integrals) and is quite inefficient. However, with some practice you can manage to sample almost anything. Here is how Metropolis-Hastings (MH) works in practice to sample any function F:

1. start from some point x and evaluate F(x);
2. draw a trial point y from a proposal density centered on x (typically a multidimensional Gaussian);
3. accept the move with probability min(1, F(y)/F(x)): if accepted, write y to a file and move there; otherwise write x again (the sample is repeated);
4. go back to step 2.

A minimal sketch in Python is shown below.
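If you want to try coding it yourself, here is a minimal sketch (the target function, step size, and sample count are arbitrary example choices; this is an illustration, not CAMEL's implementation):

import numpy as np

def metropolis_hastings(F, x0, sigma, n_samples, seed=0):
    """Minimal 1D Metropolis-Hastings sampler for an (unnormalized) density F."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        y = x + sigma * rng.standard_normal()  # trial point from a Gaussian proposal
        if rng.uniform() < F(y) / F(x):        # accept with probability min(1, F(y)/F(x))
            x = y                              # move; otherwise the sample is repeated
        samples.append(x)
    return np.array(samples)

# example: sample an (unnormalized) Gaussian, then histogram the result
samples = metropolis_hastings(lambda x: np.exp(-0.5 * x**2), x0=0.0, sigma=1.0, n_samples=10000)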
Run this for "some time", then read the file and do histograms. Isn't that simple? But what does this has to do with cosmological parameter estimation? For parameters inference the sampled function F is the data likelihood. According to Bayes theorem
(there is a denominator, but it's unimportant for parameter estimation)
So if you sample your likelihood × prior you get the posterior distribution (i.e. a pdf) of the parameters you are trying to estimate. Unfortunately most people forget about the priors and more or less implicitly set them to 1, arguing this is the most uninformative choice (which is wrong, especially in multiple dimensions). So finally, in most cases, one uses elaborate statistical terms to justify the laziness of pretending that the posterior distribution equals the likelihood (which is again wrong: think about whether something "likely" is the same as something "probable"). A serious Bayesian approach is to figure out how your result varies depending on your choice of priors. An even simpler one is to compare the output of this method to the frequentist profile-likelihood one, which does not have any built-in notion of prior. This is the main motivation for CAMEL. Although the method is sometimes presented as magic, it is not, and it actually involves a lot of cooking and care (in particular with multiple parameters and complicated likelihoods). Indeed you need:

1. a "good" proposal density, which in practice means a good estimate of the parameters' covariance matrix;
2. to run the chain long enough that it reaches a stationary state (convergence).
These 2 points are very intricate: if you have a "good" proposal you can run the chain for a shorter time. In fact theory says the best proposal uses the "real" covariance of your parameters: you can determine it from the samples themselves, but only if your samples were drawn with a good proposal! In MH, people try to make some "guess" for this covariance matrix, make some (very long) runs, estimate the covariances empirically from these samples, input this matrix as a new proposal, and start again. In cosmology, this can take more than one week.

2. The proposal matrix

This is the covariance matrix of the (multidimensional) Gaussian that will be used to draw the "next point" (think of "sigma" in 1D: it defines the mean size of the steps). In cosmology it is unfortunately quite a complicated one, with off-diagonal coefficients (i.e. there are correlations between variables, so you cannot draw in one direction independently of the others). So what should we use for it? In CAMEL we implemented an adaptive variant of MH to reconstruct the matrix on the fly (a generic sketch of such an on-the-fly covariance update is given below). This is not trivial, since one should take care not to destroy the Markovian nature of the chain (samples no longer depend only on the previous one, but here also on the previous proposal). In practice the "adaptation" of the proposal is gradually decreased until it gets "frozen". However we need something to start with. We will use the Hessian matrix obtained when minimizing the global chi2 (see Main.Lik#toc6). It is often of poor quality (and does NOT represent the "errors" on the parameters) but is generally sufficient to feed CAMEL's adaptive algorithm.
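To give an idea of what reconstructing the matrix "on the fly" involves, here is a generic sketch of the running-covariance recursion that adaptive-Metropolis schemes are built on. This is the textbook update, not CAMEL's exact implementation; the eps ridge and the 2.38²/d scaling are standard choices in the literature, not CAMEL settings:

import numpy as np

def update_running_cov(mean, cov, x, n):
    """Fold a new sample x into the running mean and covariance of the chain."""
    new_mean = mean + (x - mean) / (n + 1)
    new_cov = (n * cov + np.outer(x - new_mean, x - mean)) / (n + 1)
    return new_mean, new_cov

def proposal_cov(cov, ndim, eps=1e-8):
    """Scale the running covariance into a proposal; the ridge keeps it positive definite."""
    return (2.38**2 / ndim) * cov + eps * np.eye(ndim)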
Run:

$EXECDIR/Minimize lcdm.txt bestfit hessian

(the two output arguments are the best-fit and Hessian outputs, written as text files)
In any case, read your Hessian and have a look at it (a possible way to do this in Python is sketched below).
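For instance (a sketch, assuming the Hessian was written as a plain whitespace-separated matrix; the exact format of CAMEL's output may differ):

import numpy as np

# assumption: "hessian" is a plain text file holding the matrix values
H = np.loadtxt("hessian")

# inspect the implied correlation structure (the off-diagonal terms matter here);
# abs() guards against negative diagonal entries from a poor-quality Hessian
d = np.sqrt(np.abs(np.diag(H)))
print(np.array_str(H / np.outer(d, d), precision=2))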
3. Adaptive Metropolis (AM)

So let us see how to run the AM algorithm in CAMEL the lazy way (flat priors, actually taken from the bounds in your parFile).
algo=ada
length=1500
bunchSize=100

This means we use the AM algorithm to generate 1500 samples (vectors), which are written to the output file every 100 steps. Then we need some rough estimate of the proposal matrix: we use the Hessian determined in the global fit, so add

proposal_cov=hessian

Finally, AM has a number of parameters to tune the run; try:

t0=200
ts=0
scale=1.4
do_move=true

There are actually 2 things the algorithm is adapting: the covariance matrix of the proposal (reconstructed from the accumulated samples) and the overall scale of the jumps. A possible assembled parFile fragment is shown below.
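Putting these settings together, the MCMC-related part of lcdm_mc.par would look something like this (only the sampler lines are shown; the likelihood and parameter definitions from your minimization parFile are assumed unchanged):

algo=ada
length=1500
bunchSize=100
proposal_cov=hessian
t0=200
ts=0
scale=1.4
do_move=true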
Then run:

$EXECDIR/mcmc lcdm_mc.par chain1.txt

This should last ~5 mins. When the run has finished you can immediately launch 3 other ones (for later use):

$EXECDIR/mcmc lcdm_mc.par chain2.txt; $EXECDIR/mcmc lcdm_mc.par chain3.txt; $EXECDIR/mcmc lcdm_mc.par chain4.txt;

While they complete, we will explore the properties of chain1.txt, so move on to the next section.

4. Checking for convergence

In order to sample the distribution correctly, the samples must reach a stationary state known as correct "mixing". Then their distribution "converges" to the right function (the posterior/likelihood here). This sometimes requires running the algorithm for a long time (many steps) before it forgets its origin and goes into a region of parameter space where the proposal matrix is well adapted. There is no way to be 100% sure a chain is mixing correctly, but you may often clearly see when it did not! Here are various checks.

4.1 Trace plots

The very first thing to do once you have your chain (file) is simply to plot the evolution of each variable (parameters + chi2; see the first-line header for the names), looking for the moment where the chain enters a somewhat "stable" regime (stationarity). Look at the beginning of the chain1.txt file.
In Python, read the (text) file (for instance with numpy's loadtxt; a sketch combining this with the acceptance-rate check of the next subsection is given below) and plot each column. Have all variables converged? In your view, when do the distributions begin to be stationary (if anywhere)? If not, rerun with a longer length. With these plots (pedantically called "trace plots") you will identify 99% of the problems (not saying that the remaining 1% is OK).

4.2 Acceptance rate

The acceptance rate is the mean fraction of times the jump was accepted (so a line changed in the file).
A rule of thumb is that it should be close to 0.25 (meaning samples are repeated about 3/4 of the time). This was established only for Gaussian distributions, but in practice the rule also works well for other distributions.
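Here is a sketch of both checks; the only assumption is the file layout described above (a header line of column names followed by whitespace-separated numeric rows):

import numpy as np
import matplotlib.pyplot as plt

# read the column names from the header line, then the samples
with open('chain1.txt') as f:
    names = f.readline().split()
data = np.loadtxt('chain1.txt', skiprows=1)

# trace plots: one panel per column (chi2 + parameters)
for ivar, name in enumerate(names):
    plt.figure()
    plt.plot(data[:, ivar])
    plt.title(name)

# empirical acceptance rate: fraction of steps where the chain actually moved
moved = np.any(np.diff(data, axis=0) != 0, axis=1)
print("acceptance rate ~ %.2f" % moved.mean())
plt.show()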
CAMEL computes this running mean (every 100 steps) and writes it to a dedicated file; you should find one next to your chain output.

4.3 Multiple chains

More information about the mixing can be obtained by running several chains (with different starting points and generator seeds). You should now have 4 chains (chain[1-4].txt) available. We will use some features of the camel.py module.
Point the PYTHONPATH variable to the directory containing camel.py: go into that directory and type:

export PYTHONPATH=$PWD:$PYTHONPATH

Then in Python, compare how the chains evolved in each run:

from pylab import *
import camel

#read the chains into a single list
chains = camel.extract_chains('chain', 4)

#read the variable names from the 1st line of "chain1"
with open('chain1.txt', 'r') as f:
    names = f.readline().split()

#overplot the 4 chains for each variable
for ivar in range(5):
    figure()
    [plot(c[ivar]) for c in chains]
    title(names[ivar])

4.4 The Gelman-Rubin test

Now we can go on with a deeper convergence test. The idea is, for each parameter, to compare the variance of the variable within each chain to the variance among the chains. This allows one to build the "R" statistic, and a good rule of thumb for deciding when a variable has converged is to look for

R - 1 < 0.03
(or 0.01 if you are more demanding). Fortunately camel.py already does everything for you:

#compute the Gelman-Rubin statistic for each parameter
it, R = camel.GelmanRubin(chains, gap=10, length_min=0)
figure()
semilogy(it, R - 1.)
legend(names[1:])
axhline(0.03, color='k', ls='--')

Determine when the chains start having a reasonable (even if not perfect) R value: we call it N0.

5. Building posteriors

5.1 Getting mean values and intervals

Now we construct one single chain with our "best samples" by concatenating the last part of each chain:

import numpy as np

#define your N0 value
N0 =

#construct a single chain from the last samples (using samples above N0 steps)
chain = camel.mergeMC("chain", num=(1,2,3,4), burnin=N0)

#names were defined previously - you can still use the following convenience
names = camel.MCparnames("chain")

#plot the histograms + basic statistics
#the first variable is "chi2", we don't want it
for ivar in range(1, 5):
    subplot(2, 2, ivar)
    parname = names[ivar]
    hist(chain[parname])
    title(parname)
    print("%15s " % parname, "\t %f \t %f \t %f" % camel.getci(np.percentile(chain[parname], [16, 50, 84])))
tight_layout()

The true values in the simulations were:

H0 = 67.8
omega_b = 0.022
omega_cdm = 0.12
bias = 1.07

Do they fall inside your posterior distributions?

5.2 Triangle plot

To also see the correlations between variables, one may use the famous "triangle plot", which shows not only the previous (smoothed) 1D histograms but the 2D ones too. We will use the getdist package:

pip install --user getdist

Now, starting again from "chain" as the merged MC values and "names" as the column names:

import numpy as np
import camel
from getdist import plots, MCSamples

#see previously how chain/names are defined
#build the samples array without chi2 (getdist expects one row per sample, hence the transpose)
mcmc = np.array([chain[par] for par in names[1:]]).T

#the following translates to some nice fonts for param names
labels = [camel.parname.setdefault(n, n).replace('$', '') for n in names[1:]]

#getdist functions:
samples = [MCSamples(samples=mcmc, names=names[1:], labels=labels)]
g = plots.getPlotter()  #creates the "plotting" object
g.triangle_plot(samples, names[1:], contour_colors='blue', filled=True)
savefig("triangle.pdf")

Then open the triangle.pdf file.

6. Parameters covariance matrix

Now that we have samples of the 4 variables, we can empirically estimate the covariance between them, which is as simple as computing the variance for a single array with Var(x) = <(x - <x>)^2> (think about how this generalizes when x_i is a vector: C_ij = <(x_i - <x_i>)(x_j - <x_j>)>).
This can be done with pandas:

import pandas
import numpy as np

df = pandas.DataFrame(chain)
a = df.values.T    #this creates the 2D array, but we need to transpose it
c = np.cov(a[1:])  #we skip the first column, which is chi2

Plot c, or better the correlation matrix, and compare it to the Hessian. If your Gelman-Rubin test was bad, you may want to use this empirical matrix as the covariance for the proposal (instead of the Hessian), which can improve convergence.

7. Closing the loop

Now that you have 2 estimates of the "H0" parameter, one from the profile likelihood and the other from MCMC Bayesian sampling: superimpose both (think about a proper representation; one possible sketch is given below). Now you are an expert, and you may look at this more complete example on real data (or go to the beach...).
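One possible representation is to overlay the normalized H0 posterior histogram with the likelihood ratio exp(-Δχ²/2) reconstructed from the profile. In this sketch, the file name 'profile_H0.txt' and its two-column (H0, chi2) format are hypothetical, as is the 'H0' column name in the merged chain:

import numpy as np
import matplotlib.pyplot as plt

# posterior samples of H0 from the merged chain (see section 5.1)
h0_samples = chain['H0']   # assumes the chain column is named 'H0'
plt.hist(h0_samples, bins=40, density=True, alpha=0.5, label='MCMC posterior')

# hypothetical profile-likelihood file: two columns, H0 value and minimized chi2
h0, chi2 = np.loadtxt('profile_H0.txt', unpack=True)
lik = np.exp(-0.5 * (chi2 - chi2.min()))  # likelihood ratio exp(-dchi2/2)
lik /= np.trapz(lik, h0)                  # normalize to unit area for comparison
plt.plot(h0, lik, 'k-', label='profile likelihood')

plt.xlabel('H0')
plt.legend()
plt.show()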