Sampling with CAMEL

On this page...

  1.  Installation
  2.  Getting the Hessian
  3.  Adaptive Metropolis (AM)
  4.  Checking for convergence
    4.1  Trace plots
    4.2  Acceptance rate
  5.  Multiple chains
    5.1  The Gelman-Rubin test
  6.  Building posteriors
    6.1  Getting mean values and intervals
    6.2  Triangle plot
  7.  Closing the loop

1.  Installation

2.  Getting the Hessian

The main idea behind CAMEL is to provide a framework that develops both the frequentist and Bayesian approaches for comparison. A known difficulty is to obtain a (Gaussian) proposal matrix that is "not too bad". This is simple if someone already did the work and provides you with an estimated covariance matrix and you just want to reproduce their results. But when you add one (or several) new parameters, the proposal can (and generally does) change strongly. CAMEL therefore introduced an "adaptive" algorithm that builds the covariance on the fly, i.e. directly from the samples you generate. This requires an initial guess.

We take advantage of the minimization capabilities of CAMEL to first perform a global minimization, which not only gives you the "best fit" but also some estimate of the Hessian matrix (the second derivatives at the minimum). It is often of poor quality (and does NOT represent the "errors" on the parameters) but is generally good enough to feed CAMEL's adaptive algorithm.

We will work with a (fake) likelihood based on Pk measured at different redshifts and try to determine 4 of the underlying Lambda-CDM parameters. So take this parameter file and have a look at it. Be sure you understand each line (more in session 1). Then run the minimization with

$EXECDIR/Minimize lcdm.txt bestfit hessian

(the two output arguments are the best-fit/Hessian outputs written to external files) where EXECDIR indicates where the executable lies, i.e. starting from the output directory:

  • docker: EXECDIR=../../Linux-x86_64
  • mardec: EXECDIR=../../amd64_linux26

Check that the minimization gives something "reasonable" (how?)
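
One possible sanity check on the Hessian output, as a minimal sketch (it assumes the file named hessian produced by the command above can be read as a whitespace-separated matrix; adapt it to the actual format of your output):

import numpy as np

# file name taken from the Minimize command above; check the actual format first
H = np.loadtxt("hessian")

# a usable proposal matrix should be symmetric and positive definite
print("symmetric:", np.allclose(H, H.T))
print("eigenvalues all > 0:", np.all(np.linalg.eigvalsh(H) > 0))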

3.  Adaptive Metropolis (AM)

So let us see how to run the AM algorithm in CAMEL the lazy way (flat priors are actually taken from the bounds in your parFile).

  • copy lcdm.txt as lcdm_mc.par (yes that's useless! just a matter of renaming and not losing the initial file)
  • comment out the "precisionFile" line (why don't we need it?) and the verbose parameter (just to gain time)
  • then add the following parameters
algo=ada
length=1000
bunchSize=100

This means we use the AM algorithm to generate 1000 samples (vectors), written to the output file in bunches of 100 steps.

Then we need some rough estimate of the proposal matrix: we use the Hessian determined in the global fit, so add

proposal_cov=hessian

Finally, AM has a number of parameters to tune the run. Try:

t0=100
ts=0
do_move=true
scale=1

There are actually 2 things the algorithm adapts from the data:

  1. the covariance matrix
  2. the scaling (a multiplicative factor) of this matrix
  • in a first stage (until t0 samples) we accumulate data using the given proposal, and the scale factor is fixed to "scale". If you believe that your Hessian is not too bad you may try putting 1 (otherwise use a lower value)
  • then the scale factor is fixed (for ts samples) to a "reasonable" value in order to adapt only the covariance matrix (here we don't use that feature, ts=0)
  • finally, after t0+ts samples, the scaling is adapted to reach asymptotically an acceptance rate of 0.25 (a magic number we will come back to later); see the sketch after this list
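
To make the three stages concrete, here is a rough Python sketch of the adaptation logic (an illustration only, not CAMEL's actual code; the scale update rule is just one possible choice):

import numpy as np

def adapt_proposal(t, samples, accept_frac, scale, t0=100, ts=0, target=0.25):
    """Schematic AM update after t steps of the chain."""
    if t < t0:
        # stage 1: keep the initial proposal (e.g. the Hessian) and the fixed scale
        return None, scale
    cov = np.cov(np.asarray(samples), rowvar=False)  # covariance built from the chain so far
    if t < t0 + ts:
        # stage 2: adapt only the covariance, the scale stays fixed
        return cov, scale
    # stage 3: also nudge the scale towards the target acceptance rate of 0.25
    return cov, scale * np.exp(accept_frac - target)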

"do_move" allows to jump randomly the first sample, so that each run will start with a different value. Note that the seed of the random number generator changes for each run.

Then run

$EXECDIR/mcmc lcdm_mc.par chain1.txt

This should take about 3 minutes. To make sure everything is fine, you can regularly check the number of lines written in chain1.txt (which should increase with time in chunks of 100):

wc -l chain1.txt

4.  Checking for convergence

In order to sample the distribution correctly, the chain must reach a stationary state, known as correct "mixing". Then the distribution of its samples "converges" to the right function (the posterior/likelihood here). This sometimes requires running the algorithm for a long time (many steps) before it forgets its origin and moves into a region of parameter space where the proposal matrix is well adapted.

There is no way to be 100% sure a chain is mixing correctly, but you can often clearly see when it is not! Here are various checks:

4.1  Trace plots

The very first thing to do once you have your chain (file) is simply to plot each variable to see how it evolved (parameters + chi2; see the first-line header for the names), looking for the moment where the chain reaches a somewhat "stable" regime (stationarity).

In Python, read the (text) file, for instance with var=loadtxt("chain1.txt",skiprows=1,unpack=True) (or genfromtxt, pandas.read_csv, ...) and plot the evolution of each variable.
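
For instance, a minimal sketch (it assumes the first header line of chain1.txt holds the column names, chi2 first then the parameters):

from pylab import *

# column names from the header, then the samples themselves
with open('chain1.txt') as f:
    names = f.readline().split()
var = loadtxt('chain1.txt', skiprows=1, unpack=True)

# one trace plot per column (chi2 + parameters)
for i, name in enumerate(names):
    subplot(len(names), 1, i + 1)
    plot(var[i])
    ylabel(name)
xlabel('step')
show()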

Have all variables converged? In your view, when do the distributions begin to be stationary (if at all)?

With these plots (pedantically called "trace plots") you will identify 99% of the problems (which is not to say that the remaining 1% is OK).

4.2  Acceptance rate

Have a look at the chain1.txt values. What do you notice? The acceptance rate is the fraction of times the jump was accepted (i.e. the value changed in the file). A rule of thumb is that it should be close to 1/4 (meaning samples are repeated 3/4 of the time). It was established only for Gaussian distributions, but in practice this rule also works well for other distributions. CAMEL computes this running mean (every 100 steps) and writes it to the file (you should have) named ar_vs_length.txt. You may open it and check that the values are around 0.25. This is not a proof that the chain converged, but it is a good hint.
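
You can also cross-check this number directly from the chain itself by counting how often consecutive rows differ (a small sketch, reading the file as before):

from pylab import *
import numpy as np

samples = np.loadtxt('chain1.txt', skiprows=1)
# a jump was accepted whenever a row differs from the previous one
accepted = (np.diff(samples, axis=0) != 0).any(axis=1)
print("acceptance rate: %.3f" % accepted.mean())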

Finally, if you wish to see how the scale factor was adapted, the file scale_vs_length.txt keeps the values. You may plot in Python how it evolved.
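
A quick way to look at both files, as a sketch (it assumes they contain whitespace-separated columns with the chain length first and the value second; check the actual layout of your files):

from pylab import *

for fname, lab in [('ar_vs_length.txt', 'acceptance rate'),
                   ('scale_vs_length.txt', 'scale factor')]:
    length, value = loadtxt(fname, unpack=True)
    figure()
    plot(length, value)
    xlabel('chain length')
    ylabel(lab)
show()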

5.  Multiple chains

Once you are happy with your chain, finer insight can be gained by running several chains and comparing their characteristics, so construct 4 independent chains in total (chain1 plus three new ones):

$EXECDIR/mcmc lcdm_mc.par chain2.txt
$EXECDIR/mcmc lcdm_mc.par chain3.txt
$EXECDIR/mcmc lcdm_mc.par chain4.txt

They should be different each time since you start from a different starting point (do_move=true) and the seed of the random number generator is (implicitly) different in each run.

We will now use some features of the camel.py library

  • on mardec: it is available in work/tools/python. It was appended to your PYTHONPATH when you ran camel_setup.sh, so you should not need to copy it.
  • docker: it is available in the container but not on your computer. When running the image, you can copy it from work/tools/python to the output directory, which is a shared area, to make it persistent.

So, in Python, compare how the chains evolved in each run:

from pylab import *
import camel

#read the chains into a single list
chains=camel.extract_chains('chain',4)


#read the variable names from the 1st line of chain1.txt
with open('chain1.txt', 'r') as f:
    names= f.readline().split()

#overplot the 4 chains for each variable
ivar=0
[plot(c[ivar]) for c in chains]
title(names[ivar])

#look at the other variables too

5.1  The Gelman-Rubin test

Now we can go on with a deeper convergence test. The idea is, for each parameter, to compare the variance within each chain to the variance among the chains. This allows one to build the "R" statistic, and a good rule of thumb to decide when a variable has converged is to look for:

R-1<0.03

(or 0.01 if you are more demanding)
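
To make the definition concrete, here is a textbook sketch of the statistic for a single parameter (camel.GelmanRubin below does the real work for all of them, and its details may differ):

import numpy as np

def gelman_rubin(chains_1d):
    """R for one parameter, from a list of equal-length 1D chains."""
    x = np.asarray(chains_1d)           # shape (m chains, n samples)
    m, n = x.shape
    W = x.var(axis=1, ddof=1).mean()    # mean within-chain variance
    B = n * x.mean(axis=1).var(ddof=1)  # between-chain variance
    var_hat = (n - 1.) / n * W + B / n  # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)         # R tends to 1 when the chains agree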

Fortunately, camel.py already does everything for you:

#Compute Gelman-Rubin statistic for each parameter
it,R = camel.GelmanRubin(chains,gap=10,length_min =0)
figure()
plot(it,R-1.)
legend(names[1:])
axhline( 0.03, color='k', ls='--')

  • determine when (hopefully) all parameters are satisfactory

6.  Building posteriors

6.1  Getting mean values and intervals

Now that we are more confident that the chains sample the likelihood correctly around its maximum after some number of steps (let's say N0), we construct a single chain of our "best samples" by concatenating the last part of each chain:

#define your N0 value
N0=

#construct a single chain from the last samples (using samples above N0 steps)
chain=camel.mergeMC("chain",num=(1,2,3,4),burnin=N0)

#names were defined previously - you can also use the following convenience function
names=camel.MCparnames("chain")

#plot the histograms + basic statistics
# the first variable is "chi2", we don't want it
for ivar in range(1,5):
     subplot(2,2,ivar)
     parname=names[ivar]
     hist(chain[parname])
     title(parname)
     print("%15s " % parname, "\t %f \t %f \t %f" % camel.getci( np.percentile(chain[parname], [16,50,84])))

tight_layout()

Now show God your result and ask Him what She/He put into the simulation.

6.2  Triangle plot

One can produce the famous "triangle plot", which shows not only the (smoothed) 1D histograms but also the 2D ones.

We will use the GetDist python package to do so. If you do not have it (this also applies on mardec), do

pip install --user getdist

Now, starting again from "chain" holding the merged MC values and "names" the column names:


import numpy as np
import camel
from getdist import plots, MCSamples

#see previously how chain/par are defined

#build the sample array without chi2 (samples as rows, parameters as columns, as MCSamples expects)
mcmc = np.transpose([chain[par] for par in names[1:]])
#the following translates parameter names into nice labels for plotting
labels=[camel.parname.setdefault(n,n).replace('$','') for n in names[1:]]

#getDist functions:
samples=[MCSamples( samples=mcmc, names=names[1:],labels=labels)]
g = plots.getPlotter()                # creates the "plotting" object

g.triangle_plot(samples,names[1:],contour_colors='blue',filled=True)

7.  Closing the loop

  • Now that you have 2 estimates of the "bias" parameter, one from the profile likelihood and the other from MCMC Bayesian sampling: compare both (think about a proper representation)
  • Now that you have valid samples of the posterior, you can compute the empirical covariance (which generalizes the 1D statistical variance): good news, it is implemented in python (numpy) as "cov":

Construct the empirical covariance and compare it to the Hessian.
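
For example, a sketch using the merged chain from section 6.1 (it assumes the hessian file can be read as a plain matrix; depending on CAMEL's convention you may need to invert it so that you compare a covariance with a covariance):

import numpy as np
import camel

N0 = 0   # replace with the burn-in you chose in section 6.1
chain = camel.mergeMC("chain", num=(1, 2, 3, 4), burnin=N0)
names = camel.MCparnames("chain")

# empirical covariance of the sampled parameters (chi2 column excluded)
cov_mcmc = np.cov(np.array([chain[par] for par in names[1:]]))

# matrix produced by the Minimize step (file name and format assumed)
hess = np.loadtxt("hessian")

print(cov_mcmc)
print(hess)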

Now you are a real expert, time to have a look at this more complete tutorial