EUCLIDSchool

S. Plaszczynski, Frejus July 2017.

Foreword: take some time to answer the questions in bold in the following.

1.  Session 1 : Likelihood-based statistics

1.1  pre-requisites

  • In order to get a good display connection (we will be using python/matplotlib remotely), install the x2go client available from here (please do it before the school).
  • In case the X display is still blocking, we will retrieve some remote files and work on them locally. For this we will be using python (plus the standard scientific packages of the SciPy stack), which you most certainly already have. Otherwise Anaconda is a great python distribution.

1.2  CAMEL Installation

The first time

Due to a high CPU load when running interactively, we will work on a remote cluster (DEC at CPPM, thanks to A. Tilquin and his team).

Within the X2go window, login as binomeXX@mardecXX.in2p3.fr, where XX will be assigned to you and the password given. Note that XX appears twice: each user is dispatched to a different node. If you have performance issues later, tell the instructor so that we can find better nodes.

First, check you are using the bash shell

 printenv SHELL 

should return /bin/bash, otherwise inform the instructor.

Copy the initialization scripts to your home directory (ie. where you log in):

cp /dec/users/plaszczy/euclid_setup.sh .
source euclid_setup.sh

... and have a look at them with your preferred editor: we (strongly) suggest you use an editor that does not need an X window, such as nano, (x)emacs -nw, or vim.

Clone the CAMEL euclid-school branch:

git clone -b euclid-school https://gitlab.in2p3.fr/cosmotools/CAMEL.git CAMEL/HEAD

This will pull the CAMEL code and put it into CAMEL/HEAD. HEAD is a special name for CMT (which you will see later).

Then execute "camel_setup" from the CAMEL/HEAD/cmt directory:

cd CAMEL/HEAD/cmt 
ln -s requirements-dec requirements
source camel_setup.sh

Which CLASS version is used? (for git experts: is it the latest one?)

CMT is a convenient tool to avoid writing complicated Makefiles. Details: http://www.cmtsite.net

The Makefile is replaced by the much more user-friendly requirements file (in /cmt). Have a (short) look at it. Then, still in cmt/, type:

make

This builds the libraries.

If everything goes fine, construct the executables and test programs with:

make test
make exec

Where are the executables?
Check that things look OK by running testKlass (which shows some capabilities of the CLASS C++ version).

The main path for CAMEL (ie. ~/CAMEL/HEAD) is defined through the environment variable $CAMELROOT: it appeared when you sourced camel_setup.sh.

In the following, all paths given are relative to this location.

All the executables are located in amd64_linux26, which is defined by the variable $CMTCONFIG.

Optional: if you want to have a look at the structure of the code see Main.CamelUse (but you don't need it)

At each session

you always need to do

source euclid_setup.sh
cd CAMEL/HEAD/cmt
source camel_setup.sh

1.3  Model+parameters

Let us first create a directory where we will put our work. There is already a $CAMELROOT/work directory, so create something inside it, for instance "output":

 
cd $CAMELROOT
mkdir work/output
cd work/output

We will learn how to write a parameter file ("parFile").

The very first questions one must ask are

  1. what cosmological model do we want to confront to the data?
  2. what are the parameters of this model?

Here we will use the popular (flat) Lambda-CDM one. How many free parameters does it contain?

In the following, for the data we will use a (fake) matter power spectrum measurement (embedded in a likelihood) and work with only 3 free parameters (for CPU reasons), ie. we assume the other parameters are known and fixed. In linear theory, the matter power-spectrum shape is most sensitive to the Hubble constant and the matter densities, where one should distinguish between the standard "baryonic" one and the "cold dark matter" contribution.

  • find their corresponding names in the CLASS package: you may have a look at /dec/users/plaszczy/class/v2.6.0/explanatory.ini
  • what is the difference between "Omega"'s and "omega"'s? which one is better to use?
  • what are "reasonable" values for the 3 parameters as measured today? (this is what was put in the simulation)

In the following we will call your choice of parameters "par1,par2,par3" and their first values "guess1,guess2,guess3".

Edit a new file called for instance lcdm.par and complete it with your choices:

fix par1 cosmo guess1
fix par2 cosmo guess2
fix par3 cosmo guess3

class	sBBN\ file	bbn/sBBN.dat
  • The last line tells CLASS where to search for the file containing the nucleosynthesis data (used to determine the n_b/n_gamma ratio). The "\" sign indicates that the following space is part of the same word.

The starting keyword "class" can be used to transmit any (valid) string parameter to CLASS (as in explanatory.ini).

  • you may wonder what "bbn/" refers to: in CAMEL, all such paths are relative to the $CAMELROOT/lik directory (check that bbn/ exists there)

Validation: the writeSpectraPk executable allows you to run CLASS and produce a FITS file with several spectra. It is a good way to check that your parFile is OK (it is similar to running the "class" executable, but using a CAMEL parFile):

..../amd64_linux26/writeSpectraPk lcdm.par 2000 output.fits 0.5

(where "...." means the relative path from where you work to the amd64_linux26 directory)

After the parFile, the arguments are lmax (here 2000) for the CMB Cl's, the name of the FITS output file, and the redshift z (here 0.5) at which the matter power spectrum is computed.

You may have a look at the spectrum (but do not spend much time on this); it is stored in extension 3 of the FITS file.

You can do it remotely with:

 ipython --pylab

or retrieve the output.fits file on your computer (using for instance scp)

Here is how to read a FITS file in python:

from astropy.io import fits
from pylab import loglog, show   # not needed if you run "ipython --pylab"

# open the file produced by writeSpectraPk
h = fits.open("output.fits")

# the linear matter power spectrum is stored in extension 3
pkdata = h[3].data
k = pkdata['k']
pk = pkdata['pklin']

# plot P(k) on log-log axes
loglog(k, pk)
show()

1.4  Likelihood

A likelihood is a real function of a set of parameters "pars" (a vector, here (par1,par2,par3)) that is supposed to return a high value when the model (or associated observable) matches the data and low values otherwise. If correctly written (a big job in real life) it contains all the information concerning some measurement(s). We will equally use the term "chi2", which by definition is chi2(pars)=-2 ln[L(pars)], so that a "high likelihood" means a "low chi2".

An essential piece of information is the value of pars for which L(pars) is maximal, or equivalently chi2(pars) is minimal. This is the Maximum Likelihood Estimator (MLE) and is most often the best one can do to infer which set of pars best describes the data.

We will now work with a very basic likelihood, which assumes we measured a set of "Pk" measurements at redshifts z=0.5, 1, 1.5 over some k range and that we know exactly the associated Gaussian errors (this is a very over-simplified example, but it will really happen soon with the EUCLID/LSST missions, with a much higher precision than what we are playing with today).

Here, the likelihood value (or chi2, a single number) for a set of cosmological parameters pars=(par1,par2,par3) is in fact computed in two steps:

  1. from the given set of parameters (pars), CLASS first computes the corresponding P(k,z_j) spectra (z_j=0.5,1,1.5)
  2. then the chi2 of this model w.r.t. each measurement is computed and the three are summed up:

chi2(pars,z_j) = sum_i [ (Pk(i,z_j) - Pk_data(i,z_j)) / sigmaPk_data(i,z_j) ]**2

where i runs over the measured "k" values and j over the redshifts; the total chi2 is the sum over the three redshifts.
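
As a purely illustrative sketch (this is not the CAMEL code), the chi2 of a single redshift bin could be written in python as follows, with hypothetical arrays pk_model, pk_data and sigma_data evaluated at the measured k values:

import numpy as np

def chi2_single_z(pk_model, pk_data, sigma_data):
    """Chi2 of the model P(k,z_j) against one measured spectrum with Gaussian errors."""
    return np.sum(((pk_model - pk_data) / sigma_data) ** 2)

# the total chi2 is then the sum of chi2_single_z over the three redshifts z_j = 0.5, 1, 1.5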

To use these likelihoods (which are by default summed up), add the following lines to your parFile:

 
fakePk1=fakepk/pk_0.5.lik
fakePk2=fakepk/pk_1.lik
fakePk3=fakepk/pk_1.5.lik
  • The first line gives the relative path of the likelihood file: have a look at one of them.
  • have a look at the corresponding data file too (also pure text) and plot it (you may use the python loadtxt function; a sketch is given after this list)
  • Then we add to our parFile some CLASS directives to compute the spectrum at the right redshifts (the default being z=0). For optimization with this likelihood we also don't need CLASS to compute spectra above some k_max value, right? Then complete XX and YY in the following CLASS-specific parameters:
class z_pk	XX
class P_k_max_1/Mpc	YY
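
Here is the sketch announced above for looking at one of the data files. The file name below is a placeholder (use the one referenced inside fakepk/pk_0.5.lik, under $CAMELROOT/lik), and the three-column layout (k, Pk, sigma) is an assumption: check the actual file and adapt the columns if needed.

from pylab import errorbar, xscale, yscale, xlabel, ylabel, show
from numpy import loadtxt

# placeholder name and assumed column layout: k, Pk, sigma
k, pk, sigma = loadtxt("pk_data_0.5.txt", unpack=True)

errorbar(k, pk, yerr=sigma, fmt='.')
xscale('log')
yscale('log')
xlabel("k [1/Mpc]")
ylabel("P(k)")
show()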

To check that the parFile is correctly configured, we then run the writeChi2 executable on it, which simply prints the chi2 value corresponding to your fixed parameter values, ie.

 
..../amd64_linux26/writeChi2 lcdm.par

What happens?

The (galactic) bias is a nuisance parameter. It comes from the fact that one does not measure directly the full matter power spectrum but the power spectrum inferred from some population of galaxies. It is not well known (and actually varies with the galaxy selection and redshift; here, for pedagogical reasons, we assume it is the same in all the likelihoods). There are generally many more nuisance parameters, qualifying for instance possible systematics in the measurement: they impact directly the likelihood (not the cosmological model) and can have some external constraints depending on independent measurements/a-priori knowledge. So finally what we call "parameters" is the set of (cosmological, nuisance) ones. The likelihood actually depends on them through:

L(Pk(cosmos),nuisances)

Complete your parFile by adding the nuisance parameter (in this case use the keyword "nui" instead of "cosmo"). But how to find a good first guess for this parameter? You may use the ScanParam executable for that. The idea is to print the chi2 values while varying one parameter (all the others are fixed to the values you specified in your parFile) and look for where it is minimal. It is not a minimization w.r.t. all the parameters, but it can give you some idea of where a good guess lies (assuming the other parameters are not too wrong...). Let's say we know it is somewhere between 1 and 2 (which is the case):

..../amd64_linux26/ScanParam lcdm.par bias 1 2 100

will vary the bias parameter linearly in the [1,2] interval with 100 steps. Look for the best chi2 and use the corresponding bias value

Rerun writeChi2 with this bias parameter set. Note the chi2 value. Is this a "good" one? (justify)

1.5  The Maximum Likelihood Estimator, a.k.a. "best-fit"

Given some measurement(s), we are looking for the set of parameters (within a model) that is the most likely to produce those data (note the "likely" semantics, it is not the same as "probable"). Technically this is performed by looking for the parameters that minimize the chi2(pars) function (or maximize the likelihood). "pars" is a vector that includes both the cosmological and all the nuisance parameters. Here it has low dimensionality, but it is frequent to go up to 10, 30 or 100 parameters. So we are dealing with the minimization of a multi-dimensional function. There are several algorithms to do that, but there is no magical one (unless your likelihood is linear, which is rarely the case). Furthermore, we don't have access here to the analytical gradients of the function, which complicates the work. In CAMEL we use the MINUIT package, which has been widely used in High Energy Physics. However remember: no magic, you always have to experiment with such tools to ensure you really captured the minimum-chi2 solution. This means trying different starting points and checking that the solution is not blocked at some boundary, for example.

So far our parameters were fixed. Let us release one. Modify one of your parameters, adding a (rough) estimate of its error and some bounds within which it should be varied.
Technically this amounts to replacing the "fix" keyword with a "par" one and completing the following format:

par name cosmo(or nui) value error lower_bound upper_bound

with guesses for the "error" and "bounds" (use spaces or tabs to separate the fields).
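
For instance, for the bias nuisance parameter discussed above, such a line could look like the following (the value and error are only illustrative placeholders; the [1,2] bounds are those used with ScanParam):

par bias nui 1.5 0.1 1 2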

Also to see what happens during the minimization add to your parFile

verbose=true

Then run ("...." should be replaced by the proper relative path)

..../Minimize lcdm.par bf.txt hess.txt

(the two output arguments are the best-fit/hessian outputs written in external files)

What is shown are the values being tried by MINUIT. The last 2 columns are also of interest: they give respectively the chi2 value for this set of parameters and the CLASS computation time for the model (in s).

Don't be worried at the end by this kind of message:

WARNING: FunctionMinimum is invalid.

which is hard to avoid because of the numerical noise that will be discussed below. The minimum can actually be valid (but maybe not the estimate of its error, see the Hessian part).

Modify slightly the input value and rerun. Do you obtain the same minimum? Which one is better?

It may seem strange that the minimizer does not always find the same minimum depending on the starting values. In fact it is a challenging problem to get to the minimum very precisely (we will see later that we want the chi2_min value to a precision below 1). First because we do not have the gradients to help the descent. But essentially because we are using a numerical code (CLASS) to compute the observables (here Pk). Those computations have some finite accuracy (cutoffs, binnings etc.) which leads to some numerical noise: the derivatives with respect to the cosmological parameters (e.g. d_Pk/d_omega_b) do not vary very smoothly (as they would for an analytical function), and any minimizer hates that. We have somewhat tuned the minimization strategy with MINUIT to survive these effects, but still, sometimes the minimizer finds a better route to the very precise minimum. In real-life estimation we often shoot a set of randomly shaken starting values (within the bounds of the parFile): the recipe is then simple: always take the solution with the lowest chi2_min value (there are some scripts in CAMEL to automate this process).

1.6  Precision

Recall that CLASS numerically solves some complicated equations. One can control the level of accuracy of many steps in the code. What is nice in CLASS is that all the precision parameters (cutoffs, steps etc.) can be specified in a single place. CLASS has some reasonable default values, but you may need to adapt them if you need really high precision. Of course this will take more time, so it is always a compromise between speed and precision.

When we try to minimize a multidimensional function, the algorithms assume that the function is smooth w.r.t. the parameters. This is not the case here, since the finite precision of the computations performed within CLASS brings up numerical noise: chi2(pars)=chi2(Pk(pars)), where the Pk computation is performed by CLASS. This complicates the minimizer's task of precisely finding the minimum.

CAMEL provides some higher-precision parameters to CLASS, still within reasonable computation time. Add the following line to your parFile

precisionFile=class_pre/hpjul2.pre

(you may look at what is in this file; recall that relative paths are w.r.t. the lik/ directory)

Re-run and see if there is some difference.

Now release all the parameters (ie. change the "fix" lines into "par" ones) and look for the global best-fit. You may first keep the nuisance fixed (to 1) and then release it. You may try to figure out the MINUIT strategy by looking at the output. Does the number of calls before convergence scale with the dimensionality of the problem?

1.7  The Hessian matrix

You notice as you run that MINUIT finally gives you some "Values" and "Errors" for the released parameters.

While the Values are really those that produced the final chi2 value, the "Errors" are much more uncertain. They (should) represent the curvature of the function at the minimum (the so-called "Hessian"). However

  • because of numerical noise they are rather uncertain
  • they represent an estimate of the errors (in a frequentist sense) only if the problem is linear, which is rarely the case.

However we will see in the 2nd session that they can still be interesting, so once you are happy with your "best-fit", keep the "hess.txt" file in a safe place (or rename it).

Let's read it in python (loadtxt is still your friend). Here you get the covariances between the parameters: transform them into correlations and print them (optionally plot them if you know how to use python for that).
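
Here is a minimal sketch of this conversion, assuming your Hessian file from the Minimize run is called hess.txt and contains the plain-text covariance matrix (adapt the reading if it also stores the parameter names):

import numpy as np

cov = np.loadtxt("hess.txt")       # covariance matrix written by Minimize
sig = np.sqrt(np.diag(cov))        # 1-sigma "errors" (square roots of the variances)
cor = cov / np.outer(sig, sig)     # correlation matrix (diagonal = 1)
print(cor)

The resulting cor array can then be fed to the plotcor function below.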

For convenience here is a python function that will nicely plot the array:

from pylab import *

def plotcor(cor, par):
   # keep only the strict lower triangle; mask the rest with NaN so it shows up blank
   covmat = np.tril(cor, k=-1)
   covmat[covmat==0] = NaN

   imshow(covmat,vmin=-1,vmax=1,interpolation='none')
   colorbar()

   # label the axes with the parameter names
   yticks(range(len(cor)),par)
   xticks( range(len(covmat)),par)

   # write the correlation value inside each lower-triangle cell
   for (j,i),label in ndenumerate(covmat):
       if i<j :
          text(i,j,'{:2.3f}'.format(label),ha='center',va='center')
   show()

Why are there so many blank entries?

If you want to know more about the original discussions on the meaning of "errors" in MINUIT see James(1980)

1.8  Are you frequentist or Bayesian?

The MLE gives you a solution in a multi-dimensional space which is not very convenient to visualize. What can we say about individual parameters? If the Hessian is not reliable what should we do?

Up to now, we were only dealing with classical statistics, but now you have to choose your religion and decide what a probability means! It's time for you to answer this very disturbing question: are you a Bayesian or a frequentist?

  • If you consider that reality ("true values") can never be captured and prefer to give statements about data intervals overlapping this true value when repeating the experiment, you are a frequentist
  • if you consider that probabilities do not exist by themselves but represent some degree of belief human beings have and that get updated with new experiments, then you are a Bayesian.

Certainly the sanest approach for a scientist is to be agnostic and compare both approaches (which, when the problem is well constrained, give similar results in practice).

1.9  Profile-likelihood

This method is the frequentist answer to the interval problem. Rather than writing a lot of theory, let's see in practice how it works. We choose one variable, here the bias, and we will profile it:

  • fix the bias to your preferred value and leave the other parameters free (ie use the "par" keywords)
  • minimize
  • store the bias value and the chi2 found (in 2 columns) in some text file (on your computer if you have python)
  • vary a bit the bias and redo the exercise

NB: here "a bit" means that you are looking only to chi2 values that fluctuates within a few units (let's say ~5) wrt to the lowest chi2 value

  • so build a few points, trying to scan around the minimum.
  • Then read the two columns (in python) and plot the chi2 column vs. the bias one: what do you observe? (You may have some "spikes" corresponding to improper minimizations; try redoing them with slightly different starting values, and recall that the lowest chi2 always wins!)
  • a 68% confidence level interval is finally obtained by thresholding this curve at its minimum+1 (or +3.84 for a 95% CL interval), as sketched below.
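
Here is a minimal sketch of this last step, assuming you stored your points in a two-column (bias, chi2) text file with a name of your choosing (profile_bias.txt below is a placeholder); in practice you may want to interpolate, or fit a parabola through the points, before thresholding:

from pylab import plot, axhline, xlabel, ylabel, show
from numpy import loadtxt

bias, chi2 = loadtxt("profile_bias.txt", unpack=True)

plot(bias, chi2, 'o-')
axhline(chi2.min() + 1.0, color='k', ls='--')   # 68% CL threshold (use +3.84 for 95% CL)
xlabel("bias")
ylabel("chi2")
show()

# crude 68% CL interval: the scanned bias values with chi2 <= chi2_min + 1
inside = bias[chi2 <= chi2.min() + 1.0]
print("68%% CL interval: [%g, %g]" % (inside.min(), inside.max()))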

Think about it: where is the minimum of this curve located w.r.t. the global best-fit (ie. when all the parameters are free)?

Finally, write your statement about the bias that was present in those data, using the "+-" notation (then ask God what She/He put in the simulation).

If you have time, try the same with H0.

2.  Session 2 : MCMC

2.1  New session

remember to do:

source euclid_setup.sh
cd CAMEL/HEAD/cmt
source camel_setup.sh
If you plan to analyze the (text) output files that will be produced locally on your laptop (recommended) after copying them over, also copy the $CAMELROOT/work/tools/python/camel.py python library locally. Put it in the directory where you will analyze the data (or somewhere pointed to by your PYTHONPATH).

2.2  MCMC for real

Monte Carlo Markov Chains are popular in cosmology. Although it may look like an advanced technique, it is in fact mathematically a very simple one (the Metropolis-Hastings algorithm, father of them all, takes about 10 lines to code, you should try). But anyone who has practiced them in real-life conditions (ie. with data) knows that a lot of cooking (on CPU, proposal matrix, convergence, priors) is generally forgotten/hidden. So let's see in practice how it works.

  • Monte Carlo: a procedure to draw samples according to a distribution ("draw" has the same meaning as when you draw samples from a Gaussian with randn in python)
  • Markov Chain: an ordered set of samples (some would say an (n)tuple) where each sample (a vector) depends only on the previous one.

An MCMC is therefore only a method to draw samples from some (complicated) multi-dimensional function. It is not especially "Bayesian" (it can also be used to compute integrals) and it is quite inefficient. However, with some practice you can manage to sample about anything.

Here is how Metropolis-Hastings (MH) works in practice to sample a function F around its maximum:

  1. start with some values (x1,x2,...,xN)
  2. shoot a random increment (d1,d2,...,dN), generally according to a multivariate Gaussian (the so-called "proposal")
  3. is F(x1+d1,x2+d2,...,xN+dN) > F(x1,x2,...,xN)?
    1. yes: "go" there, ie. x1=x1+d1, x2=x2+d2, ..., xN=xN+dN
    2. no: you may still go there, but not surely (you do another random draw u from a uniform distribution and go there, ie. x1=x1+d1, x2=x2+d2, ..., xN=xN+dN, only if u < F(x1+d1,x2+d2,...,xN+dN)/F(x1,x2,...,xN))
  4. write the current point (x1,x2,...,xN) (and its chi2 value) into a file
  5. go to 2

Run this for "some" time, then read the file and do histograms.

Isn't that simple?
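
To make the recipe concrete, here is a minimal toy sketch in python (it is not the CAMEL implementation): it samples a simple 2D Gaussian and works with ln F, ie. -chi2/2, which avoids numerical under/overflows. The single test combines steps 3.1 and 3.2, since ln u < 0 always passes when the ratio is larger than 1.

import numpy as np

def log_f(x):
    """Toy target: ln of an (unnormalized) 2D Gaussian, ie. -chi2/2."""
    return -0.5 * np.sum(x ** 2)

def metropolis(n_samples, x0, step=0.5):
    x = np.array(x0, dtype=float)
    chain = []
    for _ in range(n_samples):
        prop = x + step * np.random.randn(x.size)              # step 2: random increment (the "proposal")
        if np.log(np.random.rand()) < log_f(prop) - log_f(x):  # steps 3.1/3.2 combined
            x = prop                                           # accepted: move there
        chain.append(x.copy())                                 # step 4: record the (possibly repeated) sample
    return np.array(chain)

samples = metropolis(10000, x0=[3.0, -3.0])
moved = np.any(np.diff(samples, axis=0) != 0, axis=1)
print("acceptance rate ~ %.2f" % moved.mean())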

However:

  • what does "some time" mean?
  • what to use for the proposal? If the increment is too large, test 3.2 will almost always fail once you are close to the maximum, so that you copy the same value into the file again and again. If it is too small, the test will mostly succeed and the sampling will wander a bit everywhere (so why not scan a grid of all values instead?): it would take very long to explore all dimensions... What correlations to put in the "proposal" is another important question.

Both these points are very intricate: if you have a "good" proposal you can run the chain for a shorter time. In fact, theory says the best proposal is the "real" covariance of your parameters: you could determine it from the samples themselves, but only if your samples were drawn with a good proposal! With MH, people try to make some "guess" for this covariance matrix, make some (very long) run, estimate the covariances empirically from these samples, input this matrix as a new proposal, and redo some run(s). In cosmology this can take more than a week.

In CAMEL we implemented an adaptive strategy to reconstruct the covariance matrix on the fly. It is not completely trivial, and one must take care not to destroy the Markovian nature of the chain: samples then depend not only on the previous one but also on the previous proposal (in practice the "adaptation" of the proposal is gradually decreased until it gets "frozen"). We will learn how to use that.

2.3  Bayesian inference

So what does this have to do with cosmological parameter estimation?

Here the sampled function F is simply the data likelihood. According to Bayes theorem

f(params|data)=f(data|params)* prior(params)

(there is a denominator, but it is unimportant for parameter estimation)

  • f(params|data) is the parameters' posterior distribution
  • f(data|params)=likelihood(params)
  • prior(params) is your a priori degree of belief about the parameters

Unfortunately most people forget about the priors and more or less implicitly set them to 1, arguing this is the most uninformative choice (that's wrong, see Jeffreys priors). So finally, in most cases, one uses elaborate statistical terms to justify laziness and pretends that the posterior distribution is equal to the likelihood (which is again wrong: think about whether something "likely" is the same as something "probable"). A serious Bayesian approach is to figure out how your result varies depending on your prior parametrization. An even simpler one is to compare the output of this method to the frequentist profile-likelihood one, which does not have any built-in notion of prior.

2.4  Adaptive Metropolis (AM)

So let us see how to run the AM algorithm in CAMEL the lazy way (flat priors actually taken from the bounds in your parFile).

  • Use your previous lcdm.par parFile (you may copy it to lcdm_mc.par so as not to lose your previous setup)
  • use the best-fit parameters you obtained in session 1 as better starting values in the file
  • comment out the "precisionFile" line: why don't we need it?
  • add the following parameters
algo=ada
length=1000
bunchSize=100

It means we use the AM algorithm to generate 1000 samples (vectors) that will be written into the file every 100 steps.

Then we need some rough estimate of the proposal matrix: why not use the Hessian built in session 1? Assuming your Hessian file is called "hessian.txt", then add

proposal_cov=hessian.txt

Finally, AM has a number of parameters to tune the run. Try:

t0=100
ts=0
do_move=true
scale=1

There are actually 2 things the algorithm is adapting from the data

  1. the covariance matrix
  2. the scaling (a multiplicative factor) of this matrix
  • in a first stage (until t0 samples) we accumulate data using the given proposal, and the scale factor is fixed to "scale". If you believe that your Hessian is not too bad you may try putting 1 (otherwise use a lower value)
  • then the scale factor is fixed (for ts samples) to a "reasonable" value in order to adapt only the covariance matrix (here we don't use that feature, ts=0)
  • finally, after t0+ts samples, the scaling is adapted to reach asymptotically an acceptance rate of 0.25 (a magic number we will discuss later)

"do_move" allows to jump randomly the first sample, so that each run will start with a different value. Note that the seed of the random number generator changes for each run.

Then run

..../mcmc lcdm_mc.par chain1.txt

This should take about 3 minutes. If you want to make sure that everything is fine, you can regularly check the number of lines written in chain1.txt (which should increase with time in chunks of 100):

wc -l chain1.txt

2.5  Checking for convergence

In order to sample the distribution correctly, the samples must reach a stationary state known as correct "mixing". Then their distribution "converges" to the right function (the posterior/likelihood here). This sometimes requires running the algorithm for a long time (many steps) before it forgets its origin and goes into a region of the parameter space where the proposal matrix is well adapted.

There is no way to be 100% sure a chain is mixing correctly, but you may often clearly see when it did not! Here are various ways:

Trace plots

The very first thing to do once you have your chain (file) is simply to plot each variable to see how it evolved (the parameters plus the chi2; see the first-line header for the names), looking for the moment where the chain is in a somewhat "stable" regime (stationarity).

In python, read the (text) file, for instance with var=loadtxt("chain1.txt",skiprows=1,unpack=True) (or genfromtxt, pandas.read_csv ...) and plot the evolution of each variable.
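
A possible sketch (assuming, as described above, one header line with the names followed by one column per variable, the chi2 first):

from pylab import subplot, plot, title, tight_layout, show
from numpy import loadtxt

var = loadtxt("chain1.txt", skiprows=1, unpack=True)   # one array per column
with open("chain1.txt") as f:
    names = f.readline().split()                       # variable names from the header

# one trace plot per variable (chi2 + parameters)
for i, (v, name) in enumerate(zip(var, names)):
    subplot(len(names), 1, i + 1)
    plot(v)
    title(name)
tight_layout()
show()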

Have all variables converged? In your view, when do the distributions begin to be stationary (if anywhere)?

With these plots (pedantically called "trace plots") you will identify 99% of the problems (not saying that the remaining 1% is OK)

Acceptance rate

Have a look at the chain1.txt values. What do you notice? The acceptance rate is the mean fraction of times the jump was accepted (ie. the value changed in the file). A rule of thumb is that it should be close to 1/4 (meaning samples are repeated 3/4 of the time). It was established only for Gaussian distributions, but in practice this rule also works well for other distributions. CAMEL computes this running mean (every 100 steps) and writes it in a file named ar_vs_length.txt (you should have it). You may open it and check that the values are around 0.25. This is not a proof that the chain converged, but it is a good hint.

Finally, if you wish to see how the scale factor was adapted, the file scale_vs_length.txt keeps the values. You may look at how both evolved in python.
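
For instance (assuming each file holds one value per line in the directory where you ran mcmc; if there are two columns, length and value, unpack them and plot the second against the first):

from pylab import subplot, plot, axhline, title, tight_layout, show
from numpy import loadtxt

ar = loadtxt("ar_vs_length.txt")        # running acceptance rate (every 100 steps)
scale = loadtxt("scale_vs_length.txt")  # adapted scale factor

subplot(2, 1, 1)
plot(ar)
axhline(0.25, color='k', ls='--')       # target acceptance rate
title("acceptance rate")

subplot(2, 1, 2)
plot(scale)
title("scale factor")
tight_layout()
show()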

2.6  Multiple chains

Once you are happy with your chain, a finer insight can be gained by running several chains and comparing their characteristics, so construct three more chains to obtain 4 independent ones:

..../mcmc lcdm_mc.par chain2.txt
..../mcmc lcdm_mc.par chain3.txt
..../mcmc lcdm_mc.par chain4.txt

They should be different each time, since you start from a different starting point (do_move=true) and the seed of the random number generator is (implicitly) different in each run.

We will now use some features of the camel.py library (it is appended to your PYTHONPATH when you run camel_setup.sh; otherwise, on your laptop, check that you have it in your PYTHONPATH, it is in ...CAMEL/HEAD/work/tools/python). So in python:

Compare how the chains evolved in each run:

from pylab import *
import camel

#read the chains in a single list
chains=camel.extract_chains('chain',4)


#read the variables name from the 1st line of "chain1"
with open('chain1.txt', 'r') as f:
    names= f.readline().split()

#overplot the 4 chains for each variable
ivar=0
[plot(c[ivar]) for c in chains]
title(names[ivar])

#look at the other variables too

The Gelman-Rubin test

Now we can go on with a deeper convergence test. The idea is, for each parameter, to compare the variance of the variable within each chain to the variance among the chains. This allows one to build the "R" statistic, and a good rule of thumb is that a variable has converged where:

R-1<0.03

(or 0.01 if you are more demanding)

Fortunately camel.py does already everything for you:

#Compute Gelman-Rubin statistic for each parameter
it,R = camel.GelmanRubin(chains,gap=100,length_min =0)
figure()
plot(it,R-1.)
legend(names[1:])
axhline( 0.03, color='k', ls='--')
  • determine when (hopefully) all parameters are satisfactory

2.7  Building posteriors

getting mean values and intervals

Now that we are more confident that the chains correctly sample the likelihood around its maximum after some number of steps (let's say N0), we construct a single chain with our "best samples" by concatenating the last part of each chain:

#define your N0 value
N0=

#construct a single chain from the last samples (using samples above N0 steps)
chain=camel.mergeMC("chain",num=(1,2,3,4),burnin=N0)

#names were defined previously - you can still use the following convenience
names=camel.MCparnames("chain")

#plot the histograms + basic statistics
# the first variable is "chi2", we don't want it
for ivar in range(1,5):
     subplot(2,2,ivar)
     parname=names[ivar]
     hist(chain[parname])
     title(parname)
     print("%15s \t %f \t %f \t %f" % ((parname,) + tuple(camel.getci(np.percentile(chain[parname], [16,50,84])))))

tight_layout()

Now show God your result and ask Him for what She/He put in the simulation.

Triangle plot

One can produce the famous "triangle plot", which shows not only the (smoothed) 1D histograms but the 2D ones too. camel.py contains some quick functions for doing that:

#names defined previously
#chain is from mergeMC
camel.triangle(chain,params=names[1:],smooth=5)

Here we are short on statistics (in real life one runs much longer chains on batch workers). We can also use the GetDist python package (so if you work on your laptop, be sure it is installed, or do pip install getdist).

Now, starting again from "chain" as the merged MC samples and "names" as the column names:


import numpy as np
import camel
from getdist import plots, MCSamples

#see previously how chain/names are defined

#build a (nsamples, nparams) array of the samples, without the chi2 column
mcmc = np.column_stack([chain[par] for par in names[1:]])
#the following translates the parameter names into some nice fonts
labels=[camel.parname.setdefault(n,n).replace('$','') for n in names[1:]]

#getDist objects:
samples=[MCSamples( samples=mcmc, names=names[1:],labels=labels)]
g = plots.getPlotter()                # creates the "plotting" object

g.triangle_plot(samples,names[1:],contour_colors='blue',filled=True)

2.8  Closing the loop

  • Now that you have 2 estimates of the "bias" parameter, one from the profile likelihood and the other from Bayesian MCMC sampling: compare both (think about a proper representation)
  • now that you have valid samples of the posterior, you can compute the empirical covariance (which is a generalization of the 1D statistical variance): good news, it is implemented in python (numpy) as "cov":

Construct the empirical covariance and compare it to the Hessian one, as in the sketch below.
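
A minimal sketch, reusing the "chain" and "names" objects from the previous section and assuming your session-1 Hessian file is still available as hess.txt (check that the parameter ordering is the same in both):

import numpy as np

# stack the parameter samples (dropping the chi2 column) into an (nparams, nsamples) array
samples = np.array([chain[par] for par in names[1:]])

emp_cov = np.cov(samples)           # empirical covariance from the MCMC samples
hess_cov = np.loadtxt("hess.txt")   # covariance estimated from the Hessian (session 1)

print(emp_cov)
print(emp_cov / hess_cov)           # element-wise ratio: how close are the two estimates?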

3.  Want more?

From the CAMEL documentation, construct a parFile including both the latest BOSS-DR12 and JLA (SN1a) measurements and try to find the best-fit. Note that in this real-life scenario you may need to

  • explicitly specify each of the 6 LCDM parameters (to avoid implicit values)
  • add a massive neutrino (0.06 eV is the lower bound)
  • use halofit for the non-linear part of the spectrum.

Running interactively with all parameters free is probably unrealistic, but you may release a few of them (the full minimization should be run in batch mode).

Now that you are a real expert, it is time to have a look at this more complete tutorial.