Likelihood-based statistics

1.  Model+parameters

We will learn how to write a parameter file ("parFile") which is the main input to CAMEL.

The very first questions one must ask are

  1. what cosmological model do we want to confront with the data?
  2. what are the parameters of this model?
  3. what are the data?

Here we will use the popular (flat) Lambda-CDM one. How many free parameters does it contain?

In the following, for the data we will use a (fake) matter power spectrum measurement (embedded in a likelihood) and work with only 3 parameters (for CPU reasons), i.e. we assume the other parameters are known and fixed. In linear theory, the matter power-spectrum shape is most sensitive to the Hubble constant and to the matter densities, where one should distinguish between the standard "baryonic" contribution and the "cold dark matter" one.

  • find their corresponding names in the CLASS package
  • what is the difference between "Omega"'s and "omega"'s? which one is better to use?
  • what are today's estimates for the 3 parameters? (values of this kind were put into the simulation)

In the following we will call your choice of parameters "par1,par2,par3" and their initial values "guess1,guess2,guess3".

Instructions

  • go to the work directory
  • create a directory named output and go into it. DOCKER USER: this directory already exists (and is shared) so you do not need to create it.

There, with your preferred editor, create a new file called for instance lcdm.par

and complete it with your choices:

fix par1 cosmo guess1
fix par2 cosmo guess2
fix par3 cosmo guess3

class	sBBN\ file	bbn/sBBN.dat
  • The last line tells CLASS where it must search for the file containing the nucleosynthesis data (used to determine the n_b/n_gamma ratio). The "\" sign indicates that the following space is part of the same word.

The starting keyword "class" can be used to transmit parameters to CLASS (similar to explanatory.ini but without the = sign).

  • you may wonder what "bbn/" refers to: in CAMEL all relative paths refer to the $CAMELROOT/lik directory (check that bbn/ indeed exists)
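For reference, a purely illustrative filled-in version could look like the lines below (assuming one picks the CLASS names H0, omega_b and omega_cdm, with round present-day values; check both the names and the current estimates yourself):

fix	H0	cosmo	67
fix	omega_b	cosmo	0.022
fix	omega_cdm	cosmo	0.12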

Validation: the writeSpectraPk executable allows you to run CLASS and produce a FITS file with several spectra. It is a good way to check that your parFile is OK (it is similar to running the "class" executable, but using a CAMEL parFile). DOCKER USER: run the container with the run.sh script, then go into work/output where you should find your parameter file, and work from there.

All executables are located in what we will call the EXECDIR, which corresponds to ../../Linux-x86_64 (from output):
$EXECDIR/writeSpectraPk lcdm.par 2000 output.fits 0.5

The argument after the parFile is lmax (here 2000) for the CMB Cl's, the next is the name of the FITS output file, and the last one (here 0.5) is the redshift (z) at which the matter spectrum is computed.

You may now have a look at the P(k) spectrum, which is stored in the 3rd HDU of the output.fits file. Here is how to read a FITS file in Python (e.g. from ipython):

from astropy.io import fits
import matplotlib.pyplot as plt

# open the FITS file produced by writeSpectraPk
h = fits.open("output.fits")
pkdata = h[3].data        # the matter power spectrum lives in the 3rd HDU

k = pkdata['k']           # wavenumbers
pk = pkdata['pklin']      # linear matter power spectrum

plt.loglog(k, pk)
plt.xlabel("k")
plt.ylabel("P(k)")
plt.show()

2.  Likelihood

A likelihood is a real function of a set of parameters "pars" (a vector, here (par1,par2,par3)) that returns a high value when the model matches the data and low values otherwise. If correctly written (a big job in real life) it contains all the information concerning the observables. We will equally use the term "chi2", defined by chi2(pars) = -2 ln[L(pars)], so that a "high likelihood" means a "low chi2".

An essential piece of information is the value of pars for which L(pars) is maximal, or chi2(pars) minimal. This is the Maximum Likelihood Estimator (MLE) and is most often the best one can do to infer which set of pars best describes the data.

We work with some very basic likelihoods, which assume we measured "Pk" at redshifts z=0.5, 1, 1.5 over some k range and that we know exactly the associated Gaussian errors (a very over-simplified example, but something similar will really happen soon with the Euclid/LSST missions, with a much higher precision than what we are playing with today). We also add another likelihood related to the BAO peak position.

The "Pk" likelihoods value (or chi2, a single number) of a set of cosmological parameters pars=(par1,par2,par3) is in fact computed in two steps:

  1. from a given set of parameters (pars), CLASS first computes the corresponding P(k,z_j) spectra (z_j=0.5,1,1.5)
  2. then the chi2 values of this model w.r.t. each measurement are summed up

chi2(pars,z_j) = sum_i [ (Pk(k_i,z_j) - Pk_data(k_i,z_j)) / sigmaPk_data(k_i,z_j) ]^2, where i runs over the measured "k" values and j over the redshifts.
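As an illustration, here is a minimal numpy sketch of this sum for one redshift bin (the array names are hypothetical, and the model and data are assumed to be evaluated on the same k grid):

import numpy as np

# Gaussian chi2 for one redshift bin: sum_i [(model_i - data_i)/sigma_i]^2
def chi2_single_z(pk_model, pk_data, sigma_data):
    return np.sum(((pk_model - pk_data) / sigma_data) ** 2)

# the total "Pk" chi2 is then the sum of this quantity over z = 0.5, 1 and 1.5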

Instructions

To use these likelihoods, add the following lines to lcdm.par:

 
fakePk1=fakepk/pk_0.5.lik
fakePk2=fakepk/pk_1.lik
fakePk3=fakepk/pk_1.5.lik

fakeBAO=fake/BAO_scale.lik
  • The first line gives the relative path of the likelihood file (w.r.t. the "lik" directory): go in there and have a look at the lik file for z=0.5
  • then read the corresponding data file (python: you may use np.loadtxt) and plot the values with errors (python: plt.errorbar), as sketched after this list. What is the maximum value of k?
  • reload your previous output.fits file and superimpose the spectrum. Do both fit (reasonably) well?
  • read the data files for z=1 and z=1.5: how do the spectra compare? why?
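Here is a minimal sketch of these two steps; the data file name and its column layout are hypothetical, so check the actual file name and format referenced inside the .lik file before running it:

import numpy as np
import matplotlib.pyplot as plt
from astropy.io import fits

# hypothetical name/format: 3 columns k, Pk, sigma (check the .lik file)
k_d, pk_d, sig_d = np.loadtxt("pk_data_z0.5.txt", unpack=True)
plt.errorbar(k_d, pk_d, yerr=sig_d, fmt='.', label="fake data, z=0.5")

# superimpose the spectrum computed earlier with writeSpectraPk (at z=0.5)
h = fits.open("output.fits")
plt.loglog(h[3].data['k'], h[3].data['pklin'], label="CLASS P(k), z=0.5")

plt.legend()
plt.show()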

Then we add to lcdm.par some CLASS directives to compute the spectrum only at the right redshifts (the default being z=0). For optimization with this likelihood, we also don't need CLASS to compute spectra above some k_max value, right? Then complete X,Y,Z (a comma-separated list) and W in the following CLASS-specific parameters:

class z_pk	X,Y,Z
class P_k_max_1/Mpc	W
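For instance, the redshift list follows directly from the three Pk likelihoods (the value of W is left for you to choose, just above the largest measured k you found earlier):

class z_pk	0.5,1,1.5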

To check that the parFile is correctly configured, we then run the writeChi2 executable on it, which simply prints the chi2 value corresponding to your fixed parameter values. Try it and see what happens...

 
$EXECDIR/writeChi2 lcdm.par

The (galactic) bias is a nuisance parameter: it comes from the fact that one does not measure directly the full matter power spectrum but the power spectrum inferred from some population of galaxies. It is not well known (and actually varies with the galaxy selection and redshift; here for simplicity we assume it is the same in all the likelihoods). There are generally many more nuisance parameters, qualifying for instance possible systematics in the measurement: they impact directly the likelihood (not the cosmological model) and can have external constraints depending on independent measurements / a-priori knowledge. So finally what is called "parameters" is a set of (cosmological, nuisance) ones. The likelihood actually depends on them through:

L(Pk(cosmos),nuisances)
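To make this concrete, here is a hypothetical sketch of how such a nuisance parameter could enter the chi2, assuming the usual linear-bias convention Pk_gal = bias^2 * Pk_matter (check how the fakepk likelihood actually implements it):

import numpy as np

def chi2_single_z_with_bias(bias, pk_model, pk_data, sigma_data):
    # the bias rescales the predicted spectrum before comparison with the data
    return np.sum(((bias**2 * pk_model - pk_data) / sigma_data) ** 2)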

Complete the parFile by adding the nuisance parameter (in this case, instead of "cosmo", use the keyword "nui"). For the moment we will fix its value to 1.07, so add

fix	    bias		nui	    1.07 

Rerun writeChi2 with this bias parameter set. Note the chi2 value. Is this a "good" one? (justify)

3.  The Maximum Likelihood Estimator, a.k.a. "best-fit"

Given some measurement(s), we are looking for the set of parameters (within a model) that is the most likely to produce those data (note the "likely" semantics: it is not the same as "probable"). Technically this is performed by looking for the parameters that minimize the chi2(pars) function (or maximize the likelihood). "pars" is a vector that includes both the cosmological and all the nuisance parameters. Here it has low dimensionality, but it is frequent to go up to 10, 30 or 100 parameters. So we are dealing with the minimization of a multi-dimensional function. There are several algorithms to do that, but there is no magical one (unless your likelihood is linear, which is rarely the case). Furthermore, we don't have access here to the analytical gradients of the function, which complicates the work. In CAMEL we use the MINUIT package, which has been widely used in High Energy Physics. However remember: no magic, you always have to experiment with such tools to ensure you really captured the minimum-chi2 solution. This means trying different starting points and checking that the solution is not stuck at some boundary, for example.

So far our parameters were fixed. Let us release one. Modify one of your parameters by adding a (rough) estimate of its error and some bounds within which it should be varied.
Technically this amounts to replacing the "fix" keyword with "par" and completing the following format:

par	name	cosmo(or nui)	value	error	lower_bound	upper_bound

with guesses for the "error" and "bounds" (use spaces or tabs to separate the fields)
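For illustration, if the released parameter were H0, such a line could look like the following (the error and bounds here are rough, hypothetical guesses):

par	H0	cosmo	67	1	50	90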

Also to see what is happening during the minimization add to your parFile

verbose=true

and run with

$EXECDIR/Minimize lcdm.par bf.txt hess.txt

(the two output arguments are the best-fit/hessian outputs written)

What is shown are the values being tried by MINUIT; the last 2 columns are also of interest: they give respectively the chi2 value for this set of parameters and the CLASS computation time of the model (in s).

Don't be worried at the end by this kind of message:

WARNING: FunctionMinimum is invalid.

which is hard to avoid because of the numerical noise. The minimum can actually be valid (but maybe not the estimate of its error, see the Hessian part).

Modify slightly the input value and rerun. Do you obtain the same minimum? Which one is better?

4.  Precision

Recall that CLASS is solving numerically some complicated equations. One can control the level of accuracy of many steps in the code. What is nice in CLASS is that all the precision parameters (cutoffs, steps etc.) can be specified in a single place. CLASS has some reasonable default values, but you may need to adapt them if you need some really high precision. Of course this will take more time, so it is always a compromise between speed and precision.

When we try to minimize a multi-dimensional function, the algorithms assume that the function is smooth w.r.t. the parameters. It is not the case here, since the finite precision of the computations performed within CLASS brings up numerical noise: chi2(pars)=chi2(Pk(pars)), where the Pk operation is performed by CLASS. This complicates the minimizer's task of precisely finding the minimum.

CAMEL provides some higher-precision parameters to CLASS, still within reasonable computation time.

Instructions

Add the following line to your parFile:

precisionFile=class_pre/hpjul2.pre

(you may look at what is in this file; recall that relative paths are w.r.t. the lik directory)

Re-run with the one free parameter and see if there is any difference.

Now release all the parameters but the bias one (i.e. change the "fix" lines into "par" ones) and look for the global best fit. You may try to figure out the MINUIT strategy by looking at the output.

Try changing a bit the starting values and see if you get the same result: which one is better?

When you have a reasonable first guess try releasing also the "bias" parameter (but do not spend too much time if it hardly converges).

5.  Profile-likelihoods

The MLE (which is classical statistics) gives you a solution in a multi-dimensional space, which is not very convenient to visualize. We would like to make statements about each variable separately: what can we say about individual parameters? But then you have to choose your religion and decide what a probability means! It's time to ask yourself: are you a Bayesian or a frequentist?

  • If you consider that reality ("true values") is not accessible and one should make probabilistic statements on the data only, you are a frequentist.
  • If you consider that probabilities do not really exist by themselves but represent a degree of belief human beings have, then you are a Bayesian.

Certainly the sanest way for a scientist is to be agnostic and compare both approaches (which, when the problem is well constrained, give in practice similar results).

Profile likelihoods are the frequentist solution (MCMC will give the Bayesian answer) to estimating intervals for variables. Rather than writing a lot of theory, let's see in practice how it works.

We choose one variable, here H0, and we will profile it:

  • fix H0 to the MLE and leave the other parameters free (i.e. use the "par" keyword): to begin with, keep also "bias" fixed to 1.07.
  • minimize
  • copy this value of H0 and the chi2_min found into a text file
  • move H0 a bit and redo a minimization: add the new (H0, chi2_min) values to the previous file (2 columns). Do this for a few other points, trying to stay close (i.e. within a few units) to the lowest chi2_min value.
  • then read the two columns from your file: H0 and chi2_min
  • offset chi2_min by a constant so that the smallest value is 0: we call the output dchi2
  • plot dchi2 as a function of H0 (see the sketch after this list). What do you observe?
  • a 68% confidence-level interval is finally obtained by thresholding this curve at 1 (or 3.84 for a 95% CL interval)
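Here is a minimal sketch of these last steps, assuming you saved your points in a (hypothetical) two-column text file called profile_H0.txt:

import numpy as np
import matplotlib.pyplot as plt

# two columns: H0 value and the corresponding minimized chi2
h0, chi2min = np.loadtxt("profile_H0.txt", unpack=True)

dchi2 = chi2min - chi2min.min()   # offset so that the smallest value is 0

order = np.argsort(h0)
plt.plot(h0[order], dchi2[order], 'o-')
plt.axhline(1.0, ls='--')         # crossing points give the 68% CL interval
plt.xlabel("H0")
plt.ylabel("dchi2")
plt.show()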

Think about it: where is the minimum located w.r.t. the global best fit (i.e. the MLE, with all the parameters free)?

Sometimes starting from a slightly different point leads to a slightly better chi2, and you should then use this one. Here it may be a bit cumbersome because you are doing it by hand. In real life, there are tools to send several minimizations (with different starting-point values) to a cluster and keep the best chi2, but this is outside the scope of this school. You may look at a more real-life case here (see Fig.4).

Finally, write your statement about H0, using the "+-" notation.

If you have some time left you may

  • release the bias parameter
  • profile another variable.
  • see next point

6.  The Hessian matrix

You will notice as you run that MINUIT finally gives you some "Values" and "Errors" for the released parameters.

Instructions

Release again all parameters (i.e. we come back to the MLE case), start with the best values you obtained with the profile likelihood, and add the following line to lcdm.par:

doHesse=true

(which will compute a slightly more precise Hessian matrix)

Then run again a minimization.

$EXECDIR/Minimize lcdm.par bf hess.txt

Are the printed errors realistic? For H0 do you have something similar to what you obtained with the profile likelihood?

While the values of the chi2 are quite precise, the "Errors" quoted by MINUIT are much more uncertain. They (should) represent the curvature of the function at the minimum (the so-called "Hessian"). However:

  • because of numerical noise they are rather uncertain
  • they represent an estimate of the errors (in a frequentist sense) only if the problem is linear, which is rarely the case.

However we will see in the 2nd session that the Hessian, although uncertain, can still be interesting.

Let's read the Hessian matrix hess.txt (python: loadtxt is still your friend). Plot the matrix (python: matshow), which represents the covariance between variables.

  • why is the matrix symmetric?
  • what is on the diagonal?

Because of the different ranges of the variables, this matrix is not very interesting to the eye: transform it into a correlation matrix \frac{\mathrm{cov}(x_i,x_j)}{\sigma_i \sigma_j}

and look at it.
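For instance (a minimal sketch, assuming hess.txt is a plain whitespace-separated square matrix, as np.loadtxt expects):

import numpy as np
import matplotlib.pyplot as plt

cov = np.loadtxt("hess.txt")      # matrix written by Minimize

# raw matrix: dominated by the very different parameter scales
plt.matshow(cov)
plt.colorbar()

# correlation matrix: cov(x_i, x_j) / (sigma_i * sigma_j)
sigma = np.sqrt(np.diag(cov))
corr = cov / np.outer(sigma, sigma)

plt.matshow(corr, vmin=-1, vmax=1)
plt.colorbar()
plt.show()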
