Likelihood-based statistics

1.  Working environment

  • school VB image: open a terminal and go to EDE/camel/CAMEL/v3_euclid
  • Docker: see Main.Euclid-docker (special instructions in what follows are flagged by DOCKER)

2.  Model+parameters

We address the statistical problem of parameter inference (which is different from model testing): assuming a model is correct, what can we say about its parameters given some measurements ("data")?

For this we will use the CAMEL software, which performs parameter estimation in both the frequentist (profile-likelihood) and Bayesian (MCMC) approaches. This gives some idea of the influence of the statistical method used to analyse the data. Note that the example here is simple and the parameters well constrained, so both methods give similar results. But this may not be the case if you work on more general models (such as modified GR). Think about using CAMEL if you need some confidence in your statistical method (e.g. checking the influence of priors in the MCMC approach).

We will first learn how to write a parameter file ("parFile"), which is a CAMEL text file describing your inputs.

The very first questions you should ask are:

  1. what is the cosmological model we want to study?
  2. what are the parameters of this model?
  3. what are the data (measurements)?

Here we will work with the popular (flat) Lambda-CDM model: how many free parameters does it contain?

In the following, for CPU reasons, we will focus on only 3 of them, i.e. we assume the others are fixed to some default values.

"Data" will be represented by a (fake) power spectrum measurement at different redshifts and some BAO scale.

In linear theory, the matter power-spectrum shape is mostly sensitive to the Hubble constant and to the matter densities, where one should distinguish between the standard "baryonic" and the "cold dark matter" contributions, which behave very differently.

  • find their corresponding names in the CLASS package
  • what is the difference between the "Omega"'s and the "omega"'s?
  • what are today's orders of magnitude for these?

In the following we will call your choice of parameters "par1,par2,par3" and your order-of-magnitude guesses "guess1,guess2,guess3".

Instructions

  • go to the work directory
  • create a directory named output and go into it. DOCKER: this directory already exists (and is shared) so you do not need to create it.

There, with your preferred editor, create a new file called for instance lcdm.par

and complete it with your choices:

fix par1 cosmo guess1
fix par2 cosmo guess2
fix par3 cosmo guess3

class	sBBN\ file	bbn/sBBN.dat

precisionFile=class_pre/hpjul2.pre

  • the sBBN line indicates to CLASS where it must search for the file containing the nucleosynthesis data (used to determine the n_b/n_gamma ratio). The "\" sign indicates that the following space is part of the same token.

The starting keyword "class" can be used to transmit parameters to CLASS (similar to explanatory.ini but without the = sign).

  • the precisionFile tells CLASS at which level of precision the numerical computations should be done. We need a rather high level when minimizing (otherwise the minimizer would have problems converging). This of course takes a bit more time than using the default parameters but is still reasonable.

Validation: the writeSpectraPk executable runs CLASS and produces a FITS file with several spectra. It is a good way to check that your parFile is OK (it is similar to running the "class" executable but using a CAMEL parFile). DOCKER: enter the virtual environment with the run.sh script, then cd into work/output where you should find your parameter file, and work from there.

All executables are in EXECDIR, which corresponds to the Linux-x86_64 directory (../.. from output):

$EXECDIR/writeSpectraPk lcdm.par 2000 output.fits 0.5

After the parFile, the arguments are lmax (here 2000) for the CMB Cl's, the name of the FITS output file, and the redshift z (here 0.5) at which the matter spectrum is computed.

You may now have a look at the Pk spectrum (which is in the 3rd extension of the output.fits file). You can do this in python; if you are not familiar with it, a sketch is given below (but it is better if you write it yourself). DOCKER: use your own python installation, since you can access the shared output area.
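If you are unsure how to proceed, here is a minimal python sketch, assuming astropy and matplotlib are available; the extension index and column ordering are assumptions to check against what the file actually contains:

import matplotlib.pyplot as plt
from astropy.io import fits

# open the FITS file produced by writeSpectraPk and locate the Pk extension
with fits.open("output.fits") as hdus:
    hdus.info()                       # lists the extensions: find the Pk one
    data = hdus[3].data               # assumed index of the Pk extension

k, pk = data.field(0), data.field(1)  # assumed column ordering: k then P(k)
plt.loglog(k, pk)
plt.xlabel("k [1/Mpc]")
plt.ylabel("P(k)")
plt.show()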

Does the spectrum look "OK"?

3.  Likelihood

A likelihood is a real-valued function of the parameters (a vector, here (par1,par2,par3)) that returns a high value when the model matches the data and low values otherwise. If correctly written (a big job in real life) it contains all the information concerning the observables. We will also use the term "chi2", which by definition is

 chi2(pars)=-2 ln[L(pars)] 

so that a "high likelihood" means a "low chi2".

An essential piece of information is the value of pars for which L(pars) is maximal, or equivalently chi2(pars) minimal. This is the Maximum Likelihood Estimator (MLE), and it is most often the best one can do to infer which set of pars best describes the data.

We work with some very basic likelihoods, which assume we measured a set of "Pk" values at redshifts z=0.5, 1, 1.5 over some k range and that we know exactly the associated Gaussian errors (this is a very over-simplified example, but something similar will really happen soon with the Euclid/LSST missions, with a much higher precision than what we are playing with today). We also add another likelihood related to the BAO peak position.

The "Pk" likelihoods value (or chi2, a single number) of a set of cosmological parameters pars=(par1,par2,par3) is in fact computed in two steps:

  1. from a given set of parameters (pars), CLASS first computes the corresponding P(k,z_j) spectra (z_j=0.5,1,1.5)
  2. then the chi2 values of this model w.r.t. each measurement are summed up:

chi2(pars, z_j) = sum_i [ (Pk(i,z_j) - Pk_data(i,z_j)) / sigmaPk_data(i,z_j) ]**2

where i runs over the measured "k" values and j over the redshifts.
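As a purely numerical illustration of this formula (the arrays here are made-up numbers, not the actual data), the chi2 at one redshift is just:

import numpy as np

# hypothetical model, data and error arrays at one redshift z_j
pk_model = np.array([2.1e4, 1.5e4, 0.9e4])
pk_data  = np.array([2.0e4, 1.6e4, 1.0e4])
sigma    = np.array([0.2e4, 0.15e4, 0.1e4])

chi2_zj = np.sum(((pk_model - pk_data) / sigma) ** 2)  # sum over measured k values
print(chi2_zj)  # the total chi2 then sums these contributions over the z_j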

Instructions

To use these likelihoods, add the following lines to lcdm.par:

 
fakePk1=fake/pk_0.5.lik
fakePk2=fake/pk_1.lik
fakePk3=fake/pk_1.5.lik

fakeBAO=fake/BAO_scale.lik
  • Each of these lines gives the relative path of a likelihood file (w.r.t. the "lik" directory)
  • read and plot the data for z=0.5 (if you are unfamiliar with python see the python howtos)

(python: you may use numpy.loadtxt and plt.errorbar; a sketch is given after this list). DOCKER: you should copy the file to the shared area to access it on your computer. What is the maximum value of k?

  • reload your previous output.fits file and superimpose the spectrum. Do both agree (reasonably) well?
  • read the data files for z=1 and z=1.5: how do the spectra compare? Why?
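
Here is a minimal sketch of the read-and-plot step; the data-file path relative to output and the three-column layout (k, Pk, sigma) are assumptions to adapt:

import numpy as np
import matplotlib.pyplot as plt
from astropy.io import fits

# assumed location and column layout of the fake z=0.5 data file
k, pk, sigma = np.loadtxt("../../lik/fake/pk_0.5.lik", unpack=True)
plt.errorbar(k, pk, yerr=sigma, fmt=".", label="fake data, z=0.5")
print("k_max =", k.max())

# superimpose the model spectrum from the earlier writeSpectraPk run
with fits.open("output.fits") as hdus:
    model = hdus[3].data              # same assumed extension as before
plt.loglog(model.field(0), model.field(1), label="CLASS model")
plt.legend()
plt.xlabel("k [1/Mpc]")
plt.ylabel("P(k)")
plt.show()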

Then we add to lcdm.par some CLASS directives to compute the spectrum only at the relevant redshifts (the default being z=0). For optimization with this likelihood we also don't need CLASS to compute spectra above some k_max value, right? So complete X,Y,Z (which is a comma-separated list without spaces) and W in the following CLASS-specific parameters:

class z_pk	X,Y,Z
class P_k_max_1/Mpc	W

To check that the parFile is correctly configured, we then run the writeChi2 executable on it, which simply prints the chi2 value corresponding to your fixed parameter values. Try running the executable: does it work?

 
$EXECDIR/writeChi2 lcdm.par

What is happening (see the log messages) is that the likelihood is lacking a parameter: the (galactic) bias. It is not a cosmological parameter but a nuisance parameter, which comes from the fact that one does not measure directly the full matter power spectrum but the power spectrum inferred from some population of galaxies. Its value is not well known (it actually varies with the galaxy selection and the redshift range; here for simplicity we assume it is the same in all the likelihoods). There are generally many nuisance parameters, quantifying for instance possible systematic errors in the measurement (values we are not very sure about). They impact the likelihood directly (not the cosmological model) and should be fitted together with the cosmological parameters (whose precision will be degraded when adding them).

So finally what we call "parameters" is the set of (cosmological, nuisance) ones. The likelihood actually depends on them through:

L(Pk(cosmos),nuisances)

Complete the parFile by adding the nuisance parameter (in this case, instead of "cosmo", use the keyword "nui"). For the moment we will fix its value to 1.07, so add

fix	    bias		nui	    1.07 

Rerun writeChi2 with this bias parameter set. Note the chi2 value. Is this a "good" one? (justify)

4.  The Maximum Likelihood Estimator, a.k.a. "best-fit"

Given some measurement(s), we are looking for the set of parameters (within a model) that is the most likely to produce those data (note the "likely" semantics: it is not the same as "probable", a confusion often made by Bayesians). Technically this is performed by looking for the parameters that minimize the chi2(pars) function (or maximize the likelihood). "pars" is a vector that includes both the cosmological and all the nuisance parameters. Here it has low dimensionality, but it is frequent to go up to 10, 30 or 100 parameters. So we are dealing with the minimization of a multi-dimensional function. There are several algorithms to do that, but there is no magical one (unless your likelihood is linear, which is rarely the case). Furthermore, we don't have access here to the analytical gradients of the function, which complicates the convergence. In CAMEL we use the MINUIT package, which is widely used in High Energy Physics. However, remember: no magic; you always have to experiment with such tools to ensure you really captured the minimum-chi2 solution. This means trying different starting points and checking that the solution is not blocked at some boundary, for example.

So far our parameters were fixed. Let us release one. Modify one of your parameters, adding a (rough) estimate of its error and some bounds within which it should be varied.
Technically this amounts to changing the "fix" keyword into a "par" one and completing the following format:

par	name	cosmo(or nui)	value	error	lower_bound	upper_bound

with guesses for the "error" and the "bounds" (use spaces or tabs to separate the fields); a purely illustrative example is given below.
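For instance, if the parameter you release is H0, such a line could look like this (the numbers are illustrative guesses, not recommended values):

par	H0	cosmo	70	2	50	90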

Also, to see what is happening during the minimization, add to your parFile

verbose=true

and run with

$EXECDIR/Minimize lcdm.par bf.txt hess.txt

(the last two arguments are output files where the best-fit and the Hessian will be written)

What is then shown are the values being tried by MINUIT; the last 2 columns are also of interest: they give respectively the chi2 value for this set of parameters and the computation time of the model (in s).

Don't be worried at the end by this kind of message:

WARNING: FunctionMinimum is invalid.

which is hard to avoid because of the numerical noise. The minimum can actually be valid (but maybe not the estimate of its error; see the last part).

At the end you need to check the values under:

 # ext. ||   Name    ||   type  ||     Value     ||  Error +/-

Did the parameter reach a boundary? Does the error seem correct?

Modify slightly the starting values (or errors) and rerun. Do you obtain the same minimum? Why? Which one is better?

Now release all the parameters except the bias one (i.e. change the "fix" lines into "par" ones) and look for the global best-fit. This is essentially trial-and-error work (at computing centers this is performed more automatically); here are a few recommendations:

  • concerning "errors" you may try values around 5-10% of the central one (these spectra are most sensitive to the baryonic density).
  • Remember: your final parameters should never stick to one of the boundaries
  • Try changing the starting values a bit and see if you get the same result

When you have a reasonable first guess for the 3 parameters, you may try releasing the "bias" parameter as well (but do not spend too much time on it if it hardly converges; rather move on to the next part).

5.  Profile-likelihoods

The MLE (which is classical statistics) gives you a solution in a multi-dimensional space, which is not very convenient to visualize. We would like to make statements about each variable separately: what can we say about individual parameters? Only here do you have to choose your religion and decide what a probability (philosophically) means! It's time to ask yourself: are you a Bayesian or a frequentist?

  • If you consider that "reality" (the true parameters) cannot be directly accessed but can only be approached by repeated measurements, then you are a frequentist.
  • If you consider that probabilities rather represent some human degree of belief about phenomena (so that the true parameters only live in your mind), then you are a Bayesian.

Certainly the sanest attitude for a scientist is to be agnostic and compare both approaches (which, when the problem is well constrained as here, give similar results in practice, but this is not always the case).

Profile-likelihoods are the frequentist solution (MCMC sampling being the Bayesian one) to estimating intervals for variables. Rather than writing a lot of theory, let's see in practice how it works.

We choose one variable, here H0, and we will profile it:

  • fix H0 to its MLE value and leave the other parameters free (i.e. use the "par" keyword); to begin with, also keep "bias" fixed to 1.07.
  • minimize
  • copy this value of H0 and the chi2_min found into a text file
  • move H0 a bit and redo a minimization: add the new (H0, chi2_min) values to the previous file (2 columns). Do this for a few other points, trying to stay close (i.e. within a few units) to the lowest chi2_min value.
  • Then read the two columns from your file: H0 and chi2min
  • offset chi2min by a constant so that the smallest value is 0: we call the result dchi2
  • plot dchi2 vs H0: this is the profile-likelihood of H0 (you may add some points to get the shape up to dchi2~3 on both sides); see the sketch after this list
  • a 68% confidence-level interval is obtained by cutting this curve at y=1
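
Here is a minimal sketch of these last steps, assuming the (H0, chi2_min) pairs were collected in a two-column text file (the name profile.txt is an assumption):

import numpy as np
import matplotlib.pyplot as plt

# read the hand-collected profile points: column 1 = H0, column 2 = chi2_min
h0, chi2 = np.loadtxt("profile.txt", unpack=True)
order = np.argsort(h0)
h0, dchi2 = h0[order], chi2[order] - chi2.min()  # offset so the minimum is 0

plt.plot(h0, dchi2, "o-")
plt.axhline(1, ls="--")                          # 68% CL: cut the curve at y=1
plt.xlabel("H0")
plt.ylabel("dchi2")
plt.show()

# crude 68% interval: where the interpolated curve stays below dchi2 = 1
grid = np.linspace(h0.min(), h0.max(), 1000)
inside = grid[np.interp(grid, h0, dchi2) < 1]
print("68% CL interval for H0: [%.2f, %.2f]" % (inside.min(), inside.max()))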

Think about it: where is the minimum located w.r.t. the global best-fit (i.e. the MLE, with all the parameters free)?

Sometimes starting from a slightly different point leads to a slightly better chi2, and you should then use that one. Here it may be a bit cumbersome because you are doing it by hand. In real life there are tools to send several minimizations (with different starting values) to a cluster and take the best chi2, but this is outside the scope of this school. You may look at a more real-life case here (see Fig.4).

Finally, write your statement about H0, using the "+-" notation.

6.  Conclusions

The goal of this lecture was to show that minimizing a multi-dimensional likelihood to get parameter estimates is not always a trivial task, because observables in cosmology are generally complicated functions of the parameters. This may require trying different "starting values" or "error estimates" for the fit to find the best one (in real life you should automate this process). You may also have to increase the precision of the CLASS computations (there are many parameters in CLASS for that), since numerical noise may spoil the convergence to the minimum.

Profile-likelihoods are a powerful way of estimating ranges for individual parameters in a "frequentist" way. Note that there is no such thing as "priors" here. So if you used an MCMC for your parameter estimation, obtaining a similar result with a profile-likelihood gives you good confidence that you are not actually pulling your results with the priors (even if you implicitly used a flat prior of 1, which is not non-informative). This is very valuable in particular when you work with models that are less constrained than LCDM (e.g. modified GR).
