
Likelihood-based statistics

1. Model + parameters

We will learn how to write a parameter file ("parFile"), which is the main input to CAMEL. The very first questions one must ask are: which model do we want to test, which of its parameters are free, and which data do we confront it with?
Here we will use the popular (flat) LambdaCDM one. How many free parameters does it contain? In the following, for the data, we will use a (fake) matter power spectrum measurement (embedded in a likelihood) and work with 3 parameters only (for CPU reasons), i.e. we assume the other parameters are known and fixed. In linear theory, the matter power spectrum shape is most sensitive to the Hubble constant and to the matter densities, where one should distinguish between the standard "baryonic" contribution and the "cold dark matter" one.
In the following we will call your choice of parameters "par1, par2, par3" and their starting values "guess1, guess2, guess3". Edit a new file, called for instance lcdm.par, and complete it with your choices:

fix par1 cosmo guess1
fix par2 cosmo guess2
fix par3 cosmo guess3
class sBBN\ file bbn/sBBN.dat
The starting keyword "class" can be used to transmit any (valid) string parameter to CLASS (as in explanatory.ini).
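For concreteness, here is one possible completed parFile, assuming you chose the Hubble constant and the baryon and cold-dark-matter densities (the CLASS names H0, omega_b, omega_cdm and the guess values below are only indicative; adapt them to your own choices):

fix H0        cosmo  67
fix omega_b   cosmo  0.022
fix omega_cdm cosmo  0.12
class sBBN\ file bbn/sBBN.dat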
Validation:
Run the writeSpectraPk executable, which is located in what we refer to as EXECDIR (this directory is different in docker and on mardec). From the output directory:
$EXECDIR/writeSpectraPk lcdm.par 2000 output.fits 0.5

The third argument is lmax (here 2000) for the CMB Cl's, the next is the name of the FITS output file, and the last one (here 0.5) is the redshift (z) at which the matter spectrum is computed. You may now have a look at the Pk spectrum, which is in the 3rd extension (HDU) of the output.fits file. DOCKER users: remember it is in the output/ directory of where you ran the script. Here is how to read a FITS file in python:

from astropy.io import fits
from pylab import *

h = fits.open("output.fits")
pkdata = h[3].data
k = pkdata['k']
pk = pkdata['pklin']
loglog(k, pk)
show()

DISCUSSION POINT:
2. Likelihood

A likelihood is a real function of a set of parameters "pars" (a vector, here (par1, par2, par3)) that is supposed to return a high value when the model (or associated observable) matches the data and low values otherwise. If correctly written (a big job in real life) it contains all the information concerning some measurement(s). We will equally use the term "chi2", which by definition is chi2(pars) = -2 ln[L(pars)], so that a "high likelihood" means a "low chi2". An essential piece of information is the value of pars for which L(pars) is maximal, or chi2(pars) minimal. This is the Maximum Likelihood Estimator (MLE) and is most often the best one can do to infer which set of pars best describes the data. It is interesting to work out (once in a lifetime) a simple case: suppose you have N (independent) samples that originated from a Gaussian of mean 0 and unknown variance sigma^2: compute (by hand) the MLE of sigma^2. For stat lovers: what is the distribution of this estimator? Is it biased? What is its variance? Make the link with cosmology. The setup is sketched just below. DISCUSSION POINT:
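As a warm-up (this is only the setup, not the answer), the likelihood and chi2 of that exercise read:

\[
\mathcal{L}(\sigma^2)=\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-x_i^2/2\sigma^2},
\qquad
\chi^2(\sigma^2)=-2\ln\mathcal{L}(\sigma^2)=N\ln(2\pi\sigma^2)+\frac{1}{\sigma^2}\sum_{i=1}^{N}x_i^2 .
\]

Minimizing chi2 with respect to sigma^2 gives the MLE; its distribution, bias and variance then follow from the properties of the sum of the x_i^2.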
We will now work with a very basic likelihood, which assumes we did measure a set of "Pk" points at redshifts z = 0.5, 1, 1.5 over some k range, and that we know exactly the associated Gaussian errors (this is a very oversimplified example, but it will really happen soon with the EUCLID/LSST missions, with a much higher precision than what we are playing with today). Here, the likelihood value (or chi2, a single number) of a set of cosmological parameters pars = (par1, par2, par3) is in fact computed in two steps: first, CLASS computes the theoretical P(k) at each of these redshifts for those parameters; then this prediction is compared to the (fake) measured points, using their known Gaussian errors, to form the chi2.
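To fix ideas, here is a minimal numerical sketch of such a Gaussian chi2 (the arrays and 5% errors are purely illustrative and are not the actual content of the fakepk likelihood files):

import numpy as np

def gaussian_chi2(pk_model, pk_data, sigma):
    # independent Gaussian errors: chi2 = sum_i ((model_i - data_i) / sigma_i)^2
    return np.sum(((pk_model - pk_data) / sigma) ** 2)

# toy "measurement" of three P(k) points with 5% errors
pk_data = np.array([2.1e4, 1.3e4, 0.6e4])
sigma = 0.05 * pk_data
pk_model = np.array([2.0e4, 1.35e4, 0.58e4])
print(gaussian_chi2(pk_model, pk_data, sigma))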
To use these likelihoods (which are by default summed up) add the following lines to your parFile:

fakePk1=fakepk/pk_0.5.lik
fakePk2=fakepk/pk_1.lik
fakePk3=fakepk/pk_1.5.lik
You also need to tell CLASS at which redshifts the matter power spectrum is required and up to which k (in 1/Mpc):

class z_pk XX
class P_k_max_1/Mpc YY

To check that the parFile is correctly configured we then run:

$EXECDIR/writeChi2 lcdm.par

What happens? The (galactic) bias is a nuisance parameter that comes from the fact that one does not measure directly the full matter power spectrum, but the power spectrum inferred from some population of galaxies. It is not well known (and actually varies with the galaxy selection and redshift; here for simplicity we assume it is the same in all the likelihoods). There are generally many more nuisance parameters describing for instance possible systematics in the measurement: they impact directly the likelihood (not the cosmological model) and can have some external constraints coming from independent measurements or a priori knowledge. So finally what is called "parameters" is a set of (cosmological, nuisance) ones. The likelihood actually depends on them through the biased spectrum, here simply bias^2 x P(k, pars).
Complete your parFile by adding the nuisance parameter (in this case, instead of "cosmo", use the keyword "nui").
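For instance (the likelihoods call this parameter "bias", and 1 is only a placeholder starting value):

fix bias nui 1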
But how do we find a good first guess for this parameter?
You may use for that the ScanParam executable:

$EXECDIR/ScanParam lcdm.par bias 1 2 100

will vary the bias parameter linearly in the [1,2] interval with 100 steps. Look for the best chi2 and use the corresponding bias value (you may do a plot from scan_bias.txt; a plotting sketch is given below). Then rerun writeChi2 and check that the chi2 is now reasonable.
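A minimal plotting sketch, assuming scan_bias.txt contains one line per step with the bias value and the corresponding chi2 in two columns (inspect the file first and adapt the column indices or header handling if needed):

from pylab import *
import numpy as np

# assumed layout: column 0 = bias value, column 1 = chi2
data = np.loadtxt("scan_bias.txt")
bias, chi2 = data[:, 0], data[:, 1]
print("best bias:", bias[np.argmin(chi2)], "chi2:", chi2.min())
plot(bias, chi2)
xlabel("bias")
ylabel("chi2")
show()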
3. The Maximum Likelihood Estimator, a.k.a. "bestfit"

Given some measurement(s), we are looking for the set of parameters (within a model) that is the most likely to produce those data (note the "likely" semantic: it is not the same as "probable"). Technically this is performed by looking for the parameters that minimize the chi2(pars) function (or maximize the likelihood). "pars" is a vector that includes both the cosmological and all the nuisance parameters. Here it has a low dimensionality, but it is frequent to go up to 10, 30 or 100 parameters. So we are dealing with the minimization of a multidimensional function. There are several algorithms to do that, but there is no magical one (unless your likelihood is linear, which is rarely the case). Furthermore, we don't have access here to the analytical gradients of the function, which complicates the work. We use in CAMEL the MINUIT package, which has been widely used in High Energy Physics. However remember: no magic, you always have to experiment with such tools to ensure you really captured the minimum-chi2 solution. This means trying different starting points and checking that the solution is not blocked at some boundary, for example.

So far our parameters were fixed. Let us release one. Modify one of your parameters by adding a (rough) estimate of its error and some bounds within which it should be varied, i.e. turn its "fix" line into a "par" one:

par par1 cosmo guess1 error1 min1 max1
with guesses for the "error" and "bounds" (use spaces or tabs to separate the fields). Also, to see what happens during the minimization, add to your parFile:

verbose=true

Then run:

$EXECDIR/Minimize lcdm.par bf.txt hess.txt

(the two output arguments are the best-fit and Hessian outputs, written to external files). What is shown are the values being tried by MINUIT; the last two columns are also of interest: they give respectively the chi2 value for this set of parameters and the CLASS computation time for the model (in s). Don't be worried at the end by this kind of message:

WARNING: FunctionMinimum is invalid.

which is hard to avoid because of the numerical noise discussed below. The minimum can actually be valid (but maybe not the estimate of its error, see the Hessian part).

Modify slightly the input value and rerun. Do you obtain the same minimum? Which one is better? It may seem strange that the minimizer does not always find the same minimum depending on the starting values. In fact it is a challenging problem to reach the minimum very precisely (we will see later that we want the chi2_min value to a precision better than 1). First because we do not have the gradients to help the descent, but essentially because we are using a numerical code (CLASS) to compute the observables (here Pk). Those computations have some finite accuracy (cutoffs, binnings etc.) which leads to some numerical noise: the derivatives with respect to the cosmological parameters (e.g. d_Pk/d_omega_b) do not vary very smoothly (as they would for an analytical function), and any minimizer hates that. We have somewhat tuned the minimization strategy with MINUIT to survive these effects, but still, sometimes the minimizer finds a better route to the very precise minimum. In real-life estimation one often shoots a set of randomly shaken starting values (within the bounds of the parFile): the recipe is then simple: always take the solution with the lowest chi2_min value (there are some scripts in CAMEL to automate this process).

4. Precision

Recall that CLASS is solving numerically some complicated equations. One can control the level of accuracy of many steps in the code. What is nice in CLASS is that all the precision parameters (cutoffs, steps etc.) can be specified in a single place. CLASS comes with some reasonable values, but you may need to adapt them if you need really high precision. Of course this will take more time, so it is always a compromise between speed and precision. When we try to minimize a multidimensional function, the algorithms assume that the function is smooth with respect to the parameters. This is not the case here, since the finite precision of the computations within CLASS brings up numerical noise, because chi2(pars) = chi2(Pk(pars)) where the Pk operation is performed by CLASS. This complicates the minimizer's task of precisely finding the minimum. CAMEL provides some higher-precision parameters to CLASS, still within reasonable computation time. Add the following line to your parFile:

precisionFile=class_pre/hpjul2.pre

(you may look at what is in this file; recall relative paths are with respect to the /lik directory). Rerun and see if there is some difference.

Now release all the parameters (i.e. modify the "fix" lines into "par" ones) and look for the global best fit. You may first keep the nuisance fixed (to 1), then release it. You may try to figure out the MINUIT strategy by looking at the output. Does the number of calls before convergence scale with the dimensionality of the problem?
5. The Hessian matrix

You will notice as you run that MINUIT finally gives you some "Values" and "Errors" for the released parameters. While the Values are really those that produced the final chi2 value, the "Errors" are much more uncertain. They (should) represent the curvature of the function at the minimum (the so-called "Hessian"). However, because of the numerical noise discussed above, this curvature estimate is often not reliable here and the quoted "Errors" should not be taken at face value.
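For reference, the standard (asymptotic) connection between this curvature and the parameter errors, which is not specific to CAMEL, is

\[
H_{ij}=\frac{1}{2}\left.\frac{\partial^2\chi^2}{\partial p_i\,\partial p_j}\right|_{\hat p},
\qquad
C\simeq H^{-1},
\qquad
\sigma_{p_i}\simeq\sqrt{C_{ii}},
\]

so any numerical noise in the second derivatives propagates directly into the quoted errors and correlations.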
Nevertheless, we will see in the 2nd session that they can still be interesting, so once you are happy with your "bestfit", keep the "hess.txt" file in a safe place (or rename it). Let's read it in python; a minimal sketch is given below. DISCUSSION POINT:
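This sketch assumes hess.txt is a plain whitespace-separated square matrix; inspect the file first and adapt (for instance skip a header line of parameter names if there is one):

from pylab import *
import numpy as np

mat = np.loadtxt("hess.txt")      # assumed: plain square matrix
cov = mat                         # if the file stores the Hessian instead, use: cov = np.linalg.inv(mat)
sig = np.sqrt(np.diag(cov))       # 1-sigma errors on the fitted parameters
cor = cov / np.outer(sig, sig)    # correlation matrix, values in [-1, 1]
print(sig)
print(cor)

The cor array can then be passed to the plotcor function given below.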
For convenience, here is a python function that will nicely plot the correlation array:

from pylab import *
import numpy as np

def plotcor(cor, par):
    # keep only the lower triangle (without the diagonal); the rest is blanked out
    covmat = np.tril(cor, k=-1)
    covmat[covmat == 0] = np.nan
    imshow(covmat, vmin=-1, vmax=1, interpolation='none')
    colorbar()
    yticks(range(len(cor)), par)
    xticks(range(len(covmat)), par)
    # write the correlation value inside each lower-triangle cell
    for (j, i), label in ndenumerate(covmat):
        if i < j:
            text(i, j, '{:2.3f}'.format(label), ha='center', va='center')
    show()

Why are there blanks? If you want to know more about the original discussions on the meaning of "errors" in MINUIT, see James (1980).

6. Are you frequentist or Bayesian?

The MLE gives you a solution in a multidimensional space, which is not very convenient to visualize. What can we say about individual parameters? If the Hessian is not reliable, what should we do? Up to now we were only dealing with classical statistics, but now you have to choose your religion and decide what a probability means! It's time for you to answer this very disturbing question: are you a Bayesian or a frequentist?
Certainly the most sane approach for a scientist is to be agnostic and to compare both approaches (which, when the problem is well constrained, give in practice similar results).

7. Profile likelihood

This method is the frequentist answer to the interval problem. Rather than writing a lot of theory, let's see in practice how it works. We choose one variable, here the bias, and we will profile it: fix the bias successively at several values around its best-fit one (moving it a bit each time), and for each value minimize the chi2 with respect to all the other parameters; the curve of these minimum chi2 values as a function of the bias is the profile likelihood.
NB: here "a bit" means that you should only look at chi2 values that fluctuate within a few units (let's say ~5) of the lowest chi2 value.
Think about it: where is the minimum of the profile located with respect to the global best fit (i.e. when all the parameters are free)? Yes: finding the exact minimum to a precision well below 1 is difficult because of the (unavoidable) CLASS numerical instabilities. Sometimes starting from a slightly different point leads to a slightly better chi2, and you should use that one. Here it may be a bit cumbersome because you are doing it by hand. In real life there are tools to send several minimizations (with different starting-point values) to a cluster and take the best chi2, but that is outside the scope of this school. You may look at a more real-life case here (see Fig.4). Finally, write your statement about the bias that was present in those data, using the usual "+/-" (possibly asymmetric) notation; a sketch for extracting such an interval is given below. If you have time, try the same with H0.
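A minimal sketch for turning your profile points into a 68% CL interval, assuming you have collected, for each fixed bias value, the corresponding minimum chi2 into a two-column text file (the file name and layout are illustrative):

from pylab import *
import numpy as np

# assumed layout: column 0 = fixed bias value, column 1 = chi2 minimized over all other parameters
bias, chi2 = np.loadtxt("profile_bias.txt", unpack=True)
dchi2 = chi2 - chi2.min()            # Delta chi2 with respect to the profile minimum
best = bias[np.argmin(chi2)]

# 68% CL interval for one parameter: the region where Delta chi2 <= 1
inside = bias[dchi2 <= 1.0]
lo, hi = inside.min(), inside.max()
print("bias = {:.3f} +{:.3f} -{:.3f}".format(best, hi - best, best - lo))

plot(bias, dchi2)
axhline(1.0, ls='--')
xlabel("bias")
ylabel("Delta chi2")
show()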