AnalysingTheOutputs

On this page... (hide)

  1.   1.  Minimization
    1.   1.1  Extracting the interesting part from a best-fit log file
    2.   1.2  Finding the lowest chi2 from several best-fit files
  2.   2.  Profile Likelihood
  3.   3.  MCMC
    1.   3.1  Python
    2.   3.2  GetDist
    3.   3.3  IDL

The general philosophy of CAMEL is that it is not a black-box: it is an opportunity to learn (and control) each step of cosmological parmeter estimation. All the outputs are available as plain ascii files and we do not provide low-level tools to read and display them.

Some common utlities are however provided as script files and a python library ( work/tools/python/camel.py ) to ease MCMC analysis.

1.  Minimization

1.1  Extracting the interesting part from a best-fit log file

Once you ran some batch with the Minimize executable you recover some output log-file. You are encouraged to have a look at it (don'worry about messages as "fit did not converge" this is classical with CLASS) If you want to have some summary of the results, you can run:

awk -f /path/to/work/tools/awk/bestfit.awk logfile 

or, within python, use the scan function from camel.py

Recall that the final best-fit values are available in the text file provided in the Minimize call. If you further specified a name to it you also have access to a standalone written Hessian matrix.

1.2  Finding the lowest chi2 from several best-fit files

If you have several outputs in the same directory (what happens if several starting points are tried for instance with the multimin.sh batch submission scripts) you may determine the best of the best-fits (lowest chi2) with

awk -f /path/to/work/tools/awk/zebest.awk *.o*

(assuming your pattern for finding all teh log-files is of the form *.o*)

2.  Profile Likelihood

The profile likelihood curve of a given parameter (θ) is the concatenation of "best fits" made for different fixed values of this very θ. Analysing a profile likelihood is therefore rather easy. A python utility software is available within CAMEL (in work/tools/python/draw_profile.py) which allows to do so.

You can run this tool on the output of your Profile or Minimize runs (cf. here ) which should look like Attach:test.txt (you can concatenate the results of Minimize with different fits for a given value of the profiled variable, the python code will take care of them to find the minimum at each point)

Then you simply have to do:

python tools/python/draw_profile.py test.txt 

This will give you the results of the profile using an spline interpolation to compute the shape of the χ2 at its minimum.

If you want to use a poly3d fit (which can be useful to test the robustness of the result) you can activate the --fit option:

python tools/python/draw_profile.py test.txt  --fit 

If you want the plot(s) to be saved you can use the --save option:

python tools/python/draw_profile.py test.txt  --save

Finally if in your input file you have several chi2 value for a given value of the profiled variables (using Minimimize.sh for instance,you can use the "order" option to find the min chi2 at each point:

python draw_profile.py allbestfits.txt  --order

If you want to go further in the profile analysis, we provide a python class in (work/tools/python/camel.py). The code uses the written outputs from the Profile.sh script which creates one directory for each profiled-value containing several bestfits. Then:

  1. creation is made using the path of the head directory and the name of the variable. It will look for all the profiled-value and return the minimum χ2 found in each case.
  2. fit the profile using different modes (“cubic”:1D-interpolation, “spline”, “polyfit”:with defined order, default=3). Δχ2 max used for the fit can be defined through the parameter deltaMax.
    The fit returns (xmin, (xlow,xhigh)) but can also deal with upper-limits using the option upper=True and return CL=95% (with FeldmanCousin regularisation).
  3. plot the profile and its fit (the last one)
  4. return any other bestfit value than the χ2 to be plotted as a function of the profiled-variable
#init
dbdir = "/sps/planck/Users/tristram/Camel/Alens/hlpTT_lol_prof_Alens"
parname = "Alens"
prof = camel.prof( dbdir, parname)

#return and plot profile points
listval,chi2 = prof.get()
plot( listval, chi2-min(chi2), marker='o')

#fit profile
prof.fit(method=‘polyfit',order=3,deltaMax=3)
prof.plot()

#exemple of multiple profiles
dbdir = "/sps/planck/Users/tristram/Camel/Mnu/"
parname = "Mnu"
prof1 = camel.prof( dbdir+"hlpTT_lol_nuscan",           parname)
prof2 = camel.prof( dbdir+"hlpTT_lol_Alens_nuscan",     parname)

prof1.fit(method=‘polyfit',order=2,upper=True)
prof2.fit(method=‘polyfit',order=2,upper=True)
prof1.plot(result=False, color=‘r’, label="hlpTT+lol")
prof2.plot(result=False, color=‘b’, label="hlpTT+lol (Alens)")
legend()

3.  MCMC

The MCMC analysis can give access to different information once you have checked that the Markov Chains have converged: get the mean/errors of the parameters, and get their correlation matrix. A python library is provided within CAMEL (in work/tools/python/camel.py) which allows to do so. Some idl procedures are also available in work/tools/idl

3.1  Python

An example of how to read the output of one MCMC and get the results: mean and errors of each parameter:

import numpy as np
import matplotlib.pyplot as plt
import camel
import math


dbdir  = # define here the directory in which you have saved the output of your MCMC in runname_MC/

runname = 'hlpTT_bflike'

# then define the legend 
legtxt = ['Hillipop+lowTEB']

# read the parameter names
par = camel.Parnames( dbdir+runname+'.par')

# read the chains 
# read 4 chains and merge the last 10% of each chain (90% burnin)
chain = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname), num=(1,2,3,4), burnin=0.9)

# compute the mean and errors and write them on the screen
for parname in par:
    print "%15s " % parname, "\t %f \t %f \t %f" % camel.getci( np.percentile(  chain[parname], [16,50,84]))

An example to plot posterior distributions for one or two parameters

# plot 1-D posterior
parname = #name of the parameter
camel.hist1d( chain[parname], bins=50, color='b', lw=2, smooth=1.)

# plot 2-D posterior
parname1 = #name of the parameter 1
parname2 = #name of the parameter 2
camel.hist2d( chain, parname1, parname2, bins=50, sigmas=[1,2], cmap="Blues", lw=2, smooth=1., extent=[[0,1],[0,10]])

An example to get the correlation matrix of the parameters:

#plot correlation matrix
listpar = #subset of parameter names
camel.plot_covariance( chain, listpar)

#extract covariance matrix
data = [chain[p] for p in listpar]
covMC = np.cov(data)

An example of how to use the camel.py library when you want to compare the output of two MCMC runs (the so-called Triangle Plot).

dbdir  = # define here the directory in which you have saved the output of your MCMC  in runname1_MCMC/ and runname2_MCMC/
figdir = # define here the directory in which you want your plots to be saved

runname1 = 'hlpTT_bflike'
runname2 = 'hlpTT_bflike_vhl'

# then define the legend 
legtxt = ['Hillipop+lowTEB','Hillipop+lowTEB+VHL']

# read the chains 
chain1 = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname1), num=(1,2,3,4), burnin=0.9)
chain2 = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname2), num=(1,2,3,4), burnin=0.9)

# plot triangle
listpar = #subset of parameter that belongs to the two chains
camel.triangle( (chain1,chain2), params=listpar, name=legtxt, color=('k','g'))
plt.savefig( figdir+'CompareMCMC.pdf', bbox_inches='tight')

Convergence tests:

#extract full chains
runname = 'hlpTT_bflike'
chains = camel.extract_chains( '%s/%s_MC/samples' % (dbdir,runname), 4)

#plot variable evolution for each chain
ivar = 0 #index of variable: first is the chi2
for c in data:
    plt.plot( c[ivar])

#Compute Gelman-Rubin for each parameter
it,R = camel.GelmanRubin( chains, gap=10000, length_max=200000)
plt.figure()
plt.plot(it,R-1.)
plt.axhline( 0.03, color='k', ls='--')
plt.semilogy()

#Compute Geweke Diagnostic for each parameter for each chain
it,T=camel.Geweke( chains, gap=10000, length_max=200000)
plt.figure()
r = transpose(T,axes=(1,0,2))
for i in range(len(r)):
    plt.subplot( 4, 1, i+1) 
    plt.plot(it,r[i])
    plt.ylim(-100,100)

3.2  GetDist

You can interface the camel outputs to standard GetDist function. The following example shows how to perform a triangle plot (others examples at https://getdist.readthedocs.io/en/latest/plot_gallery.html, including 1D/2D/3D plots, convergence statistics, etc.)

import camel
from getdist import plots, MCSamples

dbdir="/home/plaszczy/agnostic_cosmo/data/"
datadir=["hlpTT_bflike_LCDM_MC"]

allpars=camel.MCparnames(dbdir+datadir[0]+"/samples")
params_to_plot=allpars[1:7]
print(params_to_plot)

#strip $ from camel label names
labels=[camel.parname.setdefault(n,n).replace('$','') for n in params_to_plot]

#strip * from param names (as in 100*theta_s)
names=[n.replace('*','') for n in params_to_plot]

# read chains with camel optimized functions
chain = camel.mergeMC( dbdir+datadir[0]+'/samples', num=(1,2,3,4), burnin=0.70)

#keep samples to plot
mcmc = [chain[par] for par in params_to_plot]

###### Analysis with Getdist ######

### Create an "MCsamples" object from the CAMEL chains
### To compare several runs, create as many "MCsamples"
samples1 = MCSamples( samples=mcmc, names=names,labels=labels)
#samples2 = MCSamples(......
samples = [samples1] # or [samples1,samples2,...]

### Triangle plot
g = plots.getPlotter()                # creates the "plotting" object
g.settings.setWithSubplotSize(5.0000) # controls the size of each subplot
# All the follwing settings are optional
legends   = ['Run 1']                 # label for each run
colors    = ['red']                   # color for each run
lines     = ['-']                     # linestyle for each run
is_filled = True                      # controls if 2D contours are filled or not
lab_ord   = [0]                       # ordering of the label
loc_leg   = 'upper right'             # placement of the legend
# Do the triangle plot
g.triangle_plot(samples,params_to_plot,legend_labels=legends,contour_colors=colors,contour_ls=lines,filled=is_filled,label_order=lab_ord,legend_loc=loc_leg)
# Save it into a file
g.export('triangle_test.pdf')

(thanks to S. Ilic)

3.3  IDL

We provide a few IDL routines to analyze your chains output. They make use of sone astron features.

Since reading txt files is slow, we provide the following procedure to convert your chains into a single FITS file: suppose you ran 4 chains and the outputs are in the directory dir with names samples1.txt, samples2.txt, samples2.txt, samples3.txt

file=dir+'/samples.fits' 
conv2fits,dir+"/samples*.txt", fitsout=file

then converts the files into a single fitsout FITS file (this will show you also trace plots).

Note that the data starts only at HDU 1 and the name of variables is stored in its HDR (with some extra stuff ,see how to decode below) You may for instance re-plot each variable of a chain with the following code snippet:

t= mrdfits(file, 1, hdr)  ;  look at 1st chain
if (n_elements(name) eq 0) then name=strtrim(hdr[5:n_elements(hdr)-4],2)

!p.multi=[0,5,5]   ; adapt this to your number of variables
for j=0,sz[1]-1 do plot,t[*,j],xtit=name[j]

The trace_plots.pro procedure does essentially this for all the chains sorted in the fits file; you may use it as

 
.r trace_plots
trace_plots, file

Then you may look at the Gelman_rubin test as in the following example:

.r gelman_rubin

;read all chains
R=gelman_rubin(file,it=it,names=names)   ;"it" represents the x axis, names the name of all the variables

nvar=(size(R))[2]
cols=long(range(0,250,N=nvar))                                       ; different colours

plot,it,R[*,0]-1,/ylog,ytit="R-1"
for i=0,nvar-1 do oplot,it,R[*,i]-1,col=cols[i]
al_legend,names,textcol=cols,/right,col=cols,charsize=0.7,box=0,/top ; from astron
oplot,minmax(it),[0.03,0.03],line=2
oplot,minmax(it),[0.01,0.01],line=3

Now you know where to cut and concatenate your chains (you may also build another fits file for faster later use, use /write keyword) For example for a cut at 100 000:

ncut=100000L

.r mergeMC
chain=mergeMC(file, ncut,/write) ; 

Then histogram each variable (with you preferred routines, everyone has his/her own!) of each chain[*,ivar] (note that ivar=0 corresponds to chi2's values and you probably don't want to histogram it)

One can display easily the correlation matrix of the parameters posteriors with the following snippet:

;read converged chain
chain=mergeMC(file, 100000L,names=names)

;compute empirical covariance : be careful 1st index is chi2 so you should drop it
cov=cov_empiric(chain[*,1:*]) 

;transform to correlations
cor=cov2cor(cov)

; display: the fowllwing uses the coyote lib. you may prefer to use your own way to display a 2d array
plotcor,cor,names[1:*],/write,/pdf,/view