Installation
Using CAMEL
Tutorials
|
AnalysingTheOutputsOn this page... (hide) The general philosophy of CAMEL is that it is not a black-box: it is an opportunity to learn (and control) each step of cosmological parmeter estimation. All the outputs are available as plain ascii files and we do not provide low-level tools to read and display them. Some common utlities are however provided as script files and a python library ( 1. Minimization1.1 Extracting the interesting part from a best-fit log fileOnce you ran some batch with the awk -f /path/to/work/tools/awk/bestfit.awk logfile or, within python, use the Recall that the final best-fit values are available in the text file provided in the 1.2 Finding the lowest chi2 from several best-fit filesIf you have several outputs in the same directory (what happens if several starting points are tried for instance with the multimin.sh batch submission scripts) you may determine the best of the best-fits (lowest chi2) with awk -f /path/to/work/tools/awk/zebest.awk *.o* (assuming your pattern for finding all teh log-files is of the form *.o*) 2. Profile LikelihoodThe profile likelihood curve of a given parameter (θ) is the concatenation of "best fits" made for different fixed values of this very θ. Analysing a profile likelihood is therefore rather easy. A python utility software is available within CAMEL (in work/tools/python/draw_profile.py) which allows to do so. 2.1 inline plot toolsYou can run this tool on the output of your Profile or Minimize runs (cf. here ) which should look like Attach:test.txt (you can concatenate the results of Minimize with different fits for a given value of the profiled variable, the python code will take care of them to find the minimum at each point) Then you simply have to do: python tools/python/draw_profile.py test.txt This will give you the results of the profile using an spline interpolation to compute the shape of the χ2 at its minimum. If you want to use a poly3d fit (which can be useful to test the robustness of the result) you can activate the --fit option: python tools/python/draw_profile.py test.txt --fit If you want the plot(s) to be saved you can use the --save option: python tools/python/draw_profile.py test.txt --save Finally if in your input file you have several chi2 value for a given value of the profiled variables (using Minimimize.sh for instance,you can use the "order" option to find the min chi2 at each point: python draw_profile.py allbestfits.txt --order 2.2 python libraryIf you want to go further in the profile analysis, we provide a python class in (work/tools/python/camel.py). The code uses the written outputs from the Profile.sh script which creates one directory for each profiled-value containing several bestfits. Then:
#init dbdir = "/sps/planck/Users/tristram/Camel/Alens/hlpTT_lol_prof_Alens" parname = "Alens" prof = camel.prof( dbdir, parname) #return and plot profile points listval,chi2 = prof.get() plot( listval, chi2-min(chi2), marker='o') #fit profile prof.fit(method=‘polyfit',order=3,deltaMax=3) prof.plot() #exemple of multiple profiles dbdir = "/sps/planck/Users/tristram/Camel/Mnu/" parname = "Mnu" prof1 = camel.prof( dbdir+"hlpTT_lol_nuscan", parname) prof2 = camel.prof( dbdir+"hlpTT_lol_Alens_nuscan", parname) prof1.fit(method=‘polyfit',order=2,upper=True) prof2.fit(method=‘polyfit',order=2,upper=True) prof1.plot(result=False, color=‘r’, label="hlpTT+lol") prof2.plot(result=False, color=‘b’, label="hlpTT+lol (Alens)") legend() 3. MCMCThe MCMC analysis can give access to different information once you have checked that the Markov Chains have converged: get the mean/errors of the parameters, and get their correlation matrix. A python library is provided within CAMEL (in 3.1 PythonAn example of how to read the output of one MCMC and get the results: mean and errors of each parameter: import numpy as np import matplotlib.pyplot as plt import camel import math dbdir = # define here the directory in which you have saved the output of your MCMC in runname_MC/ runname = 'hlpTT_bflike' # then define the legend legtxt = ['Hillipop+lowTEB'] # read the parameter names par = camel.Parnames( dbdir+runname+'.par') # read the chains # read 4 chains and merge the last 10% of each chain (90% burnin) chain = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname), num=(1,2,3,4), burnin=0.9) # compute the mean and errors and write them on the screen for parname in par: print "%15s " % parname, "\t %f \t %f \t %f" % camel.getci( np.percentile( chain[parname], [16,50,84])) An example to plot posterior distributions for one or two parameters # plot 1-D posterior parname = #name of the parameter camel.hist1d( chain[parname], bins=50, color='b', lw=2, smooth=1.) # plot 2-D posterior parname1 = #name of the parameter 1 parname2 = #name of the parameter 2 camel.hist2d( chain, parname1, parname2, bins=50, sigmas=[1,2], cmap="Blues", lw=2, smooth=1., extent=[[0,1],[0,10]]) An example to get the correlation matrix of the parameters: #plot correlation matrix listpar = #subset of parameter names camel.plot_covariance( chain, listpar) #extract covariance matrix data = [chain[p] for p in listpar] covMC = np.cov(data) An example of how to use the camel.py library when you want to compare the output of two MCMC runs (the so-called Triangle Plot). dbdir = # define here the directory in which you have saved the output of your MCMC in runname1_MCMC/ and runname2_MCMC/ figdir = # define here the directory in which you want your plots to be saved runname1 = 'hlpTT_bflike' runname2 = 'hlpTT_bflike_vhl' # then define the legend legtxt = ['Hillipop+lowTEB','Hillipop+lowTEB+VHL'] # read the chains chain1 = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname1), num=(1,2,3,4), burnin=0.9) chain2 = camel.mergeMC( '%s/%s_MC/samples' % (dbdir,runname2), num=(1,2,3,4), burnin=0.9) # plot triangle listpar = #subset of parameter that belongs to the two chains camel.triangle( (chain1,chain2), params=listpar, name=legtxt, color=('k','g')) plt.savefig( figdir+'CompareMCMC.pdf', bbox_inches='tight') Convergence tests: #extract full chains runname = 'hlpTT_bflike' chains = camel.extract_chains( '%s/%s_MC/samples' % (dbdir,runname), 4) #plot variable evolution for each chain ivar = 0 #index of variable: first is the chi2 for c in data: plt.plot( c[ivar]) #Compute Gelman-Rubin for each parameter it,R = camel.GelmanRubin( chains, gap=10000, length_max=200000) plt.figure() plt.plot(it,R-1.) plt.axhline( 0.03, color='k', ls='--') plt.semilogy() #Compute Geweke Diagnostic for each parameter for each chain it,T=camel.Geweke( chains, gap=10000, length_max=200000) plt.figure() r = transpose(T,axes=(1,0,2)) for i in range(len(r)): plt.subplot( 4, 1, i+1) plt.plot(it,r[i]) plt.ylim(-100,100) 3.2 GetDistYou can interface the camel outputs to standard GetDist function. The following example shows how to perform a triangle plot (others examples at https://getdist.readthedocs.io/en/latest/plot_gallery.html, including 1D/2D/3D plots, convergence statistics, etc.) import camel from getdist import plots, MCSamples dbdir="/home/plaszczy/agnostic_cosmo/data/" datadir=["hlpTT_bflike_LCDM_MC"] allpars=camel.MCparnames(dbdir+datadir[0]+"/samples") params_to_plot=allpars[1:7] print(params_to_plot) #strip $ from camel label names labels=[camel.parname.setdefault(n,n).replace('$','') for n in params_to_plot] #strip * from param names (as in 100*theta_s) names=[n.replace('*','') for n in params_to_plot] # read chains with camel optimized functions chain = camel.mergeMC( dbdir+datadir[0]+'/samples', num=(1,2,3,4), burnin=0.70) #keep samples to plot mcmc = [chain[par] for par in params_to_plot] ###### Analysis with Getdist ###### ### Create an "MCsamples" object from the CAMEL chains ### To compare several runs, create as many "MCsamples" samples1 = MCSamples( samples=mcmc, names=names,labels=labels) #samples2 = MCSamples(...... samples = [samples1] # or [samples1,samples2,...] ### Triangle plot g = plots.getPlotter() # creates the "plotting" object g.settings.setWithSubplotSize(5.0000) # controls the size of each subplot # All the follwing settings are optional legends = ['Run 1'] # label for each run colors = ['red'] # color for each run lines = ['-'] # linestyle for each run is_filled = True # controls if 2D contours are filled or not lab_ord = [0] # ordering of the label loc_leg = 'upper right' # placement of the legend # Do the triangle plot g.triangle_plot(samples,params_to_plot,legend_labels=legends,contour_colors=colors,contour_ls=lines,filled=is_filled,label_order=lab_ord,legend_loc=loc_leg) # Save it into a file g.export('triangle_test.pdf') (thanks to S. Ilic) 3.3 IDLWe provide a few IDL routines to analyze your chains output. They make use of sone Since reading txt files is slow, we provide the following procedure to convert your chains into a single FITS file:
suppose you ran 4 chains and the outputs are in the directory file=dir+'/samples.fits' conv2fits,dir+"/samples*.txt", fitsout=file then converts the files into a single fitsout FITS file (this will show you also trace plots). Note that the data starts only at HDU 1 and the name of variables is stored in its HDR (with some extra stuff ,see how to decode below) You may for instance re-plot each variable of a chain with the following code snippet: t= mrdfits(file, 1, hdr) ; look at 1st chain if (n_elements(name) eq 0) then name=strtrim(hdr[5:n_elements(hdr)-4],2) !p.multi=[0,5,5] ; adapt this to your number of variables for j=0,sz[1]-1 do plot,t[*,j],xtit=name[j] The .r trace_plots trace_plots, file Then you may look at the Gelman_rubin test as in the following example: .r gelman_rubin ;read all chains R=gelman_rubin(file,it=it,names=names) ;"it" represents the x axis, names the name of all the variables nvar=(size(R))[2] cols=long(range(0,250,N=nvar)) ; different colours plot,it,R[*,0]-1,/ylog,ytit="R-1" for i=0,nvar-1 do oplot,it,R[*,i]-1,col=cols[i] al_legend,names,textcol=cols,/right,col=cols,charsize=0.7,box=0,/top ; from astron oplot,minmax(it),[0.03,0.03],line=2 oplot,minmax(it),[0.01,0.01],line=3 Now you know where to cut and concatenate your chains (you may also build another fits file for faster later use, use /write keyword) For example for a cut at 100 000: ncut=100000L .r mergeMC chain=mergeMC(file, ncut,/write) ; Then histogram each variable (with you preferred routines, everyone has his/her own!) of each One can display easily the correlation matrix of the parameters posteriors with the following snippet: ;read converged chain chain=mergeMC(file, 100000L,names=names) ;compute empirical covariance : be careful 1st index is chi2 so you should drop it cov=cov_empiric(chain[*,1:*]) ;transform to correlations cor=cov2cor(cov) ; display: the fowllwing uses the coyote lib. you may prefer to use your own way to display a 2d array plotcor,cor,names[1:*],/write,/pdf,/view |