CMS Data Analysis School Pre-Exercises - Sixth Set

Introduction

The color scheme of the Exercise is as follows:

GRAY: the commands to execute  (cut&paste)

GREEN: the output sample of the executed commands

PINK: code snippets

Explicit exercises and questions addressed to the student will be written in RED

As a prerequisite for this exercise, please make sure that you have correctly followed the instructions for obtaining a GitHub account in the first exercise: SWGuideCMSDataAnalysisSchoolPreExerciseFirstSet#Obtain_a_github_account

It is also helpful to have already completed SWGuideCMSDataAnalysisSchoolPreExerciseFifthSet#Collaboration_on_GitHub .

Post the answers in the online response form available from the course web area:

If problems are encountered please e-mail the

Goal of the exercise

This exercise is intended to provide you with basic familiarity with pyROOT provides bindings for all classes within the ROOT libraries and allows for replacing the usual C++ with the often less cumbersome python. The goal is to obtain a general understanding of the syntax required to import and make use of the ROOT libraries within a basic python script. Various examples are provided in order to demonstrate TH1 histogram manipulation including; reading from a .root file, creating, binning, re-binning, scaling, plotting and fitting to a Gaussian.

Whether you use python or C++ to complete your analysis is a personal preference however with the current lack of documentation on pyROOT many students stick with C++ in order to ensure their access to coding examples and experts. It is our hope that through providing you with this basic introduction and Github repository of example scripts, which you are encouraged to add to, that we can bring together the existing pyROOT community within CMS and foster its growth.

Exercise 21 - Introduction to pyROOT

Clone the pyROOTforCMSDAS Repo

Login and setup your working area on cmslpc-sl6.fnal.gov for CMSDAS@LPC_2017 or cmspos01.recas.ba.infn.it for CMSPOS@Bari_2017:

kinit USERNAME@FNAL.GOV

# Enter the password and then do:

ssh -Y USERNAME@cmslpc-sl6.fnal.gov

cd nobackup
mkdir pyROOTExamples
cd pyROOTExamples

Depending on whether you use bash or tcshell:

# for bash:
source /cvmfs/cms.cern.ch/cmsset_default.sh

or

# for tcshell:
source /cvmfs/cms.cern.ch/cmsset_default.csh

Now you can set up your CMSSW environment:

cmsrel CMSSW_9_3_2
cd CMSSW_9_3_2/src/
cmsenv

If you are up to the challenge, then before cloning the pyROOTforCMSDAS repository, command below, you can create your own fork so that you can become a founding contributor to this pyROOT documentation effort. In this case, skip the below command and move on the the alternative.

git clone https://github.com/UBParker/pyROOTforCMSDAS.git


Alternatively, if and only if, you have forked the repository:

If you have chosen to do this then simply replace "YOURGITHUBUSERNAMEr" with your actual GitHub username in the command below.

git clone https://github.com/YOURGITHUBUSERNAME/pyROOTforCMSDAS.git

After this, you can check the contents of the test directory :

cd pyROOTforCMSDAS/test
ls -l

Example 1 : hist1.py

We begin with a very short example illustrating an efficient way to import the ROOT libraries, read a histogram from a .root file, then plot it on a canvas.


First, lets have a look at the file:

less hist1.py

You should see the script below:

import ROOT

theInfile=ROOT.TFile("../samples/infile.root","READ")

theOutfile=ROOT.TFile("outfile.root","RECREATE")

theHist = theInfile.Get("AK8MHist16")

ROOT.gStyle.SetOptFit(1111)
theCanvas = ROOT.TCanvas('theCanvas','')
theHist.Draw('hist')
theCanvas.Update()
theCanvas.Draw()

theHist.Write()
theCanvas.Write()

theOutfile.cd()
theOutfile.Write()
theOutfile.Close()

Notice the first line reads:

import ROOT

* NOTE - One could also import all ROOT libraries, which would change the syntax a bit:

from ROOT import *

theOutfile = TFile("outfile.root","RECREATE")

However, this method is not advised as it is much slower to import all libraries with the wildcard.*

After importing ROOT the script defines the input and output files then extracts an existing TH1F stored in ../samples/infile.py :

theHist = theInfile.Get("AK8MHist12")

Next a TCanvas is initialized and the histogram is drawn on it:

theCanvas = ROOT.TCanvas('theCanvas','')
theHist.Draw('hist')

Our script then writes the canvas to the file:

theOutfile.cd()
theOutfile.Write()
theOutfile.Close()

This script is almost useless as you could have achieved the same result by using an interactive python session as described in example 1 of this pyROOT Tutorial :

python -i

Python 2.7.11 (default, Apr 28 2017, 13:50:27)
[GCC 6.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import ROOT
>>> tb = ROOT.TBrowser()
>>> 

From the TBrowser you could navigate to the histogram which we plotted. To exit interactive python, press

ctrl+D
or type
.q
.

Everything shown above could easily be achieved interactively but this simple example serves as the foundation for our learning of slightly more complex histogram manipulation.


So let's run the script to produce the same histogram that we just viewed from the TBrowser:

python -i hist1.py

If you do not want to keep the graphical display of the histogram open when you run the script, and only run the script (to produce the output file) while suppressing the graphics:

python hist1.py -b

Example 2 : hist2.py

We proceed with another short example which builds on the first but also illustrates some commonly used histogram manipulations such as; creating, binning, re-binning, scaling, fitting to a Gaussian and saving to various file types.


First, lets have a look at the file:

less hist2.py

You should see the script below:

import ROOT
import array as a

theOutfile=ROOT.TFile("outfile.root","RECREATE")


# Bins are defined by an array 
ptBs =  a.array('d', [200, 300 , 400 , 500 , 800 ])
nptBs = len(ptBs) - 1

# Define a binned histogram for storing the peak of the gaussian fit to a Top mass spectrum in each Pt bin

hpeak = ROOT.TH1F("hpeak", " ; p_{t} of jet (GeV); mean of Gaussian fit",  nptBs, ptBs)  
  


# Open the input file

theInfile = ROOT.TFile("../samples/infile.root","READ")

# Choose which stage of the selection you would like to see (0-17)
theSelectionStage = 15

# Get the histograms of AK8 jet mass binned by Pt of the jet

hist_list = [ "AK8MPt200To300Hist" + str(theSelectionStage),    
              "AK8MPt300To400Hist" + str(theSelectionStage),    
              "AK8MPt400To500Hist" + str(theSelectionStage),    
              "AK8MPt500To800Hist" + str(theSelectionStage),    
              ]
              
# loop over histos and for each plot it and fit to a gaussian then save the mean              
              
for ihisto , histo in enumerate(hist_list) :
    theHist = theInfile.Get(histo )
    print "Now working with histogram named {}".format( histo )
    # Ensure proper error propogation
    theHist.Sumw2()
    theHist.SetDirectory(0)
    
    scaleis = 100.
    # If you want to scale the hist uncomment the line below
    #theHist.Scale(scaleis)
    
    # Change the number of bins from the value the histogram was initialized with
    theHist.Rebin(10)

    # Fitting the histogram to a Gaussian
    theFitFunc = ROOT.TF1("theFitFunc", "gaus", 110, 210 )
    theFitFunc.SetLineColor(1)
    theFitFunc.SetLineWidth(2)
    theFitFunc.SetLineStyle(2)
    theHist.Fit(theFitFunc,'R' )

    # Getting ampltitude, mean and width of fit
    amp_data    = theFitFunc.GetParameter(0);
    eamp_data   = theFitFunc.GetParError(0); 
    mean_data   = theFitFunc.GetParameter(1);
    emean_data  = theFitFunc.GetParError(1); 
    width_data  = theFitFunc.GetParameter(2);
    ewidth_data = theFitFunc.GetParError(2); 

    # Printing fit information
    print 'amp {0:3.2f} #pm eamp {1:3.2f}'.format(amp_data, eamp_data    ) 
    print 'mean_data   '+str(mean_data   ) + 'emean_data  '+str(emean_data  ) 
    print 'width_data {0:3.2f} #pm {1:1.2f} '.format(width_data , ewidth_data ) 
    
   # Filling histogram with peak value
    massToFill = [ 250, 350, 450, 650 ]
    ibin = hscale.GetXaxis().FindBin(massToFill[ihisto])
    hpeak.SetBinContent(ibin, mean_data )
    hpeak.SetBinError(ibin, emean_data)


    ROOT.gStyle.SetOptFit(1111)
    c = ROOT.TCanvas('c','')
    
    theHist.Draw('hist')
    theFitFunc.Draw("same")
    
    # Set the maximum value for the y axis
    maxy= theHist.GetMaximum() 
    theHist.SetMaximum( maxy * 1.2)
    theHist.GetXaxis().SetRangeUser(80 , 250 )

    c.Update()
    c.Draw()

    c.Print('plots/'+ str(histo) + '_' + 'GaussianFit.png', 'png' )
    c.Print('plots/'+ str(histo) + '_' + 'GaussianFit.pdf', 'pdf' )
    c.Print('plots/'+ str(histo) + '_' + 'GaussianFit.root', 'root' )
   
c1 = ROOT.TCanvas('c1','')

thepeak.Draw('p')

# Set the maximum value for the y axis
maxy= theHist.GetMaximum() 
theHist.SetMaximum( maxy * 1.2)
theHist.GetXaxis().SetRangeUser(0 , 800 )

c1.Update()
c1.Draw()

c1.Print('plots/AK8MPtBinned' + '_' + 'GaussianFitMeanValues.png', 'png' )



theOutfile.cd()
theOutfile.Write()
theOutfile.Close()


This script illustrates how to:

1.) Create a histogram which is binned using an array to define the bin sizes:

import array as a

# Bins are defined by an array 
ptBs =  a.array('d', [200, 300 , 400 , 500 , 800 ])
nptBs = len(ptBs) - 1

# Define a binned histogram for storing the peak of the gaussian fit to a Top mass spectrum in each Pt bin
hpeak = ROOT.TH1F("hpeak", " ; p_{t} of jet (GeV); Peak Location ",  nptBs, ptBs)  

2.) Deal with histograms created in a loop.

See [https://github.com/UBParker/pyROOTforCMSDAS/blob/master/test/hist2.py#L36[][line 36 of hist2.py]] where the enumerate() function is used to loop over the histogram list.

3.) Scale and Re-Bin an existing histogram.

    scaleis = 100.
    # If you want to scale the hist uncomment the line below
    #theHist.Scale(scaleis)
    
    # Change the number of bins from the value the histogram was initialized with
    theHist.Rebin(10)

4.) Fit to a Gaussian within a specified range (Here the x range is 110-210 GeV).

    # Fitting the histogram to a Gaussian
    theFitFunc = ROOT.TF1("theFitFunc", "gaus", 110, 210 )
    theFitFunc.SetLineColor(1)
    theFitFunc.SetLineWidth(2)
    theFitFunc.SetLineStyle(2)
    theHist.Fit(theFitFunc,'R' )

    # Getting ampltitude, mean and width of fit
    amp_data    = theFitFunc.GetParameter(0);
    eamp_data   = theFitFunc.GetParError(0); 
    mean_data   = theFitFunc.GetParameter(1);
    emean_data  = theFitFunc.GetParError(1); 
    width_data  = theFitFunc.GetParameter(2);
    ewidth_data = theFitFunc.GetParError(2); 

5) Fill a histogram using SetBinContent

   # Filling histogram with peak value
    massToFill = [ 250, 350, 450, 650 ]
    ibin = hscale.GetXaxis().FindBin(massToFill[ihisto])
    hpeak.SetBinContent(ibin, mean_data )
    hpeak.SetBinError(ibin, emean_data)

Now if we run the script we see the following output with information about the fit for each histogram.

python -i hist2.py
Now working with histogram named AK8MPt200To300Hist16
TH1F::Sumw2:0: RuntimeWarning: Sum of squares of weights structure already created
Info in :  created default TCanvas with name c1
 FCN=7.66425 FROM MIGRAD    STATUS=CONVERGED      77 CALLS          78 TOTAL
                     EDM=2.22658e-10    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  Constant     7.56904e+00   1.35112e+00   1.42150e-03  -3.67670e-06
   2  Mean         1.58353e+02   1.96842e+00   2.39434e-03  -9.26384e-06
   3  Sigma        1.28363e+01   1.55147e+00   3.39567e-05  -7.59393e-04
amp 7.56903737859 #pm eamp 1.35111681853
mean_data   158.353074577emean_data  1.96842065846
width_data 12.84 #pm 1.55 

For CMSDAS@LPC2018 please submit your answer at the Google Form sixth set.

Question 21.1: What is the mean value of the Gaussian fit of the jet mass spectrum for jets of pt 300-400 GeV ?

Hopefully this extremely brief introduction has peaked your interest in pyROOT and encouraged you to learn more about this versatile tool.

Advanced topics

Advanced topics not explored in this exercise but to be included on the pyROOT for CMSDAS GitHub page in the near future are : reading and writing a TTree, using a python analyzer to skim a TTree and creating plots in the CMS PubCom format.

Students are encouraged to explore these and other topics on their own and to assist with the CMS effort to document pyROOT by creating your own fork of pyROOTforCMSDAS and adding to the example scripts available there.

Link to SWGuideCMSDataAnalysisSchoolPreExerciseFirstSet

Link to SWGuideCMSDataAnalysisSchoolPreExerciseSecondSet

Link to SWGuideCMSDataAnalysisSchoolPreExerciseThirdSet

Link to SWGuideCMSDataAnalysisSchoolPreExerciseFourthSet

Link to SWGuideCMSDataAnalysisSchoolPreExerciseFifthSet

-- AshleyMarieParker - 2016-12-07

-- Sudhir Malik - 2017-12-15

Comments

Edit | Attach | Watch | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r1 - 2017-12-15 - SudhirMalik
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2021 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback