CMS Data Analysis School Pre-Exercises - Second Set
Introduction
Welcome to the second set of CMSDAS pre-exercises. As you know by now, the
purpose of the pre-workshop exercises is for prospective workshop attendees to
become familiar with the basic software tools required to perform physics
analysis at CMS before the workshop begins. Sets of exercises will appear at
regular intervals between now and January. Please run through and complete
these exercises. Post the answers in the online response form available from
the course web area -
For CMSDAS@LPC 2016, Fermilab
,
For CMSDAS@INFN 2015, Bari
. A large amount of additional information about these exercises is available in the twikis that we reference. Please remember that twikis evolve but aim to provide the best information at any given time. If you encounter problems, please e-mail
LPC contact CMSDASATLPC@fnal.gov
or
Bari contact leonardo.cristella@ba.infn.it
with a detailed description of your problem. The instructors will be delighted to help you.
The Second Set of exercises begins with Exercise 7, which further reduces the size of the MiniAOD samples produced in Exercise 6 of the First Set.
Exercise 8 and Exercise 9 are on FWLite (Framework Lite). This is an interactive analysis tool integrated with the
CMSSW EDM (Event Data Model) Framework. It allows you to automatically load the
shared libraries defining CMSSW data formats and the tools provided, to easily
access parts of the event in the EDM format within ROOT interactive sessions.
It reads produced ROOT files, has full access to the class methods and there
is no need to write full-blown framework modules. Thus, with an FWLite
distribution installed locally on the desktop, one can do CMS analysis outside the full
CMSSW framework. In these two exercises, we will analyze the data stored in a
MiniAOD
sample using FWLite. We will loop over muons and make a Z mass peak.
Exercise 10 and Exercise 11 are on the CMS event display called Fireworks. Fireworks is specialized for physics studies. Data handling is greatly simplified by using only reconstructed information and ideal geometry. Data is presented via graphical and textual views. Fireworks provides an easy-to-use interface that allows physicists to concentrate only on the data in which they are interested. Physicists can select which events (e.g. require a high-energy muon), what data (e.g. which track list) and which items in a collection (e.g. only high-pt tracks) to show.
We assume that, having done the First Set of exercises, you are comfortable with logging into lxplus6 or cmslpc-sl6.fnal.gov or your local CMS cluster and setting up the CMS environment.
Below is also a gentle reminder to get a CERN computer account and a Grid Certificate.
Obtain a CERN account (if you do not already have one)
- Use the following link for a CMS CERN account: CMS CERN account
- A CERN account is needed, for example, to log in to any e-learning web-site, or to obtain a file from the afs area. A CERN account will be needed for future exercises.
- Obtaining a CERN account can be time-consuming. To expedite the process please ask the relevant institutional team leader to perform the necessary "signing" after the online form has been submitted and received for initial processing by the secretariat.
Obtain a Grid Certificate and CMS VO registration
- A Grid Certificate and CMS VO registration will be needed for the Grid Exercises. The registration process can be time-consuming (actions by several people are required), so it is important to start it as soon as possible. There are two main requirements, which can be simply summarized: a certificate ensures that you are who you claim to be, and a registration in the VO recognizes you (identified by your certificate) as a member of CMS. Use the following link for this: Get Your Grid Certificate and CMSVO. Both are needed to submit jobs on the Grid. Make sure you follow any additional instructions for US-CMS users.
NOTE:
Legend of colors for this tutorial:
GRAY background for the commands to execute (cut&paste)
GREEN background for the output sample of the executed commands
BLUE background for the configuration files (cut&paste)
PINK background for the code (EDAnalyzer etc.) (cut&paste)
Exercise 7 - Slim the MiniAOD sample from Exercise 6 to reduce its size by keeping only the Muon and Electron branches
In order to reduce the size of the MiniAOD, we keep only the slimmedMuons and slimmedElectrons objects and drop the rest. The config files should now look like slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py.
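The keep-and-drop idea can be illustrated with a small stand-alone Python sketch. It is not the CMSSW implementation and does not reproduce the contents of the config files above. It assumes that the output selection is a list of "drop"/"keep" statements applied in order, with a later matching statement overriding an earlier one, and that branch names follow the type_label_instance_process pattern. The statement list and the branch names used here are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# Assumed selection statements: drop everything, then keep the two collections.
OUTPUT_COMMANDS = [
    "drop *",
    "keep *_slimmedMuons_*_*",
    "keep *_slimmedElectrons_*_*",
]


def is_kept(branch_name, commands=OUTPUT_COMMANDS):
    """Return True if the branch survives the ordered keep/drop statements."""
    # Starting value is an assumption of this sketch; it does not matter here
    # because the first statement ("drop *") matches every branch.
    kept = False
    for command in commands:
        action, pattern = command.split(None, 1)
        if fnmatchcase(branch_name, pattern):
            # The last matching statement decides the outcome.
            kept = action == "keep"
    return kept


if __name__ == "__main__":
    # Illustrative branch names in the type_label_instance_process form.
    for name in ("patMuons_slimmedMuons__PAT",
                 "patElectrons_slimmedElectrons__PAT",
                 "patJets_slimmedJets__PAT"):
        print(name, "->", "keep" if is_kept(name) else "drop")
```

Because the statements are applied in order, placing "drop *" last instead of first would remove every branch in this sketch.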
To work with these config files and make the slim MiniAOD, execute the following steps in the directory YOURWORKINGAREA/CMSSW_7_3_0_pre1/src
1. Copy the scripts slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py in their entirety and save each one under the same name. Open them with your favorite editor and take a look at these Python files. The number of events has been set to 100:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(100) )
This runs over 100 events only; change the value to -1 to run over all the events in the sample. Depending on where you are attending the school, you have to indicate the proper location of the input file as follows:
If you are at the LPC change the first part of the location of the file from:
'file:/afs/cern.ch/user/b/benwu/public/CMSDAS/
to:
'/store/user/cmsdas/2016/PRE_EXERCISES/
keeping the file names used in each script.
If you are at KNU, use the ROOT files in /cmsdas/data/pre_exercises/Exe2
Make this change in both files, slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py.
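The prefix change for the LPC can be sketched in plain Python as follows. This is only an illustration of the string substitution. The helper name relocate and the file name example_MiniAOD.root are hypothetical, and the real file names are the ones already present in each script.

```python
# Prefixes quoted in the instructions above.
CERN_PREFIX = "file:/afs/cern.ch/user/b/benwu/public/CMSDAS/"
LPC_PREFIX = "/store/user/cmsdas/2016/PRE_EXERCISES/"


def relocate(file_name, old_prefix=CERN_PREFIX, new_prefix=LPC_PREFIX):
    """Swap the leading location of an input file and keep the file name."""
    if not file_name.startswith(old_prefix):
        # Leave names that do not use the CERN location untouched.
        return file_name
    return new_prefix + file_name[len(old_prefix):]


if __name__ == "__main__":
    # example_MiniAOD.root is a hypothetical file name used for illustration.
    print(relocate(CERN_PREFIX + "example_MiniAOD.root"))
```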
2. Now run the following command:
cmsRun slimMiniAOD_MC_MuEle_cfg.py
This produces an output file called slimMiniAOD_MC_MuEle.root in your YOURWORKINGAREA/CMSSW_7_3_0_pre1/src area.
3. Now run the following command:
cmsRun slimMiniAOD_data_MuEle_cfg.py
This produces an output file called slimMiniAOD_data_MuEle.root in your YOURWORKINGAREA/CMSSW_7_3_0_pre1/src area.
On opening these two MiniAODs, one observes that only the slimmedMuons and the slimmedElectrons objects are retained, as intended. Executing ls -altrh gives the size of these MiniAODs.
To find the size of your MiniAOD, execute the following Linux command:
ls -altrh slimMiniAOD_MC_MuEle.root
and
ls -altrh slimMiniAOD_data_MuEle.root
You may also try the following:
To find the size of each branch, use the edmEventSize utility as follows (also explained in the First Set of Exercises):
edmEventSize -v slimMiniAOD_MC_MuEle.root
and
edmEventSize -v slimMiniAOD_data_MuEle.root
To see what objects there are, open the ROOT file as follows and browse the MiniAOD samples as you did in Exercise 6.
Here is how you do it for the output file slimMiniAOD_MC_MuEle.root:
root -l slimMiniAOD_MC_MuEle.root
TBrowser b;
OR
root -l
TFile *theFile = TFile::Open("slimMiniAOD_MC_MuEle.root");
TBrowser b;
To quit the ROOT application, execute:
.q
QUESTION 7.1 - What are the sizes of the MiniAOD files slimMiniAOD_MC_MuEle.root and slimMiniAOD_data_MuEle.root?
QUESTION 7.2 - What is the mean eta of the muons for MC and data?
QUESTION 7.3 - Has the size of the output file decreased compared to the original sample? Is the mean eta for muons for MC and data the same as in the original sample in Exercise 6?
Exercise 8 - Use FWLite on the MiniAOD created in Exercise 7 and make a Z Peak (applying pt and eta cuts)
FWLite (pronounced "framework-light") is basically a ROOT session with CMS data format libraries loaded. CMS uses ROOT to persistify data objects. CMS data formats are thus "ROOT-aware"; that is, once the shared libraries containing the ROOT-friendly description of CMS data formats are loaded into a ROOT session, these objects can be accessed and used directly from within ROOT like any other ROOT class!
In addition, CMS provides a couple of classes that greatly simplify access to the collections of CMS data objects. Moreover, these classes (Event and Handle) have the same names as the analogous ones in the Full Framework; this mnemonic trick helps make the code that accesses CMS collections very similar between FWLite and the Full Framework.
In this exercise we will make a ZPeak using our data and MC samples. We will use the corresponding slim MiniAOD created in Exercise 7. To read more about FWLite, have a look at Section 3.5 of Chapter 3 of the WorkBook.
We will first make a ZPeak. We will loop over the slimmedMuons in the MiniAOD and get the invariant mass of oppositely charged muon pairs. These masses are filled into a histogram that is written to an output ROOT file.
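As a rough illustration of the pairing logic, here is a minimal pure-Python sketch that does not use FWLite or ROOT. It assumes each muon is described by its pt, eta, phi and charge, builds four-vectors by hand, and returns the invariant mass of every oppositely charged pair that passes pt and eta cuts. The dictionary layout, the function names and the default cut values are assumptions made for this example and are not taken from FWLiteHistograms.cc.

```python
import math
from itertools import combinations

MUON_MASS = 0.105658  # GeV


def four_vector(pt, eta, phi, mass=MUON_MASS):
    """Return (E, px, py, pz) for a particle given pt, eta, phi and mass."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dimuon_masses(muons, min_pt=20.0, max_abs_eta=2.1):
    """Invariant masses of opposite-charge muon pairs passing the cuts.

    The cut values are illustrative defaults chosen for this sketch.
    """
    selected = [m for m in muons
                if m["pt"] > min_pt and abs(m["eta"]) < max_abs_eta]
    masses = []
    # combinations() visits each pair once, which avoids double counting.
    for mu1, mu2 in combinations(selected, 2):
        if mu1["charge"] * mu2["charge"] >= 0:
            continue  # keep only pairs of opposite charge
        p1 = four_vector(mu1["pt"], mu1["eta"], mu1["phi"])
        p2 = four_vector(mu2["pt"], mu2["eta"], mu2["phi"])
        energy, px, py, pz = (a + b for a, b in zip(p1, p2))
        mass_squared = energy ** 2 - px ** 2 - py ** 2 - pz ** 2
        masses.append(math.sqrt(max(mass_squared, 0.0)))
    return masses


if __name__ == "__main__":
    # Two back-to-back muons, made up for illustration.
    event = [
        {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1},
        {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1},
    ]
    print(dimuon_masses(event))
```

In the exercise itself the same quantity comes from the muon four-momenta stored in the MiniAOD, and each mass is filled into a histogram rather than collected in a list.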
First make sure that you have the MiniAODs created in Exercise 7. They should be called slimMiniAOD_MC_MuEle.root and slimMiniAOD_data_MuEle.root.
1. Go to the src area of the current CMSSW release
cd $CMSSW_BASE/src
The environment variable CMSSW_BASE points to the base area of the current CMSSW release.
1. Create a new CMSSW_7_3_0_pre1 release as you did in Exercise 3 under YOURWORKINGAREA2
2. Check out the following package from GitHub.
Make sure that your GitHub setup is working properly, as described in "obtain a github account". It is particularly important to set up SSH keys so that you can check out code without problems:
https://help.github.com/articles/generating-ssh-keys
Use the following command to add the package with Git:
git cms-addpkg PhysicsTools/FWLite
For any github related issues please send an email to: hn-cms-git@cern.ch
Then, to compile the package, do
scram b
Note: You can try scram b -j 8 to speed up the compilation. Here -j 8 will compile with 8 cores.
2. To make the Z peak, we will use the FWLite executable called FWLiteHistograms. The corresponding code should be in $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc
This executable takes command line options. More about these can be learned from SWGuideCommandLineParsing.
To make the ZPeak with this executable, using the MC MiniAOD, run the following command:
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=10
You will get the following error:
terminate called after throwing an instance of 'cms::Exception'
what(): ---- ProductNotFound BEGIN
getByLabel: Found zero products matching all criteria
Looking for type: edm::Wrapper<std::vector<reco::Muon> >
Looking for module label: muons
Looking for productInstanceName:
---- ProductNotFound END
Abort
This error occurs because your input file slimMiniAOD_MC_MuEle.root is a MiniAOD and does not contain a reco::Muon collection with the label muons. Instead, it contains slimmedMuons (check for yourself by opening the ROOT file with the ROOT browser). However, the code FWLiteHistograms.cc contains lines that say:
using reco::Muon;
and
event.getByLabel(std::string("muons"), muons);
This means you need to switch the format from reco::Muon to pat::Muon, and the label from muons to slimmedMuons.
To implement these changes, open the code $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc. In this code, find the line that says:
using reco::Muon;
and change it to
using pat::Muon;
Then find the line:
event.getByLabel(std::string("muons"), muons);
and change it to
event.getByLabel(std::string("slimmedMuons"), muons);
For the change to take effect, you need to re-compile the code. To do this, run:
scram b
cmsenv
Now again run the executable as follows:
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=10
It now runs successfully and produces a ROOT file called ZPeak_MC.root containing histograms. Open this ROOT file and look at the Z mass peak histogram called mumuMass. Answer the following question.
QUESTION 8.1 - What is the mean mass of the ZPeak for your MC MiniAOD?
3. Now a little bit about the command that you executed.
In the command above, slimMiniAOD_MC_MuEle.root is the input file and ZPeak_MC.root is the output file. maxEvents is the number of events you want to run over; you can change it to any other number. The value -1 means running over all the events, which is 100 in this case. outputEvery sets after how many events the code reports the number of the event being processed. As you may have noticed, with the value you specified the executable prints processing event: after every 10 events.
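The meaning of these options can be mimicked with a short stand-alone Python sketch. This is not the CMSSW command line parser. The function names and the default values in DEFAULTS are hypothetical placeholders (the real defaults are in FWLiteHistograms.cc), and the reporting rule is an assumption chosen to match the behaviour described above.

```python
# Hypothetical defaults used only by this sketch.
DEFAULTS = {
    "inputFiles": "input.root",
    "outputFile": "output.root",
    "maxEvents": -1,
    "outputEvery": 0,
}


def parse_options(args, defaults=DEFAULTS):
    """Parse name=value arguments, converting each to the default's type."""
    options = dict(defaults)
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or key not in options:
            raise ValueError("unrecognised option: %r" % arg)
        options[key] = type(defaults[key])(value)
    return options


def events_to_process(max_events, available):
    """A negative max_events means run over all available events."""
    return available if max_events < 0 else min(max_events, available)


def report_points(n_events, output_every):
    """Event indices at which a progress message would be printed."""
    if output_every <= 0:
        return []
    return [i for i in range(1, n_events) if i % output_every == 0]


if __name__ == "__main__":
    opts = parse_options([
        "inputFiles=slimMiniAOD_MC_MuEle.root",
        "outputFile=ZPeak_MC.root",
        "maxEvents=-1",
        "outputEvery=10",
    ])
    n = events_to_process(opts["maxEvents"], available=100)
    for i in report_points(n, opts["outputEvery"]):
        print("processing event:", i)
```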
If you look at the code FWLiteHistograms.cc, it also contains the defaults corresponding to the above command line options. Answer the following question:
QUESTION 8.2 - What is the default name of the output file in the executable?
Exercise 9 - Re-run the above executable with the data MiniAOD
Re-run the above executable with the data MiniAOD file called slimMiniAOD_data_MuEle.root as follows:
FWLiteHistograms inputFiles=slimMiniAOD_data_MuEle.root outputFile=ZPeak_data.root maxEvents=-1 outputEvery=10
This will create an output histogram ROOT file called ZPeak_data.root.
Then answer the following question.
QUESTION 9 - What is the mean mass of the ZPeak for your data MiniAOD?
Exercise 10 - Fireworks - CMS Event Display
Fireworks is the CMS event-display project and cmsShow is the official name of
the executable. Both names are sometimes used interchangeably. With this tool
one can display events for physics. The core of Fireworks is built on top of
the Event Data Model (EDM) and the light version of the software framework
(FWLite). The Event Visualization Environment (EVE) of ROOT is used to manage
3D and 2D views, selection, and user-interaction with the graphics windows.
Several EVE components were developed in a collaboration between the Fireworks
and ROOT teams. The event display operates using simple plugins which are
registered into the system to perform conversion from EDM collections into
their visual representations. As a guiding principle, Fireworks shows only
what is available in the EDM event data; no reconstruction or result
enhancement is performed internally. Visibility of collection elements can be
filtered via a generic expression.
An instructive introduction to the features of Fireworks is given in this video tutorial. The video tutorial shows an older version of Fireworks, so some elements of the user interface (UI) have changed. With a little bit of browsing through the new UI, you will be able to find all the functionalities. Many of them are very helpful, and they are nicely explained in the video.
Please be aware that for any issues with the Fireworks display, you should first have a look at the twiki WorkBookFireworksHowToFix and then send email to the fireworks support list at fireworks-support@cernSPAMNOT.ch. Also refer to the latest tutorial on Fireworks HERE.
1. First we will look at the event display from YOURWORKINGAREA/CMSSW_7_3_0_pre1/src. After you log in and do cmsenv, execute the following commands from YOURWORKINGAREA/CMSSW_7_3_0_pre1/src. (We will look at the collision data and ZMM MC RECO samples that you used in earlier exercises.)
cmsShow /afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_RECOZMM730pre1.root
cmsShow /afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root
A window like this will pop up:
You will soon realise that it is very slow. This is because the data file is located at CERN on lxplus. The best way would be to scp the file to YOURWORKINGAREA/CMSSW_7_3_0_pre1/src. However, the file size could be huge (though not in our case). So let us first learn how to copy a few events locally using a config file. You can use this config file to copy events from any file that you can access locally. To do so, open the file copy_CMSDAS_cfg.py, select the entire text and save it in a file with the same name in YOURWORKINGAREA/CMSSW_7_3_0_pre1/src. Now execute the following command:
cmsRun copy_CMSDAS_cfg.py
This will copy 100 events to a file called RelValZMM.root that has a size of about 37MB. If you want to copy fewer events, you are free to change the number 100 to, say, 50 in the following line in copy_CMSDAS_cfg.py:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(100) )
Next, do the following to copy the data file.
Comment out the following lines in the copy_CMSDAS_cfg.py script:
'file:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_RECOZMM730pre1.root'
process.copyAll = cms.OutputModule("PoolOutputModule", fileName = cms.untracked.string("RelValZMM.root") )
and uncomment the following commented lines:
#'file:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root',
#process.copyAll = cms.OutputModule("PoolOutputModule", fileName = cms.untracked.string("CollisionData.root") )
Note: You need to use a different CMSSW release for copying the data; otherwise you will see the error message below:
19-Nov-2014 18:04:27 CET Initiating request to open file file:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root
19-Nov-2014 18:04:32 CET Successfully opened file file:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root
----- Begin Fatal Exception 19-Nov-2014 18:04:36 CET-----------------------
An exception of category 'DictionaryNotFound' occurred while
[0] Calling beginJob
Exception Message:
No REFLEX data dictionary found for the following classes:
edm::Wrapper<std::vector<reco::SecondaryVertexTagInfo> >
edm::Wrapper<std::vector<reco::TrackIPTagInfo> >
std::vector<reco::SecondaryVertexTagInfo>
std::vector<reco::TrackIPTagInfo>
Most likely each dictionary was never generated,
but it may be that it was generated in the wrong package.
Please add (or move) the specification
<class name="whatever"/>
to the appropriate classes_def.xml file.
If the class is a template instance, you may need
to define a dummy variable of this type in classes.h.
Also, if this class has any transient members,
you need to specify them in classes_def.xml.
----- End Fatal Exception -------------------------------------------------
19-Nov-2014 18:04:36 CET Closed file file:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root
This is because the data RECO file was generated in a different CMSSW release and is not compatible with CMSSW_7_3_0_pre1. You need to switch to the corresponding CMSSW release for it to work. To do so, start a fresh login as you did in Exercise 3, but replace CMSSW_7_3_0_pre1 with CMSSW_7_0_6.
Run the following commands:
cd YOURWORKINGAREA
cmsrel CMSSW_7_0_6
cd CMSSW_7_0_6/src
cmsenv
cp YOURWORKINGAREA/CMSSW_7_3_0_pre1/src/copy_CMSDAS_cfg.py .
Again run the config file for data as:
cmsRun copy_CMSDAS_cfg.py
This will copy 100 collision events to a file called CollisionData.root that has a size of about 60MB.
Now we are ready to play with Fireworks for the event display.
To open the Fireworks event display for the collision data file, do the following:
cmsShow CollisionData.root
This will open the Fireworks display window as shown in the snapshot below. This window has several parts that can be swapped or undocked for a separate view. Now do the following after the Fireworks window opens.
1. As you can see, the very first event displayed has the event number 7599197.
2. In the top left part that says "Summary View/Add Collection", uncheck all collections EXCEPT Muons. As you uncheck them, notice how the different color-coded objects disappear from the main display sub-window that says "Rho Phi".
3. Now look at the small independent window on the top right that says Table on its title bar. In this window select Muons from the pull-down menu. As you can see, a row shows up there with details about the single muon object that you see (RED line) in the detector cut-out.
QUESTION 10.1 - What is the pT of the only muon that you see in the first event?
In the "Summary View" panel on the left side of the main window, click on the little triangle button to the right of the "Tracks" row.
QUESTION 10.2 - How many tracks does the first event have?
You can also open the RelValZMM.root file and display its events.
Exercise 11 - Run Fireworks locally from Desktop
As you noticed, accessing a remote file with cmsShow makes things run slowly. To overcome that, you copied a few events locally in the exercise above. However, despite having the data and MC files in YOURWORKINGAREA/CMSSW_7_3_0_pre1/src, the display is still not fast enough, as you are probably still logged into lxplus or cmslpc or your local cluster remotely from your laptop. The display is fastest if you have the Fireworks executable and the data and MC ROOT files all locally on your laptop. In this exercise, we will first download Fireworks locally and then run the display. We will also copy the ROOT files locally to the laptop/desktop.
1. First copy the ROOT files locally to your desktop. You can either copy the whole files directly from the afs area OR copy the files with 100 events from YOURWORKINGAREA/CMSSW_7_3_0_pre1/src. We assume that you know how to do that. As an example, here is how you would copy the files locally to a Macintosh.
From the afs area
scp USERNAME@lxplus6.cern.ch:/afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_RECOZMM730pre1.root .
From YOURWORKINGAREA
,
scp USERNAME@SERVER:/YOURWORKINGAREA/CMSSW_7_3_0_pre1/src/RelValZMM.root .
scp USERNAME@SERVER:/YOURWORKINGAREA/CMSSW_7_0_6/src/CollisionData.root .
Replace USERNAME with your name, SERVER with the server you have been using
and YOURWORKINGAREA with the absolute path to your working area.
If you do not have the CollisionData.root file, you can copy the data file from: /afs/cern.ch/user/b/benwu/public/CMSDAS/CMSDataAnaSch_Data_706_RECO.root
Now we will get the fireworks executable locally. To do this, please follow the instructions in the WorkBookFireworks
twiki. Just download, uncompress and come back here.
Next, copy CollisionData.root to the directory cmsShow-X.Y. To open the event display, go into the cmsShow-X.Y directory and execute the following:
./cmsShow CollisionData.root
QUESTION 11 - What is the size of the file called cmsGeom10.root in the directory cmsShow-7.1?
Link to SWGuideCMSDataAnalysisSchoolPreExerciseFirstSet
Link to SWGuideCMSDataAnalysisSchoolPreExerciseThirdSet
Link to SWGuideCMSDataAnalysisSchoolPreExerciseFourthSet
Questions/Problems/Suggestions - mailto: malik@fnal.gov or zhenbinwu@gmail.com, Phone: +1-630-840-2467.