Difference: SWGuideCMSDataAnalysisSchoolPreExerciseFirstSet (1 vs. 4)

Revision 4 - 2016-03-15 - eduardo


CMS Data Analysis School Pre-Exercises - First Set

Introduction

Welcome to the first set of CMS Data Analysis School (CMSDAS) pre-exercises. The purpose of these exercises is to become familiar with the basic software tools required to perform physics analysis at the school. Please run and complete these exercises. Questions for each exercise are in red font. Post your answers in the online response form available from the course web area (for CMSDAS@LPC 2016, Fermilab). A large amount of additional information about these exercises is available in the twikis that we reference. Please remember that twikis evolve, but they aim to provide the best information at any time. If you encounter problems at any time, please e-mail the LPC contact CMSDASATLPC@fnal.gov with a detailed description of your problem. The instructors will be happy to help you.

The CMSDAS exercises (pre-exercises as well as exercises during the school) are intended to be as generic as possible. However, CMSDAS is held at different CMS collaborating institutes, e.g. Fermilab, INFN Pisa, DESY, NTU Taiwan, Bari, etc. Participants are expected to request and obtain local computer accounts (at the intended school location) well in advance of the school start date, to ensure they will be able to work right away. It is very important to use the pre-exercises as a setup tool, so we recommend that everyone use the laptop they intend to bring to the school (in general NO computer/laptop will be provided at the school) and connect to the local computing resources that will be used for the school. In many cases laptops will need to be registered to be used on the local school network, so please make sure you take care of this too.

For users who are not attending a school but are going through the CMS DAS exercises on their own, the preferred cluster is lxplus at CERN.

There are four SETS of pre-exercises. As outlined above, if you are going through the pre-exercises in preparation for attending a CMS DAS, we strongly recommend using the laptop you intend to bring to the school and logging into the cluster local to the school. Once you have run them on the school's local cluster, you can also try them out on your local cluster/Tier-2/Tier-3 facilities with the necessary changes (accounts, login, local storage, local batch systems, etc.). The exercises themselves should not differ, and the instructions should work the same way on any other cluster that has the CMS computing environment.

Instructions for the exercises done during the school will be found on twikis, as for the pre-exercises.

Obtain a computer account:

To have a CERN account, please have a look at Get Account at CERN. Obtaining a CERN account can be time-consuming. To expedite the process please ask the relevant institutional team leader to perform the necessary "signing" after the online form has been submitted and received for initial processing by the secretariat. If you do not have a scratch area then look here.

NOTE: If you need an account elsewhere, contact your local cluster admins and follow their instructions.

To get accounts at a few specific sites, such as Fermilab and DESY, have a look at the link below.

If you are attending CMSDAS at the LPC, you should do the pre-exercises on the cmslpc cluster using the laptop you intend to bring to the LPC (register it for the FNAL network before coming; see the link on the left bar of the CMSDAS@LPC2016 indico agenda).

Obtain a Grid Certificate and CMS VO registration

  • A Grid Certificate and CMS VO registration will be needed for the next set of exercises. The registration process can be time-consuming (actions by several people are required), so it is important to start it as soon as possible. There are two main requirements, which can be simply summarized: a certificate ensures that you are who you claim to be, and a registration in the VO recognizes you (identified by your certificate) as a member of CMS. Use the following link for this: Get Your Grid Certificate and CMSVO. Both are needed to submit jobs on the Grid. Make sure you follow any additional instructions for US-CMS users.

Obtain a github account:

Since Summer 2013, most CMS software has been hosted on GitHub. GitHub is a web-based hosting service for Git repositories, while Git is a distributed revision control system. In your future analysis work, version control of your analysis code will become a very important task, and Git will be very useful. A full tutorial of Git is beyond the scope of this exercise. You can learn more about Git and GitHub in general from the information below:

In order to checkout and develop CMS software, you will need a github account, which is free.

  • In case you don’t have one already, simply go to https://github.com/join and follow the instructions to create a new account. Make sure you choose a username by which people can easily recognize you, or specify your real name.
  • In case you already have an account, you can simply use the “Sign in” dialog and enter your username and password: https://github.com/login
  • Recommended: make sure you register your SSH key in GitHub.

NOTE: Legend of colors for this tutorial:

GRAY background for the commands to execute  (cut&paste)
GREEN background for the output sample of the executed commands
BLUE background for the configuration files  (cut&paste)
PINK background for the code (EDAnalyzer etc.)  (cut&paste)

Exercise 1 - Cut and Paste

Note: This exercise is designed to run only on lxplus and cmslpc as copies of the scripts are present there. Elsewhere, one can certainly copy the script and try.

Log in to your cluster: lxplus6, cmslpc-sl6, gridui1.pi.infn.it, gridui2.pi.infn.it, nafhh-cms01.desy.de, nafhh-cms02.desy.de or your local CMS cluster. If you are preparing for CMSDAS@LPC2016, note that cmslpc-sl6 is the cluster you should use. By now you should have a FNAL account that you can use to get Kerberos credentials; follow the instructions on how to log in to the LPC cluster.

Please note: in the following

  • if you are at lxplus, you can do as shown below.
  • if you are at cmslpc, you can copy (cp) the file runThisCommand.py from ~malik.
  • if you are at UPRM, you can copy (cp) the file runThisCommand.py from ~malik/CMSTutorial.
  • if you are using Bari, you can copy the file from /afs/cern.ch/user/n/ndefilip/public/runThisCommand.py
  • if you are using KNUT2, you can download the file (see attachments at the end of this page)
<--  
  • If you are using Taipei, you can also copy ( cp) from /home/cmsdas/store/user/cmsdas/2012/PreExerciseFirstSet
  • if you are at nafhh-cms, you can do as shown below.
-->

As the exercises often require copying and pasting from the instructions, we want to make sure that you will have no problems with this. To verify that cut and paste to/from a terminal window works, first copy the script runThisCommand.py as follows:

ssh -Y USERNAME@lxplus6.cern.ch
USERNAME@lxplus6.cern.ch's password: 

Enter the password and then do:

cp /afs/cern.ch/cms/Tutorials/TWIKI_DATA/runThisCommand.py .

NOTE: For cmslpc at Fermilab, you can use the commands below:

kinit USERNAME@FNAL.GOV
Password for USERNAME@FNAL.GOV:

Enter the password and then do:

ssh -Y USERNAME@cmslpc-sl6.fnal.gov
cp ~malik/runThisCommand.py .

NOTE: For UPRM at Mayaguez, you can use the commands below:

ssh -Y USERNAME@alpha00.hep.uprm.edu
cp ~malik/CMSTutorial/runThisCommand.py .

Then cut and paste the following command and hit Return:

./runThisCommand.py "asdf;klasdjf;kakjsdf;akjf;aksdljf;a" "sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef"

The response should be your username followed by an alphanumeric string unique to your username; for example, for a user named malik:

success: malik znyvx 

QUESTION 1 - Post the alphanumeric string of characters unique to your username.

If the command is executed without pasting the arguments:

somebody@cmslpc11>./runThisCommand.py

the result will likely be:

Error: You must provide the secret key

If the input is pasted incorrectly, the result will likely be:

Error: You didn't paste the correct input string

Running it anywhere other than cmslpc (for example, locally on a laptop) will result in:

bash: ./runThisCommand.py: No such file or directory

OR:

Unknown user: malik.
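Why the quotes matter: both pasted arguments contain ';', which an unquoted shell command line would treat as a command separator, so a partial paste changes what the script receives. As an illustration only (not part of the exercise), the Python sketch below uses the standard library function shlex.split to show that the quoted line splits into the program name plus exactly two arguments, each kept whole. The helper name pasted_arguments is hypothetical, and shlex.split only approximates shell word splitting (it does not model operators such as ';' outside quotes).

```python
import shlex

# The command line exactly as it should be pasted in Exercise 1.
PASTED = ('./runThisCommand.py "asdf;klasdjf;kakjsdf;akjf;aksdljf;a" '
          '"sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef"')


def pasted_arguments(command_line):
    """Hypothetical helper: split a command line into words, removing the
    quotes, and return everything after the program name."""
    words = shlex.split(command_line)
    return words[1:]


args = pasted_arguments(PASTED)
# Each quoted string stays a single argument, semicolons included.
print(len(args))
print(args[0])
```

If fewer than two arguments come out, or a ';' is missing from one of them, the paste was incomplete.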

Exercise 2 - Simple Edit Exercise

Note: This exercise is designed to run only on lxplus, cmslpc, Pisa, Mayaguez, and Taipei as the scripts are present there. Elsewhere, one can certainly copy the script and try.

Please note: in the following

  • if you are at lxplus, you can do as shown below.
  • if you are at cmslpc, you can copy (cp) from ~malik
  • if you are at UPRM, you can copy (cp) from ~malik/CMSTutorial
  • if you are using Bari, you can also copy (cp) from ~ndefilip/public
  • if you are using Pisa, you can also copy (cp) from ~boccali/public
  • if you are using Taipei, you can also copy (cp) from /home/cmsdas/store/user/cmsdas/2012/PreExerciseFirstSet
  • if you are at nafhh-cms, you can do as shown below.

The purpose of this exercise is to ensure that the user can edit files.

cp /afs/cern.ch/cms/Tutorials/TWIKI_DATA/editThisCommand.py .

Then open editThisCommand.py in a text editor and make sure that the first character of line 11 is # (the hash character). If it is not, change the following three lines:

# Please comment the line below out by adding a '#' to the front of
# the line.
raise RuntimeError, "You need to comment out this line with a #"

to:

# Please comment the line below out by adding a '#' to the front of
# the line.
#raise RuntimeError, "You need to comment out this line with a #"

Save the file and execute the command:

user@cmslpc12> ./editThisCommand.py

If this is successful, the result will be:

success:  malik 0xB888EFD

QUESTION 2 - Paste the line beginning with "success" into the form provided.

If the file has not been edited successfully, an error message such as the following will result:

Traceback (most recent call last):
  File "./editThisCommand.py", line 11, in ?
    raise RuntimeError, "You need to comment out this line with a #"
RuntimeError: You need to comment out this line with a #
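If you want to double-check the edit before running the script, a minimal Python sketch like the one below reports whether a given line of a file starts with '#'. The line number 11 comes from the instructions above; the function name line_is_commented is hypothetical and not part of the exercise scripts.

```python
from pathlib import Path


def line_is_commented(path, line_number=11):
    """Return True if the given 1-based line of the file starts with '#'."""
    lines = Path(path).read_text().splitlines()
    if line_number > len(lines):
        raise ValueError(f"{path} has only {len(lines)} lines")
    # The exercise asks for '#' as the very first character of the line.
    return lines[line_number - 1].startswith("#")
```

For example, line_is_commented("editThisCommand.py"), run in the directory where you copied the script, should return True after a correct edit.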

Exercise 3 - Set up a release area CMSSW_7_3_0_pre1

After you obtain a GitHub account, you should also set up your personal information for git on the cluster as follows (quote your name, since it contains a space):

git config --global user.name "<First Name> <Last Name>"
git config --global user.email <Your-Email-Address>
git config --global user.github <Your-Just-Created-GitHub-Account>

Then set up a CMSSW release as follows:

 
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using csh shell
setenv SCRAM_ARCH slc6_amd64_gcc491 
### If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc491 
cmsrel CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv
git cms-init
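The SCRAM_ARCH value tells cmsrel which build of the release to set up. Its three underscore-separated fields name the operating system (slc6, Scientific Linux CERN 6), the CPU architecture (amd64) and the compiler (gcc491, i.e. GCC 4.9.1). The Python sketch below, with hypothetical names parse_scram_arch and ScramArch, simply splits the value into those fields. It assumes the three-field form used throughout this page and does not check whether a release actually exists for that architecture.

```python
from typing import NamedTuple


class ScramArch(NamedTuple):
    os_name: str   # e.g. "slc6"
    arch: str      # e.g. "amd64"
    compiler: str  # e.g. "gcc491"


def parse_scram_arch(value):
    """Split a SCRAM_ARCH string such as 'slc6_amd64_gcc491' into its fields."""
    parts = value.split("_")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected SCRAM_ARCH format: {value!r}")
    return ScramArch(*parts)
```

Comparing the compiler field is a quick way to notice that the generic instructions use gcc491 while the cmslpc and KNUT2 instructions use gcc481.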

NOTE: For cmslpc at Fermilab, you need to run a few additional commands, so the steps above look like the following:

At Fermilab cmslpc, use the nobackup area linked from your home directory (nobackup -> /uscms_data/d2/YOURUSERNAME) for the exercises.

source /cvmfs/cms.cern.ch/cmsset_default.csh #or .sh for bash
cd ~/nobackup
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using csh shell
setenv SCRAM_ARCH slc6_amd64_gcc481 
### If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc481 
cmsrel CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv
git cms-init

NOTE: For cmsdas01 and cmsdas02 at Bari, you need to run a few additional commands, so the steps above look like the following:

source /afs/cern.ch/user/n/ndefilip/public/logincmsdas.sh
cmscvsroot CMSSW
export SCRAM_ARCH=slc6_amd64_gcc481
scram p CMSSW CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv


NOTE: For KNUT2 machines in Korea, you need to run a few additional commands, so the steps above look like the following:

source /cvmfs/cms.cern.ch/cmsset_default.sh #or .csh for csh
### If you are using csh shell
setenv SCRAM_ARCH slc6_amd64_gcc481 
### If you are using Bash shell
export SCRAM_ARCH=slc6_amd64_gcc481 
cmsrel CMSSW_7_3_0_pre1
cd CMSSW_7_3_0_pre1/src
cmsenv


<-- 
NOTE: For gridui1/2 at Pisa, you need to run additional commands, so the steps above look like the following:
<--/twistyPlugin twikiMakeVisibleInline-->

source /afs/pi.infn.it/grid_exp_sw/cms/scripts/setcms.sh (or .csh)
cmscvsroot CMSSW
(only once: mkdir -p /gpfs/gpfsddn/cms/user/`whoami`)
cd /gpfs/gpfsddn/cms/user/`whoami`
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
setenv SCRAM_ARCH slc5_amd64_gcc462
scram p CMSSW CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
<--/twistyPlugin-->

NOTE: In Taipei, follow these steps:

<--/twistyPlugin twikiMakeVisibleInline-->

source /home/cmsdas/env.csh
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
setenv SCRAM_ARCH slc5_amd64_gcc462
scram p CMSSW CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
<--/twistyPlugin-->

NOTE: For nafhh-cms at DESY, you need to run additional commands, so the steps above look like the following:

<--/twistyPlugin twikiMakeVisibleInline-->
ini cmssw
export SCRAM_ARCH=slc5_amd64_gcc462
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
cmsrel CMSSW_5_3_11
cd CMSSW_5_3_11/src
cmsenv
<--/twistyPlugin-->
-->

Run the following command:

echo $CMSSW_BASE

QUESTION 3 - Paste the result of executing the above command in the form

Note: The directory YOURWORKINGAREA/CMSSW_7_3_0_pre1/src is called your WORKING DIRECTORY.
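Since cmsenv sets CMSSW_BASE to the release area, the WORKING DIRECTORY is simply $CMSSW_BASE/src. The Python sketch below (hypothetical function name working_directory; the release name is taken from this exercise) derives that path from an environment mapping and complains if cmsenv has not been run or if the area belongs to another release.

```python
import os


def working_directory(env, release="CMSSW_7_3_0_pre1"):
    """Return $CMSSW_BASE/src after checking that CMSSW_BASE is set and
    points to an area of the expected release."""
    base = env.get("CMSSW_BASE")
    if not base:
        raise RuntimeError("CMSSW_BASE is not set: did you run cmsenv?")
    # The last path component of the release area is the release name.
    if os.path.basename(os.path.normpath(base)) != release:
        raise RuntimeError(f"CMSSW_BASE={base} is not a {release} area")
    return os.path.join(base, "src")
```

Called as working_directory(os.environ) after cmsenv, it should return the path of your WORKING DIRECTORY.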

Exercise 4 - Find data in the DAS (Data Aggregation Service)

In this exercise we will locate the MC dataset RelValZMM and the collision dataset /DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD using the Data Aggregation Service (not to be confused with the Data Analysis School in which you are taking part!). Also remember that DAS is an improved (faster) data-discovery service that replaced the earlier interface to DBS (Dataset Bookkeeping System).

Go to the DAS web page and type the following in the space provided:

dataset release=CMSSW_7_3_0_pre1 dataset=/RelValZMM*/*CMSSW_7_3_0_pre1*/MINIAOD*


This will search for datasets processed with release CMSSW_7_3_0_pre1 whose names match /RelValZMM*/*CMSSW_7_3_0_pre1*/MINIAOD*. The syntax for searches is found here, with many useful common search patterns under "CMS Queries".
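CMS dataset names have three parts, /PrimaryDataset/ProcessedDataset/DataTier, and the '*' characters in the query above are wildcards. DAS evaluates the pattern on the server, and its exact matching rules may differ; as a rough local analogy only, the Python sketch below uses the standard fnmatch module to show which of the two datasets named in this exercise the pattern would accept, and splits a name into its three parts. The helper names matching and split_dataset are hypothetical.

```python
from fnmatch import fnmatchcase

PATTERN = "/RelValZMM*/*CMSSW_7_3_0_pre1*/MINIAOD*"

CANDIDATES = [
    "/RelValZMM_13/CMSSW_7_3_0_pre1-PU50ns_PRE_LS172_V16-v1/MINIAODSIM",
    "/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD",
]


def matching(names, pattern):
    """Return the names accepted by the wildcard pattern (fnmatch rules:
    '*' matches any run of characters, including '/')."""
    return [name for name in names if fnmatchcase(name, pattern)]


def split_dataset(name):
    """Split '/Primary/Processed/Tier' into its three parts."""
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not a /Primary/Processed/Tier name: {name}")
    return tuple(parts[1:])


for name in matching(CANDIDATES, PATTERN):
    print(split_dataset(name))
```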

For this query, click on the dataset name /RelValZMM_13/CMSSW_7_3_0_pre1-PU50ns_PRE_LS172_V16-v1/MINIAODSIM; after a few seconds another page will appear.

QUESTION 4.1 - What is the size of this data set? Click on "Sites" to get a list of sites hosting this data. Is this data at FNAL?

Click on the link "Files" to get a list of the root files in this dataset. One of the files it contains should look like this:

       
'/store/relval/CMSSW_7_3_0_pre1/RelValZMM_13/MINIAODSIM/PU50ns_PRE_LS172_V16-v1/00000/5E363364-FD5E-E411-88D9-02163E0104D0.root'

If you want to find the name of the dataset from the name of a file, go to DAS, type dataset file=/store/relval/CMSSW_7_3_0_pre1/RelValZMM_13/MINIAODSIM/PU50ns_PRE_LS172_V16-v1/00000/5E363364-FD5E-E411-88D9-02163E0104D0.root in the search field and hit "Enter".
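For the RelVal file above, the file name and the dataset name share their pieces: the release (CMSSW_7_3_0_pre1), the primary dataset (RelValZMM_13), the data tier (MINIAODSIM) and the processing string (PU50ns_PRE_LS172_V16-v1). The Python sketch below rebuilds the dataset name from a file name laid out like the one in this exercise. This is a heuristic inferred from this single example, not a CMS rule: other files can follow different layouts, and DAS remains the authoritative answer. The function name dataset_from_relval_lfn is hypothetical.

```python
def dataset_from_relval_lfn(lfn):
    """Heuristic (assumption): rebuild a dataset name from a RelVal file name
    laid out as
    /store/relval/<release>/<primary>/<tier>/<processing>/<block>/<file>.root
    Use DAS (dataset file=...) for the authoritative answer."""
    # Accept the name with or without the surrounding quotes shown on the page.
    parts = lfn.strip("'\"").strip("/").split("/")
    if len(parts) != 8 or parts[:2] != ["store", "relval"]:
        raise ValueError(f"not a RelVal file name with the expected layout: {lfn}")
    _, _, release, primary, tier, processing, _block, _filename = parts
    return f"/{primary}/{release}-{processing}/{tier}"
```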

Now we will locate a collision dataset skim using the keyword search, which is sometimes more convenient if you know the dataset you are looking for. In DAS, type dataset=/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD and hit Enter. Answer the following question:

QUESTION 4.2 - What release was this dataset collected in? (If you see more than one release, just answer one)

With your CMSSW environment set, you can also search for the dataset /DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD by invoking the DAS command-line client in your WORKING DIRECTORY. Follow the directions here and click on "DAS Command Line Tool". Use your browser to open the "CLI" macro and download it, renaming it to das_client.py as instructed (alternatively, download it with curl; you might have to use the -k option: curl -k https://cmsweb.cern.ch/das/cli > das_client.py). Remember to make it executable with chmod a+x das_client.py. The query we're interested in is:

NOTE: For cmslpc at Fermilab, you need to initialize your grid proxy beforehand:

voms-proxy-init -q -voms cms

./das_client.py --query="dataset=/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD" --format=plain

You will see something like


./das_client.py --query="dataset=/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD | grep dataset.name" --format=plain

Showing 1-10 out of 1 results, for more results use --idx/--limit options

/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD


More information about accessing data in the Data Aggregation Service can be found in WorkBookDataSamples.
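If you want to run the same query from a script, a minimal Python sketch could wrap das_client.py with the standard subprocess module, passing exactly the --query and --format=plain options shown above. Because the arguments are passed as a list, no shell is involved, so the '| grep dataset.name' filter stays part of the DAS query string, just as it does inside the double quotes on the command line. The function names are hypothetical. The sketch assumes das_client.py is executable in the current directory and that a valid grid proxy exists, and it treats every output line starting with '/' as a dataset name, based on the sample output above.

```python
import subprocess


def das_command(query, client="./das_client.py"):
    """Build the argument list for the das_client.py call shown above."""
    return [client, f"--query={query}", "--format=plain"]


def dataset_names(plain_output):
    """Keep only lines that look like dataset names (starting with '/'),
    dropping headers such as 'Showing 1-10 out of 1 results ...'."""
    return [line.strip() for line in plain_output.splitlines()
            if line.strip().startswith("/")]


def run_das_query(query):
    """Run das_client.py and return the dataset names it prints."""
    result = subprocess.run(das_command(query), capture_output=True,
                            text=True, check=True)
    return dataset_names(result.stdout)
```

For example, run_das_query("dataset=/DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD | grep dataset.name") mirrors the command-line query above.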

<---

Some links below will help you keep up to date with different physics datasets:

https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmV2012Analysis

https://twiki.cern.ch/twiki/bin/view/CMS/PdwgMain

https://twiki.cern.ch/twiki/bin/view/CMS/PhysicsSecondaryDatasets

https://twiki.cern.ch/twiki/bin/view/CMS/PhysicsCentralSkims

https://twiki.cern.ch/twiki/bin/view/CMS/Collisions2010Analysis

 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2022 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.