Glidein is a mechanism by which one or more grid resources (remote machines) temporarily join a local HTCondor pool. The program condor_glidein is used to add a machine to an HTCondor pool. During the period of time when the added resource is part of the local pool, the resource is visible to users of the pool. But, by default, the resource is only available for use by the user that added the resource to the pool.
After glidein, the user may submit jobs for execution on the added resource the same way that all HTCondor jobs are submitted. To force a submitted job to run on the added resource, the submit description file could contain a requirement that the job run specifically on the added resource.
condor_glidein works by installing and executing necessary HTCondor daemons and configuration on the remote resource, such that the resource reports to and joins the local pool. condor_glidein accomplishes two separate tasks towards having a remote grid resource join the local HTCondor pool. They are the set up task and the execution task.
The set up task generates necessary
configuration files and locates proper platform-dependent
binaries for the HTCondor daemons.
A script is also generated that can be used during
the execution task to invoke the proper HTCondor daemons.
These files are copied to the remote resource as necessary.
The configuration variable GLIDEIN_SERVER_URLS
defines a list of locations from which the necessary
binaries are obtained.
Default values cause binaries to be downloaded from the
UW site.
See
section 3.3.24
on page
for a full definition of this configuration variable.
When the files are correctly in place, the execution task starts the HTCondor daemons. condor_glidein does this by submitting an HTCondor job to run under the grid universe. The job runs the condor_master on the remote grid resource. The condor_master invokes other daemons, which contact the local pool's condor_collector to join the pool. The HTCondor daemons exit gracefully when no jobs run on the daemons for a preset period of time.
Here is an example of how a glidein resource appears, similar to how any other machine appears. The name has a slightly different form, in order to handle the possibility of multiple instances of glidein daemons inhabiting a multi-processor machine.
% condor_status | grep denal 7591386@denal LINUX INTEL Unclaimed Idle 3.700 24064 0+00:06:35
As remote grid resources join the local pool, these resources must report to the local pool's condor_collector daemon. Security demands that the local pool's condor_collector list all hosts from which they will accept communication. Therefore, all remote grid resources accepted for glidein must be given HOSTALLOW_WRITE permission. An expected way to do this is to modify the empty variable (within the sample configuration file) GLIDEIN_SITES to list all remote grid resources accepted for glidein. The list is a space or comma separated list of hosts. This list is then given the proper permissions by an additional redefinition of the HOSTALLOW_WRITE configuration variable, to also include the list of hosts as in the following example.
GLIDEIN_SITES = A.example.com, B.example.com, C.example.com HOSTALLOW_WRITE = $(HOSTALLOW_WRITE) $(GLIDEIN_SITES)Recall that for configuration file changes to take effect, condor_reconfig must be run.
If this configuration change to the security settings on the local HTCondor pool cannot be made, an additional HTCondor pool that utilizes personal HTCondor may be defined. The single machine pool may coexist with other instances of HTCondor. condor_glidein is executed to have the remote grid resources join this personal HTCondor pool.
Once the Globus resource has been added to the local HTCondor pool with condor_glidein, job(s) may be submitted. To force a job to run on the Globus resource, specify that Globus resource as a machine requirement in the submit description file. Here is an example from within the submit description file that forces submission to the Globus resource denali.mcs.anl.gov:
requirements = ( machine == "denali.mcs.anl.gov" ) \ && FileSystemDomain != "" \ && Arch != "" && OpSys != ""This example requires that the job run only on denali.mcs.anl.gov, and it prevents HTCondor from inserting the file system domain, architecture, and operating system attributes as requirements in the matchmaking process. HTCondor must be told not to use the submission machine's attributes in those cases where the Globus resource's attributes do not match the submission machine's attributes and your job really is capable of running on the target machine. You may want to use HTCondor's file-transfer capabilities in order to copy input and output files back and forth between the submission and execution machine.