This chapter describes how to use the bsub command. Command options are divided into groups with related functions.
The options to the bsub command related to job checkpointing and migration are described in 'Checkpointing and Migration'.
When a batch job completes or exits, LSF Batch by default sends you a job report by electronic mail. The report includes the standard output (stdout) and error output (stderr) of the job. The output from stdout and stderr is merged in the order printed, as if the job were run interactively. The default standard input (stdin) file is the null device (for UNIX systems, /dev/null).
If you want mail sent to another user, use the -u username option to the bsub command. Mail associated with the job will be sent to the named user instead of to you.
If you do not want output to be sent by mail, you can specify stdout and stderr files. You can also specify the standard input file if the job needs to read input from stdin. For example:
% bsub -q night -i job_in -o job_out -e job_err myjob
submits myjob to the night queue. The job reads its input from file job_in. Standard output is stored in file job_out, and standard error is stored in file job_err. If you specify a -o outfile argument and do not specify a -e errfile argument, the standard output and error are merged and stored in outfile.
The output file created by the -o option to the bsub command normally contains job report information as well as the job output. This information includes the submitting user and host, the execution host, the CPU time (user plus system time) used by the job, and the exit status. If you want to separate the job report information from the job output, use the -N option to specify that the job report information should be sent by email.
The output files specified by the -o and -e options are created on the execution host. See 'Remote File Access' for an example of copying the output file back to the submission host if the job executes on a file system that is not shared between the submission and execution hosts.
If you need to explicitly specify resource requirements for your job, use the -R option to the bsub command. For example:
% bsub -R "swp > 15 && hpux order[cpu]" myjob
runs myjob on an HP-UX host that is lightly loaded (CPU utilization) and has at least 15 megabytes of swap memory available. See 'Resource Requirement Strings' for a complete discussion of resource requirements.
You do not have to specify resource requirements every time you submit a job. The LSF administrator may have already configured the resource requirements for your jobs, or you can put your executable name together with its resource requirements into your personal remote task list. The bsub command automatically uses the resource requirements of the job from the remote task lists. See 'Configuring Resource Requirements' for more information about displaying task lists and putting tasks into your remote task list.
When a job is dispatched, the system assumes that the resources that the job consumes will be reflected in the load information. However, many jobs do not consume the resources they require when they first start. Instead, they typically use the resources over a period of time. For example, a job requiring 100 megabytes of swap is dispatched to a host having 150 megabytes of available swap. The job starts off initially allocating 5 megabytes, gradually increasing the amount consumed to 100 megabytes over a 30-minute period. During this period, another job requiring more than 50 megabytes of swap should not be started on the same host, to avoid overcommitting the resource.
When submitting a job, you can specify the amount of resources to be reserved through the resource usage (rusage) section of the resource requirement string argument to the bsub command. The syntax of the resource reservation in the rusage section of the resource requirement string is:
res[=value]:res[=value]: ... :res[=value][:duration=value][:decay=value]
The res parameter can be any load index. The value parameter is the initial reserved amount. If res or value is not given, the default is to not reserve that resource.
The duration parameter is the time period within which the specified resources should be reserved. It is specified in minutes by default. If the value is followed by the letter 'h', it is specified in hours. For example, 'duration=30' and 'duration=2h' specify a duration of 30 minutes and two hours respectively. If duration is not specified, the default is to reserve the total amount for the lifetime of the job.
The decay parameter indicates how the reserved amount should decrease over the duration. A value of 1, 'decay=1', indicates that the system should linearly decrease the amount reserved over the duration. The default decay value is 0, which causes the total amount to be reserved for the entire duration. Values other than 0 or 1 are unsupported. If duration is not specified, decay is ignored.
When deciding whether to schedule a job on a host, the LSF Batch system considers the reserved resources of jobs that have previously started on that host. For each load index, the amount reserved by all jobs on that host is summed up and subtracted (or added if the index is increasing) from the current value of the resources as reported by the LIM to get the amount available for scheduling new jobs:
available amount = current value - reserved amount for all jobs
Reservation of the resources mem and swap is handled as a special case. For these resources, the run time usage is used to determine the amount to reserve (see 'Monitoring Resource Consumption of Jobs'). The reserved amount is the specified amount minus the run time usage. The duration and decay parameters are ignored for these resources.
% bsub -R "rusage[swp=50]" my_job
will reserve 50 megabytes of swap for the job.
% bsub -R "rusage[tmp=30:duration=30:decay=1]" my_job
will reserve 30 megabytes of /tmp space for the job. As the job runs, the amount reserved will decrease at approximately 1 megabyte/minute such that the reserved amount is 0 after 30 minutes.
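The reservation arithmetic described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not LSF's scheduler code: the function names reserved_amount and available_amount are hypothetical, times are in minutes, integer arithmetic is used, a duration is assumed to have been given, and the special handling of mem and swap is not modelled.

```sh
#!/bin/sh
# Hypothetical helpers illustrating the rusage arithmetic; not part of LSF.

# reserved_amount VALUE DURATION DECAY ELAPSED
# Prints the amount still reserved ELAPSED minutes after the job started.
# Assumes a duration was given; DECAY is 0 (constant) or 1 (linear decrease).
reserved_amount() {
    value=$1
    duration=$2
    decay=$3
    elapsed=$4
    if [ "$elapsed" -ge "$duration" ]; then
        echo 0                          # reservation period is over
    elif [ "$decay" -eq 1 ]; then
        echo $(( value * (duration - elapsed) / duration ))
    else
        echo "$value"                   # decay=0: full amount for the whole duration
    fi
}

# available_amount CURRENT RESERVED...
# Subtracts the reservations of all jobs on a host from the current value
# (the case of a load index where a larger value means more is available).
available_amount() {
    current=$1
    shift
    total=0
    for r in "$@"; do
        total=$(( total + r ))
    done
    echo $(( current - total ))
}

# rusage[tmp=30:duration=30:decay=1], 10 minutes after the job started
reserved_amount 30 30 1 10      # prints 20
# Host reports 150 megabytes; one job reserves 100 megabytes
available_amount 150 100        # prints 50
```

With decay=1 the helper reproduces the 1 megabyte/minute decrease of the tmp example: 30 megabytes at the start, 20 after 10 minutes, and 0 once the 30 minutes have passed.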
The queue level resource requirement parameter RES_REQ may also specify the resource reservation. If a queue reserves a certain amount of a resource, you cannot use the -R option of the bsub command to reserve a greater amount of that resource. For example, if the output of the bqueues -l command contains:
the following submission will be rejected since the requested amount of certain resource(s) exceeds the queue's specification:
% bsub -R "rusage[mem=50:swp=100]" my_job
The amount of resources reserved on each host can be viewed through the -l option of the bhosts command.
If you want to restrict the set of candidate hosts for running your batch job, use the -m option to bsub.
% bsub -q idle -m "hostA hostD hostB" myjob
This command submits myjob to the idle queue and tells LSF Batch to choose one host from hostA, hostD and hostB to run the job. All other LSF Batch scheduling conditions still apply, so the selected host must be eligible to run the job.
If you have applications that need specific resources, it is more flexible to create a new boolean resource and configure that resource for the appropriate hosts in the LSF cluster. This must be done by the LSF administrator. If you specify a host list using the -m option to bsub, you must change the host list every time you add a new host that supports the desired resources. By using a boolean resource, the LSF administrator can add, move or remove resources without forcing users to learn about changes to resource configuration.
When several hosts can satisfy the resource requirements of a job, the hosts are ordered by load. However, in certain situations it may be desirable to override this behaviour to give preference to specific hosts, even if they are more heavily loaded.
For example, you may have licensed software which runs on different groups of hosts, but prefer to run on a particular host group because the jobs will finish faster, thereby freeing the software license to be used by other jobs.
Another situation arises in clusters consisting of dedicated batch servers and desktop machines which can also run jobs when no user is logged in. You may prefer to run on the batch servers and only use the desktop machines if no server is available.
The -m option of the bsub command allows you to specify preference by using '+' after the hostname. The special hostname, others, can be used to refer to all the hosts that are not explicitly listed. For example:
% bsub -R "solaris && mem> 10" -m "hostD+ others" myjob
will select all solaris hosts having more than 10 megabytes of memory available. If host hostD satisfies these criteria, it will be picked over any other host which otherwise meets the same criteria. If hostD does not satisfy the criteria, the least loaded host among the others will be selected. All the other hosts are considered as a group and are ordered by load.
You can specify different levels of preference by specifying a number after the '+'. The larger the number, the higher the preference for that host or host group. For example:
% bsub -m "groupA+2 groupB+1 groupC" myjob
gives first preference to hosts in groupA, second preference to hosts in groupB and last preference to those in groupC. The ordering within a group is still determined by the load. You can use the bmgroup command to display the host groups configured in the system.
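The two-level ordering (preference first, load within the same preference) can be pictured with a small sort. This is a sketch only: the input format of one "host preference load" triple per line, the host names, and the load numbers are invented for the example, and rank_hosts is a hypothetical helper, not an LSF command. A lower load number is taken to mean a less loaded host.

```sh
#!/bin/sh
# rank_hosts: reads "host preference load" lines on standard input and prints
# the host names, highest preference first and least loaded first within
# the same preference level. Hypothetical helper; not part of LSF.
rank_hosts() {
    sort -k2,2nr -k3,3n | awk '{ print $1 }'
}

# groupA+2 (hostA1, hostA2), groupB+1 (hostB1), groupC with no preference (hostC1)
printf '%s\n' \
    'hostC1 0 10' \
    'hostA1 2 80' \
    'hostB1 1 20' \
    'hostA2 2 35' | rank_hosts
# prints hostA2, hostA1, hostB1, hostC1 (one per line)
```

Note that hostC1 comes last even though it is the least loaded host, because both preferred groups rank ahead of it.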
A queue may also define the host preference for jobs via the HOSTS parameter. The queue specification is ignored if a job specifies its own preference.
You can also exclude a host by specifying a resource requirement using the hname resource:
% bsub -R "hname!=hostb && type==sgi6" myjob
Resource limits are constraints you or your LSF administrator can specify to limit the use of resources. Jobs that consume more than the specified amount of a resource are signalled or have their priority lowered.
Resource limits can be specified either at the queue level by your LSF administrator or at the job level when you submit a job. Resource limits specified at the queue level are hard limits while those specified with job submission are soft limits. See the setrlimit(2) man page for the concepts of hard and soft limits.
The following resource limits can be specified to the bsub command:
bsub -c 10/DEC3000 myjob
Some batch jobs require resources that LSF does not directly support. For example, a batch job may need to reserve a tape drive or check for the availability of a software license.
The -E pre_exec_command option to the bsub command specifies an arbitrary command to run before starting the batch job. When LSF Batch finds a suitable host on which to run a job, the pre-execution command is executed on that host. If the pre-execution command runs successfully, the batch job is started.
An alternative to using the -E pre_exec_command option is for the LSF administrator to set up a queue level pre-execution command. See 'Queue-Level Pre-/Post-Execution Commands' of the LSF Administrator's Guide for more information.
The standard input, output and error files for the pre-execution command are opened to the same files as for the job. Standard input and output from the pre-execution command cannot be redirected.
The pre-execution command is run under the same user ID, environment, and home and working directories as the batch job. If the pre-execution command is not in your normal execution path, the full path name of the command must be specified.
For parallel batch jobs, the pre-execution command is run on the first selected host.
The pre-execution command returns information to LSF Batch using the exit status. If the pre-execution command exits with non-zero status, the batch job is not dispatched. The job goes back to the PEND state, and LSF Batch tries to dispatch another job to that host. The next time LSF Batch tries to dispatch jobs this process is repeated.
LSF Batch assumes that the pre-execution command runs without side effects. For example, if the pre-execution command reserves a software license or other resource, you must take care not to reserve the same resource more than once for the same batch job.
The following example shows a batch job that requires a tape drive. The tapeCheck program is a site-specific program that exits with status zero if the specified tape drive is ready, and one otherwise:
% bsub -E "/usr/local/bin/tapeCheck /dev/rmt0l" myjob
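The contents of tapeCheck are site specific and are not given here. To give a rough idea of the exit-status contract a pre-execution command has to follow, a stand-in might look like the sketch below. The function name device_ready and the readiness test it uses (the path exists as a readable character device) are assumptions made purely for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for a pre-execution check; the real tapeCheck is
# site specific and may test something entirely different.

# device_ready DEVICE
# Returns 0 when DEVICE exists as a readable character device, 1 otherwise.
device_ready() {
    if [ -c "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    return 1
}

# A script used with bsub -E would finish by exiting with the check's status:
#   device_ready "$1"
#   exit $?
```

The check only reads state and changes nothing, so running it several times for the same job, as happens when the job returns to the PEND state, has no side effects.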
Some batch jobs depend on the results of other jobs. For example, a series of jobs could process input data, run a simulation, generate images based on the simulation output, and finally, record the images on a high-resolution film output device. Each step can only be performed when the previous step completes and all subsequent steps must be aborted if any step fails.
The -w depend_cond option to the bsub command specifies a dependency condition, which is a logical expression based on the execution states of preceding batch jobs. When the depend_cond expression evaluates to TRUE, the batch job can be started. Complex conditions can be written using the logical operators '&&' (AND), '||' (OR), '!' (NOT) and parentheses '()'.
If there is a space character, a logical operator or parentheses in the expression string, the string must be enclosed in single or double quotes (' or ") to prevent the shell from interpreting the special characters.
Batch jobs are identified by job ID number or job name. The job ID number is displayed by the bsub command when the job is submitted. The job name is a string specified by the -J job_name option.
In job dependency expressions, numeric job names must be enclosed in quotes.
Job names refer to jobs submitted by the same user. If more than one of your jobs has the same name, the condition is tested on the last job submitted with that name.
A wildcard character '*' can be specified at the end of a job name to indicate all jobs matching the name. For example, jobA* will match jobA, jobA1, jobA_test, jobA.log etc. There must be at least one match.
The conditions that can be tested are:
Specifying only jobID or jobName is equivalent to done(jobID | jobName). Note that a numeric job name should be doubly quoted, e.g. -w "'210'", since the Unix shell treats -w "210" the same as -w 210.
If any of the batch jobs named in the dependency condition is not found, bsub fails and the job is not submitted.
The following are examples of job dependency conditions:
done(312) && (started(Job2)||exit(Job3))
The submitted job will not start until job 312 has completed successfully, and either the job named Job2 has started or the job named Job3 has terminated abnormally.
1532 || jobName2 || ended(jobName3*)
The submitted job will not start until either job 1532 has completed, the job named jobName2 has completed, or all jobs with names beginning with jobName3 have finished.
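For a simple chain of steps, where each job waits for the successful completion of the one before it, the bsub command lines can be generated mechanically. The sketch below only prints the commands rather than running them. The helper name print_chain and the step names are hypothetical, and each step name is used as both the job name and the command for brevity.

```sh
#!/bin/sh
# print_chain STEP...
# Prints (does not run) one bsub command per step. Every step after the
# first waits for done(previous step). Hypothetical helper; not part of LSF.
print_chain() {
    prev=""
    for step in "$@"; do
        if [ -n "$prev" ]; then
            printf "bsub -J %s -w 'done(%s)' %s\n" "$step" "$prev" "$step"
        else
            printf "bsub -J %s %s\n" "$step" "$step"
        fi
        prev=$step
    done
}

print_chain process_input run_simulation generate_images record_film
```

The commands have to be submitted in the order printed, because bsub rejects a job whose dependency condition names a job that cannot be found.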
If you require more extensive dependencies, for example, calendar or event dependencies, you may want to examine the LSF JobScheduler component of LSF Suite. See the LSF JobScheduler User's Guide for further information.
LSF is usually used in networks with shared file space. When shared file space is not available, LSF can copy needed files to the execution host before running the job, and copy result files back to the submission host after the job completes.
The -f "[lfile op [rfile]]" option to the bsub command copies a file between the submission host and the execution host. lfile is the file name on the submission host, and rfile is the name on the execution host. op is the operation to perform on the file. lfile and rfile can be absolute or relative file path names. If one of the files is not specified, it defaults to the other, which must be given.
The -f option may be repeated to specify multiple files.
op must be surrounded by white space. The possible values for op are:
You must include lfile with op, otherwise it will result in a syntax error. When rfile is not given, it is assumed to be the same as lfile.
If the input file specified with the -i argument to bsub is not found on the execution host, the file is copied from the submission host using LSF's remote file access facility and is removed from the execution host after the job finishes.
The output files specified with the -o and -e arguments to bsub are created on the execution host, and are not copied back to the submission host by default. You can use the remote file access facility to copy these files back to the submission host if they are not on a shared file system. For example, the following command stores the job output in the job_out file and copies the file back to the submission host:
% bsub -o job_out -f 'job_out <' myjob
If the submission and execution hosts have different directory structures, you must ensure that the directory where rfile and lfile will be placed exists. LSF tries to change the directory to the same path name as the directory where the bsub command was run. If this directory does not exist, the job is run in your home directory on the execution host.
You should specify rfile as a file name with no path when running in non-shared file systems; this places the file in the job's current working directory on the execution host. This way the job will work correctly even if the directory where the bsub command is run does not exist on the execution host. Be careful not to overwrite an existing file in your home directory.
For example, to submit myjob to LSF Batch, with input taken from the file /data/data3 and the output copied back to /data/out3, run the command:
% bsub -f "/data/data3 > data3" -f "/data/out3 < out3" myjob data3 out3
To run the job batch_update, which updates the batch_data file in place, you need to copy the file to the execution host before the job runs and copy it back after the job completes:
% bsub -f "batch_data <>" batch_update batch_data
LSF Batch uses the lsrcp(1) command to transfer files. lsrcp contacts the RES on the remote host to perform the file transfer. If the RES is not available, rcp(1) is used. Because LSF client hosts do not run the RES daemon, jobs that are submitted from client hosts should only specify the -f option to bsub if rcp is allowed. You must set up the permissions for rcp if account mapping is used.
If you do not want LSF Batch to start your job immediately, use the bsub -b option to specify the time after which the job should be dispatched.
% bsub -b 5:00 myjob
The submitted job remains pending until after the local time on the LSF master host reaches 5 A.M. You can also specify a time after which the job should be terminated with the -t option to bsub. The command
% bsub -b 11:12:5:40 -t 11:12:20:30 myjob
submits myjob to the default queue to start after November 12 at 05:40 A.M. If the job is still running on November 12 at 8:30 P.M., it is killed.
LSF Batch can allocate more than one host or processor to run a job and automatically keeps track of the job status while a parallel job is running. To submit a parallel job, use the -n option of bsub:
% bsub -n 10 lsmake
This command submits lsmake as a parallel job. The job is started when 10 job slots are available.
For parallel jobs, LSF Batch only starts one controlling process for the batch job. This process is started on the first host in the list of selected hosts. The controlling process is responsible for starting the actual parallel components on all the hosts selected by LSF Batch.
LSF Batch sets a number of environment variables for each batch job. The variable LSB_JOBID is set to the LSF Batch job ID number as printed by bsub. The LSB_HOSTS variable is set to the names of the hosts running the batch job. For a sequential job, LSB_HOSTS is set to a single host name. For a parallel batch job, LSB_HOSTS contains the complete list of hosts that LSF Batch has allocated to that job. Parallel batch jobs must get the list of hosts from the LSB_HOSTS variable and start up all of the job components on the allocated hosts.
In the lsmake example above, LSF Batch starts lsmake on the first host. lsmake reads the LSB_HOSTS environment variable to get the list of hosts and uses the RES to execute subtasks on those hosts.
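A controlling process written as a shell script could walk the host list along the following lines. This is a sketch under assumptions: it takes LSB_HOSTS to be a whitespace-separated list, and it allows for a host name appearing more than once in the list, neither of which is stated above. The helper names list_hosts and count_per_host are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for reading LSB_HOSTS; not part of LSF.

# list_hosts: prints one allocated host per line.
# Assumes LSB_HOSTS is a whitespace-separated list of host names.
list_hosts() {
    # Unquoted on purpose so that the shell splits the list into words.
    for host in ${LSB_HOSTS:-}; do
        printf '%s\n' "$host"
    done
}

# count_per_host: prints "host count" for each distinct host in LSB_HOSTS.
count_per_host() {
    list_hosts | sort | uniq -c | awk '{ print $2, $1 }'
}
```

A job script would call list_hosts and start one component per line of output, using whatever remote execution mechanism the application supports.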
LSF includes scripts for running PVM, P4, and MPI parallel programs as batch jobs. See 'Parallel Jobs' and the pvmjob(1), p4job(1), and mpijob(1) manual pages for more information.
The following features support parallel jobs running through the LSF Batch system.
When submitting a parallel job that requires multiple processors, you can specify the minimum and maximum number of processors using the -n option to the bsub command. The syntax of the -n option is:
bsub -n min_proc[,max_proc] <other bsub options>
If max_proc is not specified then it is assumed to be equal to min_proc. For example:
% bsub -n 4,16 myjob
At most, 16 processors can be allocated to this job. If there are fewer than 16 processors eligible to run the job, this job can still be started as long as the number of eligible processors is at least 4. Once the job gets started, no more processors will be allocated to it even though more may be available later on.
If the specified maximum number is greater than the value of PROCLIMIT defined for the queue to which the job is submitted, the job will be rejected.
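The effect of the min_proc and max_proc values can be summarized as a small decision function. This is a sketch, not LSF's scheduling logic: procs_to_allocate is a hypothetical name, the PROCLIMIT check is left out, and the function assumes that the job may start as soon as the number of eligible processors reaches min_proc and that it then receives every eligible processor up to max_proc.

```sh
#!/bin/sh
# procs_to_allocate MIN_PROC MAX_PROC ELIGIBLE
# Prints the processor count the job would start with, or returns 1 when
# the job cannot start yet. Pass MAX_PROC equal to MIN_PROC when only one
# value is given to -n. Hypothetical helper; not part of LSF.
procs_to_allocate() {
    min=$1
    max=$2
    eligible=$3
    if [ "$eligible" -lt "$min" ]; then
        return 1                # not enough processors; job keeps waiting
    fi
    if [ "$eligible" -gt "$max" ]; then
        echo "$max"             # never more than max_proc
    else
        echo "$eligible"
    fi
}

# bsub -n 4,16 myjob
procs_to_allocate 4 16 10    # prints 10
procs_to_allocate 4 16 40    # prints 16
```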
Sometimes you need to control how the selected processors for a parallel job are distributed across the hosts in the cluster. You are able to specify "select all the processors for this parallel batch job on the same host", or "do not choose more than one processor on one host" by using the span section in the -R option string. For example:
% bsub -n 4 -R "span[hosts=1]" my_job
This job should be dispatched to a multiprocessor that has at least 4 processors currently eligible to run the 4 components of this job.
% bsub -n 4 -R "span[ptile=1]" my_job
This job should be dispatched to 4 hosts even though some of the 4 hosts may have more than one processor currently available.
The queue may also define the locality for parallel jobs using the RES_REQ parameter. The queue specification is ignored if your job specifies its own locality.
The scheduling of parallel jobs supports the notion of processor reservation. Parallel jobs requiring a large number of processors can often not be started if there are many lower priority sequential jobs in the system. There may not be enough resources at any one instant to satisfy a large parallel job, but there may be enough to allow a sequential job to be started. With the processor reservation feature the problem of starvation of parallel jobs can be reduced.
When a parallel job cannot be dispatched because there aren't enough execution slots to satisfy its minimum processor requirements, the currently available slots will be reserved for the job. These reserved job slots are accumulated until there are enough available to start the job. When a slot is reserved for a job it is unavailable to any other job.
To use this feature, a queue must have a processor reservation policy enabled through the SLOT_RESERVE parameter (see 'Processor Reservation for Parallel Jobs' of the LSF Administrator's Guide). To avoid deadlock situations, the period of reservation is specified through the MAX_RESERVE_TIME parameter. The system accumulates reserved slots for a job for up to MAX_RESERVE_TIME minutes; if too few slots have been accumulated by then, all of them are freed and made available to other jobs. The MAX_RESERVE_TIME period starts at the first reservation for a job, and a job can go through multiple reservation cycles before it accumulates enough slots to start.
Reserved slots can be displayed with the bjobs command. The number of reserved slots can be displayed with the bqueues, bhosts, bhpart, and busers commands. Look in the RSV column.
By default LSF Batch copies the environment of the job from the submission host when the job is submitted. The environment is recreated on the execution host when the job is started. This is convenient, in many cases, because the job runs as if it were run interactively on the submission host.
There are cases where you want to use a platform specific or host specific environment to run the job, rather than using the same environment as on the submission host. For example, you may want to set up different search paths on the execution host.
The -L shell option to the bsub command causes LSF Batch to emulate a login on the execution host before starting your job. This makes sure that the login start-up files (.profile for /bin/sh, or .cshrc and .login for /bin/csh) are sourced before the job is started. The shell argument specifies the login shell to use.
% bsub -L /a/b/shell myjob
Job <1234> is submitted to default queue <normal>.
This tells LSF Batch to use /a/b/shell as the login shell to reinitialize the environment.
This does not affect the shell under which the job is run. When a login shell is specified with the -L shell option to the bsub command, that shell is only used as a login shell to set the environment. The job is run using /bin/sh, unless you specify otherwise as described in 'Running a Job Under a Particular Shell'. For example, if your job script is written in /bin/sh and your regular login shell is /bin/csh, you can run your job under /bin/sh but use /bin/csh to reinitialize the job environment by sourcing your .cshrc and .login files.
This section lists some other bsub options. For details on these options see the bsub(1) manual page.
If bsub is run without giving a command to submit, it reads job command lines from the standard input. If the standard input is a controlling terminal, you are prompted with bsub> for each line. For example:
% bsub -q simulation
bsub> cd /work/data/myhomedir
bsub> myjob arg1 arg2 ......
bsub> rm myjob.log
bsub> ^D
Job <1234> submitted to queue <simulation>.
In this case, the three command lines are submitted to LSF Batch and run as a /bin/sh script. Note that only valid /bin/sh command lines are acceptable in this case. Here is another example:
% bsub -q simulation < command_file
Job <1234> submitted to queue <simulation>.
command_file must contain /bin/sh command lines.
On NT systems, commands must be specified using batch file (BAT) syntax. For example:
C:\> bsub -q simulation
bsub> cd \\server\data\myhomedir
bsub> myjob arg1 arg2 ......
bsub> del myjob.log
bsub> ^Z
Job <1234> submitted to queue <simulation>.
You can specify job submission options in the script read from the standard input by the bsub command using lines starting with '#BSUB':
% bsub -q simulation
bsub> #BSUB -q test
bsub> #BSUB -o outfile -R "mem>10"
bsub> myjob arg1 arg2
bsub> #BSUB -J simjob
bsub> ^D
Job <1234> submitted to queue <simulation>.
There are a few things to note:
As a second example, you can redirect a script to the standard input of the bsub command:
% bsub < myscript
Job <1234> submitted to queue <test>.
The myscript file contains job submission options as well as command lines to execute. When the bsub command reads a script from its standard input, the script file is actually spooled by the LSF Batch system; therefore, the script can be modified right after bsub returns for the next job submission. When the script is specified on the bsub command line, the script is not spooled:
% bsub myscript
Job <1234> submitted to default queue <normal>.
In this case the command line myscript is spooled by LSF Batch, instead of the contents of the myscript file. Later modifications to the myscript file can affect the job's behaviour.
The bsub command interprets embedded options only if the script is supplied as the stdin of its command line. When the script is specified on the bsub command line, as is the case with the above example, the options embedded in the script file are ignored.
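Before redirecting a script to bsub, it can be useful to see which submission options the script embeds. The helper below is a hypothetical convenience, not an LSF command. It simply lists the text following '#BSUB' on every line that starts with that marker, and makes no claim about how bsub itself parses or prioritizes those lines.

```sh
#!/bin/sh
# embedded_opts FILE
# Prints the options found on lines of FILE that start with "#BSUB",
# one line of output per "#BSUB" line. Hypothetical helper; not part of LSF.
embedded_opts() {
    sed -n 's/^#BSUB[[:space:]]*//p' "$1"
}
```

For a myscript file containing the lines '#BSUB -q test' and '#BSUB -J simjob', running embedded_opts myscript would print '-q test' and '-J simjob'.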
By default, LSF runs job scripts using the /bin/sh shell. You can specify the shell under which the job is run. This is done by specifying an interpreter in the first line of the script.
% bsub
bsub> #!/bin/csh -f
bsub> set coredump=`ls |grep core`
bsub> if ( "$coredump" != "") then
bsub> mv core core.`date | cut -d" " -f1`
bsub> endif
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
The bsub command must read the job script from the standard input to set the execution shell.
If you do not specify a shell in the script, the script is run using /bin/sh. If the first line of the script starts with a '#' not immediately followed by a '!', then /bin/csh is used to run the job. For example:
% bsub
bsub> # This is a comment line. This tells the system to use /bin/csh to
bsub> # interpret the script.
bsub>
bsub> setenv DAY `date | cut -d" " -f1`
bsub> myjob
bsub> ^D
Job <1234> is submitted to default queue <normal>.
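The rule for choosing the shell from the first line of the script can be written down as a short function. This is a sketch of the rule as stated above, not code taken from LSF, and job_shell is a hypothetical name.

```sh
#!/bin/sh
# job_shell FILE
# Prints the interpreter that the first-line rule selects for FILE:
#   "#!interpreter ..."  -> the text after "#!"
#   "#" followed by anything else -> /bin/csh
#   any other first line -> /bin/sh
# Hypothetical helper; not part of LSF.
job_shell() {
    first=""
    # Tolerate an empty file or a first line without a trailing newline.
    IFS= read -r first < "$1" || :
    case $first in
        '#!'*) printf '%s\n' "${first#??}" ;;
        '#'*)  echo /bin/csh ;;
        *)     echo /bin/sh ;;
    esac
}
```

Applied to the two sessions above, the function reports '/bin/csh -f' for the script that begins with '#!/bin/csh -f' and '/bin/csh' for the script that begins with a comment line.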
If running jobs under a particular shell is a system-wide or queue-wide requirement, you can ask your system administrator to configure the shell as the job starter of your queue. You can find out whether your queue has a job starter configured by running the bqueues -l command.
See 'Using A Job Starter' of the LSF Administrator's Guide for more details.
LSF Batch provides a GUI for submitting jobs. The main window of xbsub is shown in 'Figure 3. xbsub Job Submission Window'. All the job submission options can be selected using xbsub.
Detailed parameters can be set by clicking the 'Advanced' button. Figure 10 shows the resulting window.
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.