About PBS

Using PBS

The following documentation serves as a basic introduction to using the PBS software.
RUNNING JOBS
All jobs are submitted and managed with PBS (aka Torque), the Portable Batch System.

INTRODUCTION TO PBS

PBS is the job queuing and management system used to balance available resources. All jobs must be submitted through PBS so that resources are allocated properly. This is the main queue system for the Hamilton Chemistry Department as well as for MERCURY. Jobs submitted to the mercury queue on this system take priority over jobs from Hamilton users running on MERCURY resources.

Jobs must be run only through the PBS queue system. We cannot process system accounting for jobs run directly from the command line, whether they use commercial or custom-written software, so any job run manually (i.e. without PBS) will be killed.

PBS uses queues to manage resources. Consortium users should use the mercury queue.

CREATING JOBS

To run a PBS job, you first prepare a PBS batch script. Each script has two sections: the top section contains parameters for PBS, and the bottom section contains the commands used to run your job.

The following is an example batch script. Save it in a file named something like testjob.run. Any name will work, but if you omit the -N option shown below, the filename is used as the job name. Choose the name carefully: the qstat display shows only the first few characters of it, so pick something that lets you distinguish between your jobs.

#PBS -N testjob
#PBS -l nodes=1:ppn=4
#PBS -l mem=2GB
#PBS -q mercury
#PBS -M yourname@yourplace.com
#PBS -m bea
#PBS -j oe
#PBS -l walltime=24:00:00
#PBS -r y
cd $PBS_O_WORKDIR
g03 testjob.com


All PBS parameters look similar. An explanation of the most common parameters you will use in your scripts:

#PBS -N testjob
Name of the job. If omitted, uses filename.

#PBS -l nodes=1:ppn=4
Number of nodes and processors per node to be used for the job. Use this instead of ncpus. On systems like olympus the ncpus directive suffices, because the machine itself has all the CPUs you request. On a Beowulf-style cluster like herculaneum you must use the nodes form, for example #PBS -l nodes=2:ppn=4 (i.e. an 8-CPU job), for the job to run properly.

#PBS -l mem=2GB
Amount of memory, in MB or GB, to be used.

#PBS -q mercury
Name of the queue to run the job under. Available queues are:

mercury: olympus, vesuvius, herculaneum, all the herc compute nodes (herc0001-herc0038), and all the pompeii compute nodes (pomp0001-pomp0029). (Suggested default queue.)

pompeii: all the pompeii compute nodes (pomp0001-pomp0028)

herc: all the herc compute nodes (herc0001-herc0038)

#PBS -M joeuser@hamilton.edu
Your email address (optional). If you leave this line out, mail goes to your e-mail account on the system, which should be forwarded to the address specified when your account was created.

#PBS -m bea
Email you when the job begins (b), ends (e), or aborts (a).

#PBS -j oe
Merge PBS output and error files.

#PBS -l walltime=24:00:00
How long you need the requested resources, given as hours:minutes:seconds. All jobs get 24 hours by default. If you need more than that, be sure to specify walltime in your jobs. Overestimate generously: a job that exceeds its walltime is aborted before it finishes.

#PBS -r y
Define whether the batch job is rerunnable. Arguments are y or n. (Defaults to yes)
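
Two of the resource requests above involve a little arithmetic: the total CPU count implied by a nodes/ppn request, and a walltime padded beyond your estimate. The following is only a sketch. The function names total_cpus and walltime_with_margin are made up for illustration and are not part of PBS.

```shell
#!/bin/sh
# Hypothetical helpers for sizing a request; PBS itself provides neither.

# total_cpus SPEC: print nodes * ppn for a spec such as "nodes=2:ppn=4".
# A spec without ppn is treated as one processor per node.
total_cpus() {
    spec=$1
    nodes=${spec#nodes=}        # "2:ppn=4"
    nodes=${nodes%%:*}          # "2"
    case $spec in
        *ppn=*) ppn=${spec##*ppn=}; ppn=${ppn%%:*} ;;   # "4"
        *)      ppn=1 ;;
    esac
    echo $(( nodes * ppn ))
}

# walltime_with_margin MINUTES PERCENT: pad an estimated run time by
# PERCENT and print it as HH:MM:SS for use with "#PBS -l walltime=".
walltime_with_margin() {
    total=$(( $1 * (100 + $2) / 100 ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

total_cpus nodes=2:ppn=4          # prints 8
walltime_with_margin 1200 25      # prints 25:00:00
```

Here a 20-hour estimate (1200 minutes) padded by 25% becomes a 25-hour request, which is above the 24-hour default and so must be given explicitly.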

The remaining portion of the script contains commands to run your job. They are similar to what you would type on the command line. Make sure to include a command to change to the directory where you want the output and error files to be stored. For example:

cd $PBS_O_WORKDIR (changes to the directory from which the run file was queued)
g03 testjob.com (command to run)
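
If you submit many similar jobs, the two-section layout described above can be generated rather than typed by hand. This is a minimal sketch, assuming a helper named make_pbs_script (a hypothetical name, not a PBS command) that prints a batch script using the same parameters as the example.

```shell
#!/bin/sh
# make_pbs_script NAME NODES PPN MEM WALLTIME COMMAND...
# Hypothetical helper: prints a batch script to standard output.
make_pbs_script() {
    name=$1; nodes=$2; ppn=$3; mem=$4; walltime=$5
    shift 5
    # Top section: parameters for PBS. Bottom section: the job commands.
    cat <<EOF
#PBS -N ${name}
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -l mem=${mem}
#PBS -q mercury
#PBS -j oe
#PBS -l walltime=${walltime}
cd \$PBS_O_WORKDIR
$*
EOF
}

# Print a script equivalent to the core of the example above.
make_pbs_script testjob 1 4 2GB 24:00:00 g03 testjob.com
```

Redirect the output to a file (for example testjob.run) to get something you can submit.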


SUBMITTING JOBS
The qsub command is used to submit jobs to PBS. Once you have prepared your batch script, the job is submitted to PBS by entering:

% qsub scriptname

where scriptname is the actual name (e.g. testjob.run) you gave to your batch script.

PBS will respond to you with something like:

77.jake.chem.hamilton.edu

The number preceding the system name is the job number assigned by PBS to your job.
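
In a script it can be handy to keep just the job number from that response. A minimal sketch, where job_number is a hypothetical helper name:

```shell
#!/bin/sh
# job_number ID: strip everything from the first dot onward, so that
# "77.jake.chem.hamilton.edu" becomes "77".
job_number() {
    echo "${1%%.*}"
}

job_number 77.jake.chem.hamilton.edu    # prints 77

# Typical use (needs PBS, so it is left commented out here):
#   jobid=$(qsub testjob.run)
#   qstat "$(job_number "$jobid")"
```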

For more information about qsub, run:

% man qsub


CHECKING THE STATUS OF A JOB
The qstat -a command displays information about the PBS queues and the jobs on the queues. It indicates whether your jobs are running or queued, as well as showing the state of all other jobs on the system.

% qstat -a

                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
31403.jake.chem.hami luke     main     Pentamer_p     --    --   8     -- 900:0 R 100:5
31417.jake.chem.hami luke     main     Pentamer_d     --    --   8     -- 900:0 R 96:54
31464.jake.chem.hami luke     altix    2260.run    32008    --   8     -- 24:00 R 07:28
31466.jake.chem.hami luke     altix    2320.run     5687    --   8     -- 24:00 R 05:15
31473.jake.chem.hami luke     altix    2530.run    13688    --   8     -- 24:00 R 10:22

% qstat -n

jake.chem.hamilton.edu:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
210944.jake.chem.ham tmorrell altix    nh4h2o20pe   9215    --   3    6gb 250:0 R 191:0
   avalanche/2
210979.jake.chem.ham tmorrell main     DZ_DZ.run   59842    --   2    6gb 250:0 S 168:3
   olympus/3
211087.jake.chem.ham tmorrell altix    1158.run    25954    --   1    2gb 200:0 R 171:2
   iroquois/6
211121.jake.chem.ham tmorrell altix    2756.run     1542    --   1    2gb 200:0 R 152:3
   saddleback/1

% qme

This shows only your own jobs.
% qme-n

This shows your own jobs along with the nodes they are running on.

Note: for a job that spans many nodes, this shows only the first line of nodes from the output.
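
When you need every node line for a job, one option is to filter the output of qstat -n yourself. The following is a sketch only: job_nodes is a hypothetical helper, and it assumes the layout shown above, where each job line starts with the job ID and each node line is a single field containing a slash.

```shell
#!/bin/sh
# job_nodes: read "qstat -n" output on standard input and print one
# "jobnumber nodes" pair for every node line that follows a job line.
job_nodes() {
    awk '
        /^[0-9]+\./ { job = $1; sub(/\..*/, "", job); next }       # job line
        job != "" && NF == 1 && index($1, "/") { print job, $1 }   # node line
    '
}

# Demonstration on saved output; with PBS available you would instead
# run:  qstat -n | job_nodes
job_nodes <<'EOF'
210944.jake.chem.ham tmorrell altix    nh4h2o20pe   9215    --   3    6gb 250:0 R 191:0
   avalanche/2
210979.jake.chem.ham tmorrell main     DZ_DZ.run   59842    --   2    6gb 250:0 S 168:3
   olympus/3
EOF
# prints:
#   210944 avalanche/2
#   210979 olympus/3
```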

CANCELLING A JOB
The qdel command is used to delete a job from the queue. It can be used either when the job is running or queued.

% qdel job_number

deletes the job with job_number as its identifier. Before running qdel you can use the qstat command to find a job's number.
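
To collect the job numbers for one user before calling qdel, you can filter the qstat -a listing. A minimal sketch, assuming the column layout shown earlier (job ID in the first column, username in the second); jobs_for_user is a hypothetical helper name.

```shell
#!/bin/sh
# jobs_for_user USER: read "qstat -a" output on standard input and
# print the numeric job number of each job owned by USER.
jobs_for_user() {
    awk -v user="$1" '$2 == user { split($1, id, "."); print id[1] }'
}

# Demonstration on saved output; with PBS available you would instead
# run:  qstat -a | jobs_for_user luke
jobs_for_user luke <<'EOF'
31403.jake.chem.hami luke     main     Pentamer_p     --    --   8     -- 900:0 R 100:5
210944.jake.chem.ham tmorrell altix    nh4h2o20pe   9215    --   3    6gb 250:0 R 191:0
31417.jake.chem.hami luke     main     Pentamer_d     --    --   8     -- 900:0 R 96:54
EOF
# prints:
#   31403
#   31417
```

Review the list before passing any of the numbers to qdel, since a deleted job cannot be recovered.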


CHECKING THE STATUS OF A QUEUE
There are several ways to check the status of a particular queue, or all queues.

To see what jobs have been submitted to the queues, type:

% qstat -a

To see what jobs have been submitted to the queues and the nodes they are on, type:

% qstat -n

To see a summary of the queues, and the number of jobs running and queued for that queue, type:

% qstat -Q

You can also run:

% qstat -q

for similar information.

To see a detailed description of both the total queue resources and the resources currently in use, type:

% qstat -Qf

Note that for any of the commands above you can specify a queue name to get information about just that queue. For more information about qstat, run:

% man qstat

For more detailed information, see the Torque (PBS) documentation.


(from the Torque admin manual)

When a batch job is started, a number of variables are introduced into the job’s environment which can be used by the batch script in making decisions, creating output files, etc.  These variables are listed in the table below:

Variable          Description
PBS_JOBNAME       user specified jobname
PBS_O_WORKDIR     job's submission directory
PBS_ENVIRONMENT   N/A
PBS_TASKNUM       number of tasks requested
PBS_O_HOME        home directory of submitting user
PBS_MOMPORT       active port for mom daemon
PBS_O_LOGNAME     name of submitting user
PBS_O_LANG        language variable for job
PBS_JOBCOOKIE     job cookie
PBS_NODENUM       node offset number
PBS_O_SHELL       script shell
PBS_JOBID         unique pbs job id
PBS_O_HOST        host on which the qsub command was run
PBS_QUEUE         job queue
PBS_NODEFILE      file containing line delimited list of nodes allocated to the job
PBS_O_PATH        path variable used to locate executables within job script
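
As an illustration of using these variables inside a batch script, the sketch below prints a short summary of the running job. The function name job_summary is hypothetical, and the fallback values are an assumption added so the function can also be tried outside PBS.

```shell
#!/bin/sh
# job_summary: describe the current job using PBS environment variables.
# Outside PBS the variables are unset, so fallbacks are printed instead.
job_summary() {
    echo "job name:   ${PBS_JOBNAME:-(not under PBS)}"
    echo "queue:      ${PBS_QUEUE:-(not under PBS)}"
    echo "work dir:   ${PBS_O_WORKDIR:-$PWD}"
    if [ -r "${PBS_NODEFILE:-}" ]; then
        # The node file has one line per allocated processor, so the
        # line count is the processor count and the number of distinct
        # lines is the node count.
        echo "processors: $(awk 'END { print NR }' "$PBS_NODEFILE")"
        echo "nodes:      $(sort -u "$PBS_NODEFILE" | awk 'END { print NR }')"
    fi
}

job_summary
```

Calling job_summary near the top of a batch script records in the job's output file where and how the job ran.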