In addition to the commands available from PBS, there is a second piece of software on the cluster, the scheduler, which here is Maui. PBS manages the resources, while Maui adds another layer on top of the resource manager by controlling how jobs are scheduled on those resources. With Maui installed, some additional tools become available to us. Some of them are described here. To find out more about Maui, see the user manual.
First, let’s say you have submitted a job on the system.
With PBS, we’d check on it with qstat -a:
[clutest@jake ~]% qstat -a
jake.chem.hamilton.edu:
                                                                 Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
31403.jake.chem.hami luke     main     Pentamer_p     --  --   8     -- 900:0 R 101:1
31417.jake.chem.hami luke     main     Pentamer_d     --  --   8     -- 900:0 R 97:18
31464.jake.chem.hami luke     altix    2260.run    32008  --   8     -- 24:00 R 07:28
31466.jake.chem.hami luke     altix    2320.run     5687  --   8     -- 24:00 R 05:15
31473.jake.chem.hami luke     altix    2530.run    13688  --   8     -- 24:00 R 10:46
[clutest@jake ~]%
Maui also has its own command for checking the queue, called showq:
[clutest@jake ~]% showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
31464 luke Running 8 11:01:54 Mon Mar 12 07:31:47
31473 luke Running 8 12:59:38 Mon Mar 12 09:29:31
31466 luke Running 8 13:14:57 Mon Mar 12 09:44:50
31474 luke Running 8 15:42:12 Mon Mar 12 12:12:05
31475 luke Running 8 16:58:21 Mon Mar 12 13:28:14
31476 luke Running 8 17:03:37 Mon Mar 12 13:33:30
31477 luke Running 8 17:23:36 Mon Mar 12 13:53:29
31775 luke Running 8 17:59:31 Mon Mar 12 14:29:24
31776 luke Running 8 17:59:55 Mon Mar 12 14:29:48
31777 luke Running 8 18:00:00 Mon Mar 12 14:29:53
31779 luke Running 8 18:00:19 Mon Mar 12 14:30:12
31478 luke Running 8 18:01:09 Mon Mar 12 14:31:02
31752 luke Running 2 21:56:29 Mon Mar 12 18:26:22
31403 luke Running 8 33:06:32:19 Thu Mar 8 14:02:12
31417 luke Running 8 33:10:28:26 Thu Mar 8 17:58:19
15 Active Jobs 114 of 238 Processors Active (47.90%)
5 of 61 Nodes Active (8.20%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
31479 luke Idle 8 1:00:00:00 Fri Mar 9 13:54:56
31480 luke Idle 8 1:00:00:00 Fri Mar 9 13:55:06
2 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 244 Active Jobs: 15 Idle Jobs: 2 Blocked Jobs: 0
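Because showq prints one row per job, its output is easy to tally in a script. The following is a minimal sketch and not part of Maui: the function name summarize_showq and the regular expression are assumptions made for illustration, and the sample text is abridged from the listing above.

```python
import re
from collections import defaultdict

# One job row looks like:
#   31464 luke Running 8 11:01:54 Mon Mar 12 07:31:47
# The fifth field must look like a time (digits and colons), which keeps
# summary lines such as "15 Active Jobs 114 of 238 ..." from matching.
JOB_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\w+)\s+(\d+)\s+\d[\d:]*\s")


def summarize_showq(text):
    """Return {(user, state): (job_count, proc_count)} for every job row."""
    summary = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        match = JOB_ROW.match(line)
        if match is None:
            continue  # headers, separators and totals are skipped
        _job_id, user, state, procs = match.groups()
        summary[(user, state)][0] += 1
        summary[(user, state)][1] += int(procs)
    return {key: tuple(value) for key, value in summary.items()}


# Abridged sample; in practice the text would come from running showq.
sample = """\
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
31464 luke Running 8 11:01:54 Mon Mar 12 07:31:47
31752 luke Running 2 21:56:29 Mon Mar 12 18:26:22
15 Active Jobs 114 of 238 Processors Active (47.90%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
31479 luke Idle 8 1:00:00:00 Fri Mar 9 13:54:56
2 Idle Jobs
"""

for (user, state), (jobs, procs) in summarize_showq(sample).items():
    print(f"{user:10s} {state:10s} jobs={jobs} procs={procs}")
```

For the abridged sample this reports two running jobs using 10 processors and one idle job requesting 8. Applied to the full listing, the running jobs add up to the 114 active processors that showq itself reports.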
Now let’s use Maui to check on a single job with checkjob:
[clutest@jake ~]% checkjob 31403
checking job 31403
State: Running
Creds: user:luke group:hamilton class:main qos:qm
WallTime: 4:05:16:22 of 37:12:00:00
SubmitTime: Thu Mar 8 13:59:50
(Time Queued Total: 00:02:22 Eligible: 00:00:01)
StartTime: Thu Mar 8 14:02:12
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [main]
Dedicated Resources Per Task: PROCS: 8
NodeCount: 1
Allocated Nodes:
[olympus:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: HOSTLIST RESTARTABLE PREEMPTEE
Attr: PREEMPTEE
HostList:
[olympus:1]
Reservation '31403' (-4:05:14:10 -> 33:06:45:50 Duration: 37:12:00:00)
PE: 8.00 StartPriority: 100
[clutest@jake ~]%
From this output we can find out which computer the job is running on: it is listed under HostList (and under Allocated Nodes), here olympus.
Now let’s ask Maui to look at a node with checknode:
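The host name and the wall-time figures can also be pulled out of checkjob output in a script. This is a rough sketch under stated assumptions: the helper names maui_duration_to_seconds and hosts_from_checkjob are made up for illustration, and the durations are assumed to use the days:hours:minutes:seconds form seen in the output (with the day field omitted for shorter times, as in showq).

```python
import re


def maui_duration_to_seconds(text):
    """Convert a [DD:]HH:MM:SS duration string to seconds."""
    parts = [int(part) for part in text.split(":")]
    if len(parts) == 3:  # HH:MM:SS, no day field
        parts = [0] + parts
    if len(parts) != 4:
        raise ValueError(f"unexpected duration: {text!r}")
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def hosts_from_checkjob(text):
    """Return [(host, task_count), ...] from the line after 'HostList:'."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "HostList:" and index + 1 < len(lines):
            pairs = re.findall(r"\[([^\]:]+):(\d+)\]", lines[index + 1])
            return [(host, int(count)) for host, count in pairs]
    return []


# Abridged from the checkjob output above.
sample = """\
WallTime: 4:05:16:22 of 37:12:00:00
HostList:
[olympus:1]
"""

used, limit = re.search(r"WallTime:\s+(\S+)\s+of\s+(\S+)", sample).groups()
used_s = maui_duration_to_seconds(used)
limit_s = maui_duration_to_seconds(limit)
print(hosts_from_checkjob(sample))
print(f"{used_s / 3600:.1f} of {limit_s / 3600:.0f} hours used")
```

The limit of 37:12:00:00 works out to 900 hours, which agrees with the 900:0 required time that qstat -a shows for the same job.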
[clutest@jake ~]% checknode olympus
checking node olympus
State: Running (in current state for 00:00:00)
Configured Resources: PROCS: 32 MEM: 30G SWAP: 16G DISK: 722G g03,sander: 9999
Utilized Resources: g03,sander: 9999
Dedicated Resources: PROCS: 16
Opsys: irix6array Arch: [NONE]
Speed: 1.00 Load: 16.000 (MaxLoad: 32.00)
Network: [DEFAULT]
Features: [mercury][irix64][main][g03]
Attributes: [Batch]
Classes: [main 16:32][hamilton 32:32][irix64 32:32][altix 32:32][urbee 32:32][marcy 32:32][gothics 32:32][kilburn 32:32]
[iroquois 32:32][avalanche 32:32][mercury 32:32][linux 32:32][pompeii 32:32]
Total Time: 26:16:05:40 Up: 26:16:05:40 (100.00%) Active: 25:22:34:18 (97.26%)
Reservations:
Job '31403'(x1) -4:05:16:41 -> 33:06:43:19 (37:12:00:00)
Job '31417'(x1) -4:01:20:34 -> 33:10:39:26 (37:12:00:00)
JobList: 31403,31417
[clutest@jake ~]%
Notice the Configured Resources line above. The DISK value (722G here) is the total scratch space for this machine (node).
Another useful command is tracejob. Use it after a job has run and is no longer in the queue.
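To compare what a node has with what is currently reserved on it, the resource lines of checknode can be split into name and value pairs. The sketch below is illustrative only: resource_line is a hypothetical helper, and it assumes the "NAME: value" layout shown in the output above.

```python
import re


def resource_line(text, label):
    """Return the NAME: value pairs on the line that starts with label."""
    prefix = label + ":"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            return dict(re.findall(r"([\w,]+):\s+(\S+)", stripped[len(prefix):]))
    return {}


# Abridged from the checknode output above.
sample = """\
Configured Resources: PROCS: 32 MEM: 30G SWAP: 16G DISK: 722G g03,sander: 9999
Utilized Resources: g03,sander: 9999
Dedicated Resources: PROCS: 16
"""

configured = resource_line(sample, "Configured Resources")
dedicated = resource_line(sample, "Dedicated Resources")
free_procs = int(configured["PROCS"]) - int(dedicated.get("PROCS", 0))
print(f"scratch disk: {configured['DISK']}, free processors: {free_procs}")
```

For olympus this gives 16 free processors out of 32, consistent with the two 8-processor jobs in its JobList.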
[clutest@jake ~]% tracejob 31472
Job: 31472.jake.chem.hamilton.edu
03/12/2007 02:58:20 S Job Modified at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20 S Job Run at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20 S Job Modified at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20 A user=luke group=hamilton jobname=2500.run queue=altix ctime=1173466459 qtime=1173466459
etime=1173466459 start=1173682700 exec_host=avalanche/0 Resource_List.ncpus=8
Resource_List.neednodes=avalanche Resource_List.walltime=24:00:00
03/12/2007 13:51:38 S Exit_status=0 resources_used.cput=65:49:15 resources_used.mem=5269696kb
resources_used.vmem=5536832kb resources_used.walltime=10:53:17
03/12/2007 13:51:38 S dequeuing from altix, state EXITING
03/12/2007 13:51:38 A user=luke group=hamilton jobname=2500.run queue=altix ctime=1173466459 qtime=1173466459
etime=1173466459 start=1173682700 exec_host=avalanche/0 Resource_List.ncpus=8
Resource_List.neednodes=altix Resource_List.walltime=24:00:00 session=9334 end=1173721898
Exit_status=0 resources_used.cput=65:49:15 resources_used.mem=5269696kb
resources_used.vmem=5536832kb resources_used.walltime=10:53:17
[clutest@jake ~]%
This command is useful for finding out which resources were requested for your job (the Resource_List entries) and which it actually used (the resources_used entries).
For more useful Maui commands, check out the commands page.
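The key=value fields in the final accounting record lend themselves to a little arithmetic. The following sketch is an assumption-laden illustration rather than a PBS tool: parse_accounting and hms_to_seconds are hypothetical helpers, the record is the last "A" entry above joined onto one line, and treating cput divided by (walltime times ncpus) as CPU efficiency is only a rough rule of thumb.

```python
def parse_accounting(record):
    """Split a space-separated run of key=value fields into a dict."""
    return dict(field.split("=", 1) for field in record.split() if "=" in field)


def hms_to_seconds(text):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The last accounting record from the tracejob output above.
record = (
    "user=luke group=hamilton jobname=2500.run queue=altix "
    "ctime=1173466459 qtime=1173466459 etime=1173466459 "
    "start=1173682700 exec_host=avalanche/0 Resource_List.ncpus=8 "
    "Resource_List.neednodes=altix Resource_List.walltime=24:00:00 "
    "session=9334 end=1173721898 Exit_status=0 "
    "resources_used.cput=65:49:15 resources_used.mem=5269696kb "
    "resources_used.vmem=5536832kb resources_used.walltime=10:53:17"
)

fields = parse_accounting(record)
queue_wait = int(fields["start"]) - int(fields["qtime"])  # seconds queued
run_time = int(fields["end"]) - int(fields["start"])      # seconds running
cpu_seconds = hms_to_seconds(fields["resources_used.cput"])
wall_seconds = hms_to_seconds(fields["resources_used.walltime"])
ncpus = int(fields["Resource_List.ncpus"])
efficiency = cpu_seconds / (wall_seconds * ncpus)

print(f"queued for {queue_wait / 86400:.1f} days")
print(f"ran for {run_time / 3600:.1f} hours")
print(f"CPU efficiency: {efficiency:.0%}")
```

Here the job waited about 2.5 days in the queue, ran for just under 11 hours of its 24-hour limit, and by this rough measure kept its 8 CPUs about 76% busy.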