About MAUI

Using MAUI

In addition to the commands available from PBS, there is another piece of software, the scheduler, called MAUI. PBS manages the resources, while Maui adds another layer on top of the resource manager by controlling how jobs are scheduled on those resources. With this software package some additional tools become available to us. Some of them are described here. To find out more about Maui, see the user manual.

First, let’s say you have some jobs scheduled on the system.

With PBS, we’d check the queue with:

[clutest@jake ~]% qstat -a

jake.chem.hamilton.edu:
                                                                 Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
31403.jake.chem.hami luke     main     Pentamer_p    --    --    8    --  900:0 R 101:1
31417.jake.chem.hami luke     main     Pentamer_d    --    --    8    --  900:0 R 97:18
31464.jake.chem.hami luke     altix    2260.run    32008   --    8    --  24:00 R 07:28
31466.jake.chem.hami luke     altix    2320.run     5687   --    8    --  24:00 R 05:15
31473.jake.chem.hami luke     altix    2530.run    13688   --    8    --  24:00 R 10:46
[clutest@jake ~]%
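
On a busy cluster the qstat listing can get long. If you only care about your own jobs, a quick filter with grep works (a simple sketch using standard shell tools; substitute your own username for luke):

# show only luke's jobs from the full qstat listing (replace "luke" with your username)
qstat -a | grep luke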

Maui also provides its own command to check the queue, called showq.

[clutest@jake ~]% showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

31464                  luke    Running     8    11:01:54  Mon Mar 12 07:31:47
31473                  luke    Running     8    12:59:38  Mon Mar 12 09:29:31
31466                  luke    Running     8    13:14:57  Mon Mar 12 09:44:50
31474                  luke    Running     8    15:42:12  Mon Mar 12 12:12:05
31475                  luke    Running     8    16:58:21  Mon Mar 12 13:28:14
31476                  luke    Running     8    17:03:37  Mon Mar 12 13:33:30
31477                  luke    Running     8    17:23:36  Mon Mar 12 13:53:29
31775                  luke    Running     8    17:59:31  Mon Mar 12 14:29:24
31776                  luke    Running     8    17:59:55  Mon Mar 12 14:29:48
31777                  luke    Running     8    18:00:00  Mon Mar 12 14:29:53
31779                  luke    Running     8    18:00:19  Mon Mar 12 14:30:12
31478                  luke    Running     8    18:01:09  Mon Mar 12 14:31:02
31752                  luke    Running     2    21:56:29  Mon Mar 12 18:26:22
31403                  luke    Running     8 33:06:32:19  Thu Mar  8 14:02:12
31417                  luke    Running     8 33:10:28:26  Thu Mar  8 17:58:19

15 Active Jobs     114 of  238 Processors Active (47.90%)
                     5 of   61 Nodes Active      (8.20%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

31479                  luke       Idle     8  1:00:00:00  Fri Mar  9 13:54:56
31480                  luke       Idle     8  1:00:00:00  Fri Mar  9 13:55:06

2 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
Total Jobs: 244   Active Jobs: 15   Idle Jobs: 2   Blocked Jobs: 0
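
The same kind of filtering works on the showq output. For example, here is a rough sketch for counting how many of your jobs are currently running; it simply matches your username and the Running state against the listing above (again, substitute your own username for luke):

# count jobs owned by "luke" that showq reports as Running
showq | grep luke | grep -c Running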

 

Now let’s use Maui to check on a job:

[clutest@jake ~]%  checkjob 31403
checking job 31403

State: Running
Creds:  user:luke  group:hamilton  class:main  qos:qm
WallTime: 4:05:16:22 of 37:12:00:00
SubmitTime: Thu Mar  8 13:59:50
(Time Queued  Total: 00:02:22  Eligible: 00:00:01)

StartTime: Thu Mar  8 14:02:12
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [main]
Dedicated Resources Per Task: PROCS: 8
NodeCount: 1
Allocated Nodes:
[olympus:1]
IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       HOSTLIST RESTARTABLE PREEMPTEE
Attr:        PREEMPTEE
HostList:
[olympus:1]
Reservation '31403' (-4:05:14:10 -> 33:06:45:50  Duration: 37:12:00:00)
PE:  8.00  StartPriority:  100

[clutest@jake ~]%

From the HostList entry we can find out which computer (node) the job is running on.
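
If you only want the node name, you can pull it straight out of the checkjob output. This is just a sketch that relies on the output format shown above, where the allocated node appears on the line after "HostList:":

# print the line after "HostList:", i.e. the node the job is running on
checkjob 31403 | awk '/^HostList/ {getline; print}'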

Now let’s ask Maui to look at a node:

[clutest@jake ~]% checknode olympus
checking node olympus

State:   Running  (in current state for 00:00:00)
Configured Resources: PROCS: 32  MEM: 30G  SWAP: 16G  DISK: 722G  g03,sander: 9999
Utilized   Resources: g03,sander: 9999
Dedicated  Resources: PROCS: 16
Opsys:      irix6array  Arch:      [NONE]
Speed:      1.00  Load:      16.000 (MaxLoad: 32.00)
Network:    [DEFAULT]
Features:   [mercury][irix64][main][g03]
Attributes: [Batch]
Classes:    [main 16:32][hamilton 32:32][irix64 32:32][altix 32:32][urbee 32:32][marcy 32:32][gothics 32:32][kilburn 32:32]
[iroquois 32:32][avalanche 32:32][mercury 32:32][linux 32:32][pompeii 32:32]

Total Time: 26:16:05:40  Up: 26:16:05:40 (100.00%)  Active: 25:22:34:18 (97.26%)

Reservations:
Job '31403'(x1)  -4:05:16:41 -> 33:06:43:19 (37:12:00:00)
Job '31417'(x1)  -4:01:20:34 -> 33:10:39:26 (37:12:00:00)
JobList:  31403,31417

[clutest@jake ~]%

Notice the Configured Resources line above. The DISK value is the total scratch space for this machine (node).
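
If you just want a node's resource summary (including that DISK figure) without the rest of the report, you can filter the checknode output. This sketch simply matches the Configured, Utilized, and Dedicated Resources lines shown above:

# show only the resource lines for node olympus
checknode olympus | grep Resources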

Another useful command is tracejob. It is used after a job has finished running and is no longer in the queue.

[clutest@jake ~]% tracejob 31472

Job: 31472.jake.chem.hamilton.edu

03/12/2007 02:58:20  S    Job Modified at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20  S    Job Run at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20  S    Job Modified at request of root@jake.chem.hamilton.edu
03/12/2007 02:58:20  A    user=luke group=hamilton jobname=2500.run queue=altix ctime=1173466459 qtime=1173466459
                          etime=1173466459 start=1173682700 exec_host=avalanche/0 Resource_List.ncpus=8
                          Resource_List.neednodes=avalanche Resource_List.walltime=24:00:00
03/12/2007 13:51:38  S    Exit_status=0 resources_used.cput=65:49:15 resources_used.mem=5269696kb
                          resources_used.vmem=5536832kb resources_used.walltime=10:53:17
03/12/2007 13:51:38  S    dequeuing from altix, state EXITING
03/12/2007 13:51:38  A    user=luke group=hamilton jobname=2500.run queue=altix ctime=1173466459 qtime=1173466459
                          etime=1173466459 start=1173682700 exec_host=avalanche/0 Resource_List.ncpus=8
                          Resource_List.neednodes=altix Resource_List.walltime=24:00:00 session=9334 end=1173721898
                          Exit_status=0 resources_used.cput=65:49:15 resources_used.mem=5269696kb
                          resources_used.vmem=5536832kb resources_used.walltime=10:53:17
[clutest@jake ~]%

This command is useful for figuring out what resources were requested for your job and what it actually used.
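
To pull out just the usage summary for a finished job, you can filter the tracejob output for the resources_used fields shown above (a small sketch; the job ID is the same example job as before):

# show only the lines that report what job 31472 actually used
tracejob 31472 | grep resources_used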

For more useful Maui commands, check out the commands page.