SGE has 3 components: commd, qmaster and schedd.
The qmaster (master daemon) and the schedd (scheduler daemon) are self-explanatory, but the commd is not.
The commd is the communication daemon: it carries the messages exchanged between the qmaster, the schedd and the execution daemons.
On each node there is an execd that communicates with the server's commd and places a job into execution when it receives a copy of the job from the qmaster.
How to submit a Job
I. Using the command qsub, users can submit a script file.
II. Example of a script file:
vi hello.sh
#!/bin/sh
# This is a simple example of an SGE script
# Lines beginning with #$ are read by qsub as options; -N sets the job name
#$ -N sample
cd $HOME/
./hello-world
:wq
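If you would rather not use an interactive editor, the same script can be written straight from the shell with a here-document. This is a minimal sketch assuming a Bourne-compatible shell; it uses the #$ prefix that qsub recognizes for embedded options, and ./hello-world stands for your own program in $HOME, as in the example above.

```shell
#!/bin/sh
# Write the job script without opening vi.
# The quoted 'EOF' delimiter stops the shell from expanding $HOME now;
# it is expanded later, when the job itself runs.
cat > hello.sh <<'EOF'
#!/bin/sh
# This is a simple example of an SGE script
#$ -N sample
cd $HOME/
./hello-world
EOF

# Optional: lets you run ./hello.sh locally to test it before submitting.
chmod +x hello.sh
```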
III. To submit this script, type:
hostname > qsub hello.sh
qsub prints the job id on the screen, in the form number.hostname (e.g. 0.hostname).
How to check the status of my job/host
I. Using the command qstat, users can check the status of their job. Look for your job id number in the first column; the S column shows the job's state (R = running).
hostname > qstat
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
0.hostname       hello.sh         jackie           00:00:00 R batch
Note: See the qstat man page (man qstat) for further options.
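To wait for a job from a script (for example, before post-processing its output), you can poll qstat until the job id disappears from the list. A minimal sketch: job_in_queue and wait_for_job are hypothetical helper names, and the code assumes the job id is the first column of qstat's output, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SGE) built on plain qstat output.

# job_in_queue JOBID: succeed if JOBID appears in the first column of qstat.
job_in_queue() {
    qstat | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# wait_for_job JOBID [SECONDS]: poll qstat until the job is no longer listed.
# The polling interval defaults to 30 seconds.
wait_for_job() {
    while job_in_queue "$1"; do
        sleep "${2:-30}"
    done
}
```

For example, wait_for_job 0.hostname 60 checks once a minute and returns once the job has left the queue. A generous interval avoids querying the qmaster more often than necessary.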
II. Using the command qhost, users can check the status of the execution hosts: architecture, number of processors, load, and memory and swap usage.
hostname > qhost
HOSTNAME    ARCH     NPROC  LOAD   MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global      -        -      -      -       -       -       -
node001     glinux   2      -      2.0G    -       1.9G    -
node002     glinux   2      0.01   2.0G    28.9M   1.9G    2.4M
node003     glinux   2      0.02   2.0G    29.8M   1.9G    2.4M
node004     glinux   2      0.00   2.0G    27.2M   1.9G    0.0
node005     glinux   2      0.00   2.0G    27.6M   1.9G    0.0
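The qhost listing can also be filtered from a script, for example to see which nodes are currently lightly loaded. A minimal sketch, assuming the column layout shown above (LOAD in the fourth column); idle_hosts is a hypothetical helper name and its default threshold of 0.5 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): list hosts whose LOAD is at or
# below a threshold (default 0.5). Hosts showing "-" instead of a load
# value are skipped.
idle_hosts() {
    qhost | awk -v max="${1:-0.5}" '
        # Skip the header, the dashed rule and the "global" pseudo-host.
        $1 == "HOSTNAME" || $1 == "global" || $1 ~ /^-+$/ { next }
        # Print hosts that report a numeric load at or below the threshold.
        $4 != "-" && $4 + 0 <= max + 0 { print $1 }
    '
}
```

With the listing above, idle_hosts 0.01 would print node002, node004 and node005; node001 is skipped because it reports no load value.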
How to script for a Parallel Run
I. Example of a parallel script file:
vi mpi-hello.sh
#!/bin/sh
#$ -N MPI_Job
#$ -pe mpich 20
# The PE sets NSLOTS (slots granted) and writes the host list to $TMPDIR/machines
/usr/local/mpich-pgi/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /home/scoggins/mpi-hello
:wq
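Submit this script with qsub, just as you did the first one. The -pe mpich 20 line tells SGE that this is a parallel
job requesting 20 slots in the mpich parallel environment. SGE runs that environment's start-up procedure to prepare
the MPI setup for the job (including the machine file $TMPDIR/machines) and sets $NSLOTS to the number of slots granted.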
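Before calling mpirun, a parallel job script can check that the machine file matches the number of slots it was granted. This is a sketch under assumptions: check_machinefile is a hypothetical helper, and it assumes the mpich start script writes one line per granted slot to $TMPDIR/machines (startmpi.sh's usual behaviour, but check your installation).

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE) for a parallel job script.
# Assumes the PE start script wrote one line per granted slot to
# $TMPDIR/machines and that SGE exported NSLOTS.
check_machinefile() {
    file="$TMPDIR/machines"
    if [ -z "$NSLOTS" ]; then
        echo "NSLOTS is not set; is this running under a parallel environment?" >&2
        return 1
    fi
    if [ ! -r "$file" ]; then
        echo "machine file $file not found" >&2
        return 1
    fi
    # wc -l may pad its output with spaces on some systems; strip them.
    lines=$(wc -l < "$file" | tr -d ' ')
    if [ "$lines" -ne "$NSLOTS" ]; then
        echo "machine file lists $lines slots but NSLOTS is $NSLOTS" >&2
        return 1
    fi
}
```

To use it, define the function in mpi-hello.sh and call check_machinefile || exit 1 on the line before mpirun, so a mismatched allocation stops the job instead of starting MPI on the wrong hosts.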
To see which parallel environments are available, run qconf -spl:
hostname > qconf -spl
mpi
mpich
To see how a particular parallel environment is configured, run qconf -sp followed by its name (e.g. mpich):
hostname > qconf -sp mpich
pe_name mpich
queue_list all
slots 44
user_lists NONE
xuser_lists NONE
start_proc_args /sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args /sge/mpi/stopmpi.sh
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
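When choosing a value for -pe, it helps to know how many slots each parallel environment offers. The sketch below pulls single fields out of qconf -sp output; pe_field and list_pe_slots are hypothetical helper names, and the code assumes the one "name value" pair per line layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SGE) built on qconf -spl / qconf -sp.

# pe_field PE KEY: print the value of KEY in the configuration of PE.
pe_field() {
    qconf -sp "$1" | awk -v key="$2" '
        $1 == key {
            $1 = ""            # drop the key, keep the value
            sub(/^ +/, "")     # strip the space left behind
            print
        }'
}

# list_pe_slots: print "name: N slots" for every parallel environment.
list_pe_slots() {
    for pe in $(qconf -spl); do
        printf '%s: %s slots\n' "$pe" "$(pe_field "$pe" slots)"
    done
}
```

With the mpich configuration shown above, pe_field mpich slots prints 44 and pe_field mpich allocation_rule prints $round_robin.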