Some common problems and troubleshooting:
If an E(rror) state is displayed for a queue, sge_execd(8) on that host was unable to locate the sge_shepherd(8) executable in order to start a job. Check the error logfile of that sge_execd(8) for leads on how to resolve the problem, then re-enable the queue manually via the -c option of the qmod(1) command.
In our setup, run:
grep NODENAME /sge/default/spool/qmaster/messages
for hints of failure:
Thu Dec 11 11:32:51 2003|qmaster|cyclades|W|job 3079.1 failed on host node003 general assumedly before job because: can't write script file "job_scripts/3079" wrote only -1 of 893666 bytes: Bad address
Thu Dec 11 11:32:51 2003|qmaster|cyclades|E|queue node003.q marked QERROR as result of job 3079's failure
You can clear the E status of all queue nodes with qmod:
[root@cyclades default]# qmod -c node*.q
[root@cyclades default]# qstat -f
...
node003.q BIP 0/2 0.00 glinux
...
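The plain grep on the messages file returns every line that mentions the node, including informational ones. A minimal sketch of a narrower filter follows, assuming the messages file keeps the pipe-separated layout seen in the excerpt (date|daemon|host|level|message). The function name sge_problems_for is hypothetical and not part of SGE; it only prints the W and E entries whose message text contains the given node name.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with SGE): print warning (W) and error (E)
# entries from a qmaster messages file that mention a given node.
# Assumes the layout seen in the log excerpt: date|daemon|host|level|message
sge_problems_for() {
    # $1 = path to the messages file, $2 = node name to look for
    awk -F'|' -v node="$2" '
        $4 == "W" || $4 == "E" {
            # Rejoin the message in case it contains "|" itself.
            msg = $5
            for (i = 6; i <= NF; i++) msg = msg "|" $i
            # Plain substring match, so dots in node names are not wildcards.
            if (index(msg, node) > 0) print $4 ": " msg
        }
    ' "$1"
}
```

With the path from our setup, it could be called as: sge_problems_for /sge/default/spool/qmaster/messages node003. It matches a literal substring rather than a regular expression, so node003 also matches node003.q.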