Run parallel jobs on the compute grid
The Sun Grid Engine (SGE) is a standard interface for submitting jobs to a compute cluster. The department uses SGE to manage all sorts of spare computing resources, including some really high-powered machines that are accessible only through SGE. Using this interface can be a little bit hairy, and bootstrapping yourself from the documentation is difficult, hence this HOWTO.
First Things First
Decide on an Architecture
Before you start, you must decide on a machine architecture on which to run your job. Do you want to run on a 32-bit architecture or a 64-bit one? Well, there are pros and cons to each:
- 32-bit
  - Pro: no need to recompile your programs.
  - Con: there are only six 32-bit SGE hosts.
- 64-bit
  - Pro: there are sixty 64-bit SGE hosts, and of course you can cram way more data into main memory on a 64-bit architecture.
  - Con: every department machine that you can log into normally, from sixg to wheat to your own workstation, is 32-bit. Therefore all our programs are compiled on 32-bit machines. You must recompile your program and all the libraries it requires in order to run it on a 64-bit machine.
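If you are unsure which kind of machine you are logged into, a small check like the one below may help. It is only a sketch: it maps the hardware name reported by uname -m onto the two arch strings this HOWTO passes to qsub (lx24-x86 and lx24-amd64). The function name sge_arch_for is made up for illustration, and the list of hardware names is an assumption that covers common x86 machines only.

```shell
#!/bin/sh
# Map a hardware name (as printed by `uname -m`) to the SGE arch string
# used on this grid. sge_arch_for is a hypothetical helper name.
sge_arch_for() {
    case "$1" in
        x86_64|amd64)        echo "lx24-amd64" ;;  # 64-bit hosts
        i386|i486|i586|i686) echo "lx24-x86" ;;    # 32-bit hosts
        *)                   echo "unknown" ;;     # anything else: check by hand
    esac
}

# Report the arch string for the machine this script runs on.
sge_arch_for "$(uname -m)"
```

A binary compiled on a machine that reports lx24-x86 here is the kind that runs on the 32-bit hosts without recompiling.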
If you chose to use a 32-bit architecture, skip ahead to "Run Your Job". If you chose 64-bit, keep reading...
Recompiling for a 64-bit Architecture
... I'll let you know when I figure it out...
Run Your Job
Write a shell script interface to your executable
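A minimal sketch of such a wrapper, assuming the job is submitted as an array job (qsub -t 1-<NSLICES>:1) and that each task should process one slice of the work. SGE exports SGE_TASK_ID and SGE_TASK_LAST to every array task. Everything else here is a placeholder: ./myProgram, the PROGRAM override, and the two positional arguments are hypothetical and should be replaced with whatever your executable actually expects.

```shell
#!/bin/sh
# Sketch of a wrapper script for an SGE array job.
# ./myProgram and its two positional arguments are hypothetical placeholders.

# Print the index of the current array task.
# SGE_TASK_ID is unset outside SGE, and non-array jobs typically see the
# literal string "undefined", so fall back to 1 in both cases.
task_index() {
    case "${SGE_TASK_ID:-undefined}" in
        undefined) echo 1 ;;
        *)         echo "$SGE_TASK_ID" ;;
    esac
}

# Print the last index of the -t range (SGE_TASK_LAST), with the same fallback.
task_count() {
    case "${SGE_TASK_LAST:-undefined}" in
        undefined) echo 1 ;;
        *)         echo "$SGE_TASK_LAST" ;;
    esac
}

program="${PROGRAM:-./myProgram}"   # hypothetical executable
slice=$(task_index)
nslices=$(task_count)

if [ -x "$program" ]; then
    # Each task handles only its own slice of the input.
    "$program" "$slice" "$nslices"
else
    # No executable found: print what would have been run.
    echo "dry run: $program $slice $nslices"
fi
```

Because of the fallbacks, the script can be run by hand on a login machine to check it before submitting; in that case it behaves like task 1 of 1.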
Submit the job to SGE
- ssh sge -- Log into the machine sge
- cd to the directory from which you want to run your job
- Submit the following command, replacing <STUFF_IN_BRACKETS> with the appropriate arguments:
  qsub -l arch=<ARCH> -q <QUEUE> -m <NOTIFY> -t 1-<NSLICES>:1 -wd <ROOTPATH> cpp/curveDistSgeMaster.sh
- Arguments:
  - -l arch=<ARCH> --- your choice of either lx24-x86 to use 32-bit machines or lx24-amd64 to use 64-bit machines.
  - -q <QUEUE> --- this argument is optional. If you'd like, you can select a specific job queue to which to add your job. Choices include short.q, long.q, idle.q, and highmem.q. See tstaff's descriptions for more details.
  - -m <NOTIFY> --- this argument is optional. SGE dumps standard out and standard error to files named according to the job id