Run parallel jobs on the compute grid

From VrlWiki
Revision as of 15:20, 30 March 2009 by Jadrian Miles

The Sun Grid Engine (SGE) is a standard interface for submitting jobs to a compute cluster. The department uses SGE to manage all sorts of spare computing resources, including some really high-powered machines that are only accessible through SGE. Using this interface can be a little hairy, and bootstrapping yourself from the documentation is difficult, hence this HOWTO.

First Things First

Decide on an Architecture

Before you start, you must decide on a machine architecture on which to run your job. Do you want to run on a 32-bit architecture or a 64-bit one? Well, there are pros and cons to each:

32-bit
Pro: no need to recompile your programs.
Con: there are only six 32-bit SGE hosts.
64-bit
Pro: there are sixty 64-bit SGE hosts, and of course you can cram way more data into main memory on a 64-bit architecture.
Con: every department machine that you can log into normally, from sixg to wheat to your own workstation, is 32-bit. Therefore all our programs are compiled on 32-bit machines. You must recompile your program and all the libraries it requires in order to run it on a 64-bit machine.
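
If you're not sure which kind of machine you're sitting at, a quick check along these lines can help. This is only a sketch: sge_arch_for is a hypothetical helper (not part of SGE), and it assumes the two SGE arch names used later in this HOWTO, lx24-x86 and lx24-amd64.

```shell
#!/bin/sh
# Map a `uname -m` machine name to the SGE arch string used by this HOWTO.
# sge_arch_for is a hypothetical helper, not part of SGE.
sge_arch_for() {
    case "$1" in
        i?86)   echo "lx24-x86" ;;     # 32-bit x86 (i386 through i686)
        x86_64) echo "lx24-amd64" ;;   # 64-bit x86
        *)      echo "unknown" ;;      # anything else is not covered here
    esac
}

# Report the arch string for the machine you are logged into right now.
sge_arch_for "$(uname -m)"
```

A program compiled on a machine that reports lx24-x86 is a 32-bit build, so it should run on the 32-bit SGE hosts without recompiling.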

If you chose to use a 32-bit architecture, continue to the next section. If you chose 64-bit, keep reading...

Recompiling for a 64-bit Architecture

... I'll let you know when I figure it out...

Run Your Job

Write a shell script interface to your executable
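
The wrapper script itself isn't shown on this page. As a minimal sketch, assuming an array job submitted with -t, a wrapper might look something like the following. MY_PROGRAM and the convention of passing the slice index as the only argument are illustrative assumptions, not the contents of curveDistSgeMaster.sh.

```shell
#!/bin/sh
# Sketch of a wrapper script for one task of an SGE array job.
# MY_PROGRAM is a hypothetical placeholder; point it at your own executable.
# It defaults to echo so the script can be tried by hand without side effects.
MY_PROGRAM="${MY_PROGRAM:-echo}"

# SGE sets SGE_TASK_ID to the task's index for array jobs (qsub -t) and to
# the string "undefined" for ordinary jobs. Fall back to 1 in that case, or
# when the variable is not set at all (running outside SGE).
slice_index() {
    case "${SGE_TASK_ID:-undefined}" in
        undefined) echo 1 ;;
        *)         echo "$SGE_TASK_ID" ;;
    esac
}

# Run the program on this task's slice, passing the index as its argument.
run_slice() {
    "$MY_PROGRAM" "$(slice_index)"
}

run_slice
```

Each task of the array job runs the same script, so the task index is the only thing that tells one slice from another.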

Submit the job to SGE

  1. ssh sge -- Log into the machine sge
  2. cd to the directory from which you want to run your job
  3. Submit the following command, replacing <STUFF_IN_BRACKETS> with the appropriate arguments:
    qsub -l arch=<ARCH> -q <QUEUE> -m <NOTIFY> -t 1-<NSLICES>:1 -wd <ROOTPATH> cpp/curveDistSgeMaster.sh
    Arguments:
    • -l arch=<ARCH> --- your choice of either lx24-x86 to use 32-bit machines or lx24-amd64 to use 64-bit machines.
    • -q <QUEUE> --- this argument is optional. If you'd like, you can select a specific job queue to which to add your job. Choices include short.q, long.q, idle.q, and highmem.q. See tstaff's descriptions for more details.
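    • -m <NOTIFY> --- this argument is optional. It controls when SGE emails you about the job: b (when it begins), e (when it ends), a (when it is aborted), s (when it is suspended), or n (never). Whether or not you ask for mail, SGE dumps standard out and standard error to files named according to the job id.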
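
To see how the pieces fit together, here is a dry-run sketch that fills in the template and prints the resulting command instead of submitting it. Every value is an illustrative placeholder (in particular the ROOTPATH directory is made up). The -t 1-<NSLICES>:1 argument asks for an array job with tasks numbered 1 through NSLICES in steps of 1, and -wd sets the job's working directory.

```shell
#!/bin/sh
# Dry run: assemble the qsub command from the template above and print it.
# All values below are illustrative placeholders, not site defaults.
ARCH="lx24-x86"                     # 32-bit hosts; lx24-amd64 for 64-bit
QUEUE="short.q"                     # optional; one of the queues listed above
NOTIFY="e"                          # send mail when the job ends
NSLICES=10                          # number of array tasks
ROOTPATH="$HOME/myproject"          # hypothetical working directory
SCRIPT="cpp/curveDistSgeMaster.sh"  # the wrapper script from the template

# Print the command line rather than running it, so nothing is submitted.
build_qsub_command() {
    echo "qsub -l arch=$ARCH -q $QUEUE -m $NOTIFY -t 1-$NSLICES:1 -wd $ROOTPATH $SCRIPT"
}

build_qsub_command
```

Once the printed line looks right, paste it into your shell on sge to actually submit the job.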

Troubleshooting

See Also