Running parallel jobs with MPI on COS nodes on COE cluster

The example below (the classical computation of pi by numerical integration) shows the steps you need to run a job in parallel on the cluster. Please follow them exactly. This example works as of 11/11/09.
You can also consult the documentation at the COS research computing website and the COE cluster website for more information.
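
To make the idea behind the example concrete, here is a minimal serial sketch of the numerical integration, assuming the usual midpoint-rule formulation (pi as the integral of 4/(1+x^2) over [0,1]). It is not the contents of mypi.f; the values N and NPROCS and the round-robin split over "ranks" are illustrative assumptions that only imitate how an MPI program could divide the work.

```shell
#!/bin/sh
# Sketch only: midpoint-rule estimate of pi = integral of 4/(1+x^2) on [0,1].
# N (number of subintervals) and NPROCS (number of pretend ranks) are
# illustrative values, not taken from mypi.f.
N=10000
NPROCS=4

PI_EST=$(awk -v n="$N" -v p="$NPROCS" 'BEGIN {
    h = 1.0 / n                       # width of one subinterval
    total = 0.0
    # Each "rank" handles subintervals rank+1, rank+1+p, rank+1+2p, ...
    # Here the ranks run one after another inside a single awk process.
    for (rank = 0; rank < p; rank++) {
        partial = 0.0
        for (i = rank + 1; i <= n; i += p) {
            x = h * (i - 0.5)         # midpoint of subinterval i
            partial += 4.0 / (1.0 + x * x)
        }
        total += h * partial          # stands in for the final reduction
    }
    printf "%.6f", total
}')

echo "pi is approximately $PI_EST"
```

In a real MPI program each rank would compute only its own partial sum, and the partial sums would be combined on one process.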

You need two Unix terminals open (Cygwin on Windows will work). Below they are called BLACK (the one logged in to the cluster) and RED (the one on your local machine).
Your username below will be your ONID username unless you already have a cluster account.

  • (BLACK) (ONE TIME ONLY FOR NEW USERS) Log in to the 64-bit submit node of the cluster

    ssh username@submit64.eecs.oregonstate.edu

    and change your password using

    htpasswd -n username

    This will prompt for a new password (twice) and then print your username and a password hash (e.g. msander:6c9orYURMJ2sk).
    Copy that line and email it to Clif Johnson cjohnson@science.oregonstate.edu so he can change the password for you.
    Log out (Ctrl-D or exit).
  • (BLACK) Log in to the 64-bit part of the cluster

    ssh username@submit64.eecs.oregonstate.edu

  • (BLACK) Set up the environment (you should include this line in your .cshrc file)

    source /usr/local/apps/sge/settings.csh

    Your .cshrc file is the one in your home directory on the cluster.
    Now either type the line above or, if you have already added it to your .cshrc, just

    source .cshrc

    (This assumes you created the file .cshrc beforehand.)
  • (BLACK) Create a separate folder for this project: mydirpi

    mkdir mydirpi
    cd mydirpi

  • (RED) Get the code and the submission script; you can save them to the local directory you are working in.
  • (RED) Copy the files to your cluster account, into that folder:

    scp mypi.f username@submit64.eecs.oregonstate.edu:mydirpi
    scp mypi_submit username@submit64.eecs.oregonstate.edu:mydirpi

  • (BLACK) Compile (and link) your example:

    mpif77 -o mypi mypi.f

  • Understand your submission script:

    more mypi_submit

    In particular, find the queue to which you are submitting, which is em64t. Another one to try is amd64-low. Other queues are loaded with people running big jobs, so you will likely not get on.

  • (BLACK) Submit your job

    qsub mypi_submit

  • (BLACK) Check your status:

    qstat | grep yourname

  • (BLACK) Check the results, which should be in the log file:

    cat mypi_job.log

    and they should look like those
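
The "check your status" and "check results" steps can be combined into a small polling helper. This is only a sketch: wait_for_user_jobs is a hypothetical function name, it is written for sh (not csh), and it reuses the same qstat | grep test shown above, so it waits for every line of qstat output that mentions the given name.

```shell
#!/bin/sh
# Sketch only: wait_for_user_jobs is a hypothetical helper, not a cluster tool.
# Usage: wait_for_user_jobs USERNAME [SECONDS_BETWEEN_CHECKS]
# Returns once "qstat | grep USERNAME" no longer finds any line.
wait_for_user_jobs() {
    user=$1
    interval=${2:-30}      # default: check every 30 seconds
    while qstat | grep -q "$user"; do
        sleep "$interval"
    done
}
```

On the cluster it could be used as wait_for_user_jobs yourname 30 && cat mypi_job.log. Because grep matches any line containing the name, a job name that happens to contain your username would also keep the loop waiting.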

If you want to find out more about how to submit jobs or delete them from the queue, i.e. about qsub and qdel, you can type

man qsub
man qdel
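
Many of the options described in man qsub can also be embedded in the submission script itself. For orientation when reading mypi_submit, the sketch below writes a hypothetical script, mypi_submit_sketch, showing what such a file might contain. It is not a copy of the real mypi_submit: the job name, the parallel environment name "mpi", the slot count 4, and the use of mpirun are all assumptions, so check the real script and the cluster documentation for the actual values.

```shell
#!/bin/sh
# Sketch only: writes a HYPOTHETICAL file, mypi_submit_sketch, so it can be
# compared with the real mypi_submit. It is not a copy of that script.
cat > mypi_submit_sketch <<'EOF'
#!/bin/sh
# Lines starting with "#$" are read by qsub as command-line options.
#$ -S /bin/sh
#$ -N mypi_job
#$ -q em64t
#$ -cwd
#$ -j y
#$ -o mypi_job.log
# ASSUMPTION: "mpi" is a placeholder parallel-environment name, 4 a slot count.
#$ -pe mpi 4
# NSLOTS is set by Grid Engine to the number of slots granted to the job.
mpirun -np $NSLOTS ./mypi
EOF
```

Here -S selects the shell that runs the job, -N names the job, -q picks the queue (em64t, as in the example above), -cwd runs the job in the directory you submitted from, -j y merges error output into standard output, and -o names the output file.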