Pleiades Cluster
Usage Instructions:
Basic system configuration:
Pleiades is a 64-core cluster. The cluster contains 8 nodes,
each with 8 Intel Xeon E5410 cores @ 2.33GHz, and with 4 GB of shared RAM.
The compute nodes are labelled n000[0-7]. The MPI implementation
used is OpenMPI.
Logging in: ssh into pleiades.tricity.wsu.edu
with your login credentials. Pullman students should have
received an email from the Pleiades system admin with account
details. Tricities students should be able to log in using
their WSUTC credentials.
Compiling an MPI program:
To compile your code, use "mpicc -o <executable_name> <program_name(s)>".
For instance, "mpicc -o helloworld helloworld.c".
Running an MPI job:
The cluster uses the SLURM workload manager. You need to use this
manager to launch jobs on the cluster. Using this manager ensures that each
job is pushed onto the batch queue system. The manager will figure out what
nodes and what cores to use. You just specify the number of nodes, number of
processes, and the executable (along with any related arguments).
Most useful SLURM commands:
squeue: to monitor the status of the queue
sbatch: to launch a job
scancel: to cancel a job
Here is how to run an MPI job. Let's pretend that the present working folder
is the one that contains the executable "helloworld".
Step 1) You need to create
a job script first (or modify an existing one as needed) with these two lines:
#!/bin/sh
mpirun -np <number_of_processes> <executable file's absolute path>
Here is an example job script: sub.sh.txt
Please make sure you give read and execute permissions to the world, using the
command "chmod 755 sub.sh". Also fix the path of the executable to match your
directory structure. The above example job script is configured to run the
MPI helloworld program on 8 processes (i.e., p=8).
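As a rough sketch of Step 1, the two-line job script can also be produced by a small shell function. The function name write_job_script and the path /path/to/your/helloworld are illustrative assumptions, not part of SLURM, OpenMPI, or the linked sub.sh.txt; substitute your own absolute path.

```shell
#!/bin/sh
# Sketch: print a two-line job script for a given process count and executable.
# write_job_script is a hypothetical helper, not part of SLURM or OpenMPI.
write_job_script() {
    nprocs=$1
    exe=$2
    printf '#!/bin/sh\n'
    printf 'mpirun -np %s %s\n' "$nprocs" "$exe"
}

# Example: 8 processes; the path below is a placeholder for your own directory.
write_job_script 8 /path/to/your/helloworld
```

To use it, redirect the output into a file and set the permissions, for example: write_job_script 8 "$PWD/helloworld" > sub.sh, followed by chmod 755 sub.sh.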
Step 2) Next, you need to
use the SLURM command sbatch to launch the job script.
The command line syntax is as follows:
sbatch -N<number_of_nodes> sub.sh
For example, if you want to run the MPI helloworld program on 8
processes, on up to 4 compute nodes, then you will say
"sbatch -N4 sub.sh" while specifying "-np 8" as the mpirun
argument within sub.sh.
(Notes: there is no whitespace between N and the
<number_of_nodes> argument. Also, it is your responsibility to
make sure that the value you give to -np in sub.sh is >= the value
you give to -N. Finally, -N's argument cannot exceed 8 since the cluster
has only 8 nodes. More notes below.)
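The constraints in these notes can be checked before submitting. The following is a minimal sketch, assuming you pass the node count and process count by hand; check_job_size and MAX_NODES are hypothetical names, and the lower bound of 1 node is an added sanity check rather than something stated above.

```shell
#!/bin/sh
MAX_NODES=8   # Pleiades has 8 compute nodes

# check_job_size NODES NPROCS: succeed only if 1 <= NODES <= MAX_NODES
# and NPROCS >= NODES. Hypothetical helper, not a SLURM command.
check_job_size() {
    nodes=$1
    nprocs=$2
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$MAX_NODES" ]; then
        echo "error: -N must be between 1 and $MAX_NODES" >&2
        return 1
    fi
    if [ "$nprocs" -lt "$nodes" ]; then
        echo "error: -np ($nprocs) must be >= -N ($nodes)" >&2
        return 1
    fi
    return 0
}

# Example: 8 processes on up to 4 nodes passes the check.
if check_job_size 4 8; then
    echo "ok: sbatch -N4 sub.sh"
fi
```

The function only validates the two numbers; it does not read sub.sh or call sbatch, so the -np value inside the job script still has to match what you pass in.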
As soon as you launch this job, the command will return with a job id
number. For instance, "Submitted batch job 9" means the job id is
9. Each time you launch a job you will get a different job id number.
You can monitor the job status by performing the following command:
squeue
This will show your job (if it's still running) and also mention what compute
nodes are being used. If for some reason you want to cancel a job while it
is still running, the command to use is:
scancel <jobid>
After completion of the job, the standard output will be available in the
file with the default name "slurm-<jobid>.out", for
instance "slurm-9.out". This file will be in the
working folder from which you launched the job. You can rename the output
file if you wish. (By default, sbatch writes any stderr messages to the
same file.)
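The link between the sbatch confirmation line and the output file name can be sketched in shell. The function names job_id_from_sbatch and output_file_for_job are hypothetical, and the sketch assumes the confirmation line ends with the job id, as in the "Submitted batch job 9" example above.

```shell
#!/bin/sh
# job_id_from_sbatch: extract the trailing job id from sbatch's confirmation line.
job_id_from_sbatch() {
    # ${1##* } strips everything up to the last space, leaving the final word.
    printf '%s\n' "${1##* }"
}

# output_file_for_job: default output file name for a given job id.
output_file_for_job() {
    printf 'slurm-%s.out\n' "$1"
}

msg="Submitted batch job 9"      # sample line quoted from the text above
jobid=$(job_id_from_sbatch "$msg")
output_file_for_job "$jobid"     # prints slurm-9.out
```

In a real session the sample string would be replaced by the captured output of the submission, for example msg=$(sbatch -N4 sub.sh).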
More details about all the SLURM commands can be found here: http://slurm.schedmd.com/quickstart.html
Special note: With
the batch system, there is no guarantee that it will use all the nodes
you specify. For instance, if you specify -N8 in your sbatch
command (and mpirun -np 8), there is no guarantee
that the batch system will actually launch the job on 8 different nodes, 1
process per node. In fact, if the batch system finds 8 cores free and
available on a single node, it is likely to go and assign just that. You can
check this by running the top command on the individual compute
nodes (mentioned in the squeue output). There is really no way to
force the batch system to use the network but if you communicate between
distant ranks it is likely to be across the network. For instance, if I want
to run a 2 process job that also uses the network, then I'd actually launch
the job on 8 processes and make rank 0 talk to rank 4, rank 1 talk to rank
5, and so on. Please deploy this strategy for just the first project though.
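The pairing described here (rank 0 with rank 4, rank 1 with rank 5, and so on) amounts to pairing each rank with the one half the job size away. A small sketch for tabulating the pairs before writing the MPI code follows; partner_rank is a hypothetical planning helper, not an MPI call, and it assumes an even number of processes.

```shell
#!/bin/sh
# partner_rank RANK NPROCS: pair each rank with the one half the job size away.
# Assumes NPROCS is even; hypothetical helper for planning, not an MPI call.
partner_rank() {
    rank=$1
    nprocs=$2
    echo $(( (rank + nprocs / 2) % nprocs ))
}

# Tabulate the pairs for an 8-process job.
r=0
while [ "$r" -lt 4 ]; do
    echo "rank $r <-> rank $(partner_rank "$r" 8)"
    r=$((r + 1))
done
```

The same arithmetic, (rank + p/2) % p, can be used inside the MPI program to pick the destination rank; whether a given pair actually lands on different nodes still depends on how the batch system places the processes.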
Postscript: The above mode
of launching a job through the job scheduler/workload manager is what is
referred to as the "batch mode". There is an alternative way to
launch jobs on the cluster, in an "interactive mode". This could be
useful for very quick debugs or if the batch queue system isn't working for
whatever reason. If at some point this becomes necessary, please contact me
and I can provide the necessary instructions. But for now I want all
students to use the batch mode so that all jobs are submitted and
can be monitored through the queue. It should be pretty quick.