Pleiades Cluster Usage Instructions:
Basic system configuration:
Pleiades is a 64-core cluster consisting of 8 nodes, each with 8 Intel Xeon E5410 cores @ 2.33 GHz and 4 GB of shared RAM. The compute nodes are labelled n000[0-7]. The MPI implementation used is OpenMPI.
Logging in: ssh into pleiades.tricity.wsu.edu using your WSU net id and password. If the login does not work, contact the cluster system admin (dsearch@wsu.edu) and make sure you copy me on the email.
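For example, if your WSU net id were jdoe (a placeholder), the login command would look like:

    ssh jdoe@pleiades.tricity.wsu.edu

If your local username already matches your net id, a plain "ssh pleiades.tricity.wsu.edu" should also work.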
Compiling an MPI program:
To compile your code, simply use "mpicc -o <executable_name> <program_name(s)>". For instance, "mpicc -o helloworld helloworld.c"
If your code is in C++, use the mpiCC or mpic++ command instead; one of them (or both) should work.
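If you do not yet have a test program, the following minimal MPI "hello world" (a generic sketch, not course-provided code) can be saved as helloworld.c and built with the mpicc command shown above:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);                /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                        /* shut down the MPI runtime */
        return 0;
    }

Running it under SLURM (see the next section) should print one line per MPI process.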
Running an MPI job:
The cluster uses the SLURM workload manager, and all jobs must be launched through it. SLURM places each job on the batch queue and decides which nodes and cores to use; you only specify the number of nodes, the number of processes, and the executable (along with any related arguments). As soon as you launch a job, the command returns with a job id number. For instance, "Submitted batch job 9" means the job id is 9. Each time you launch a job you will get a different job id number.
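For example, a minimal job script (here called job.sh; the name and the exact launch line are assumptions, since the precise setup on Pleiades may differ) could look like:

    #!/bin/bash
    #SBATCH -N 2              # number of nodes to reserve
    #SBATCH -n 8              # total number of MPI processes
    srun ./helloworld         # launch the executable under SLURM
    # depending on how OpenMPI is configured on the cluster, you may need
    # "mpirun -np 8 ./helloworld" here instead of srun

Submitting it with "sbatch job.sh" prints something like "Submitted batch job 9"; 9 is the job id you will later pass to squeue or scancel.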
Other valid sbatch command options:
sbatch -N <number_of_nodes> -n <number_of_processes>
or:
sbatch -N <number_of_nodes> -c 1    (one process per core; the default is one process per node)
or:
sbatch -n <number_of_processes> -c 1    (one process per core, using enough nodes to run the requested number of processes)
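As a concrete example (reusing the hypothetical job.sh from above), the following requests 4 nodes and 16 processes; options given on the sbatch command line override the matching #SBATCH lines inside the script:

    sbatch -N 4 -n 16 job.sh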
If for some reason you want to cancel a job while it is still running, the command to use is:
scancel <jobid>
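For example, to cancel the job with id 9 from the earlier example and confirm that it is gone:

    scancel 9
    squeue -u <your_netid>    # the job should disappear from the list (it may briefly show state CG while completing)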
Special note:
The system will reserve the number of nodes given by -N for your job and run the number of tasks given by -n over that set of reserved nodes. Tasks are distributed equally among your reserved nodes unless you specify more advanced parameters. You can check this by running the top command on the individual compute nodes (listed in the squeue output).
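For example (the node name below is illustrative, and this assumes ssh between cluster nodes is allowed, which is typical):

    squeue -u <your_netid>    # shows which n000[0-7] nodes your job is running on
    ssh n0003                 # log in to one of those compute nodes
    top                       # your MPI processes should appear near the top of the list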
Postscript: The most common mistake students make at the start of the course is to simply run the program on the head/login node. Running jobs directly on the login node is prohibited. You must submit your jobs through the SLURM queue as instructed above; this pushes the job to the compute nodes, which share the same home directory and file system as the login node, so all your compiled code, data files, and their paths remain valid when the jobs execute there. The only thing you are allowed to do on the login node is compile your code.