Programming Projects and Related Materials
Experimental platforms:
The main experimental platform for this class is the Pleiades cluster.
The Pleiades cluster is an 8-node, 64-core cluster. Each compute
node has 8 Intel Xeon cores and 4GB of shared RAM.
Pleiades cluster: Please
follow these instructions while
using the Pleiades cluster.
Recommended usage: Please use the Pleiades cluster as your first
preference. If you want to use a different cluster that you already have
access to, that is fine as long as: a) the cluster is comparable (if not
larger) in its configuration to Pleiades; and b) you record, in your
project report, the system configuration of that cluster (under the
experimental setup section).
Examples:
PS: To compile an OpenMP code, use:
gcc -o {execname} -fopenmp {source file names}
COVER PAGE: PDF
Word
(please include this with every project and homework submission)
Programming Projects:
- Programming Project #1: Due
September 6, 2018
- Programming Project #2: Due September 25, 2018
Assignment type: Team of size up to 2 encouraged
The goal of this project is to empirically estimate the network
parameters latency and bandwidth,
and the network buffer size, for the network
connecting the nodes of the compute cluster.
Note that these were described in detail in class (refer to the
lecture on September 13th).
You are expected to use the Pleiades cluster for this project unless you
have access to a comparable or bigger/better cluster.
To derive the estimates, write a simple MPI send/receive program
involving only two processes (one sends and the other receives). Each
MPI_Send should send a message of size m bytes to the other
process. By increasing the message size m (1, 2, 4, 8, and so on),
you are expected to plot two runtime curves, one for send and
another for receive. Plot the communication time on the
Y-axis and the message size (m) on the X-axis. For the message size, you may
have to go up to 1MB or so to observe a meaningful trend. Make sure
you double m at each step.
From the curves, derive the values for latency, bandwidth, and network
buffer size. To ensure that your estimates are as precise and reliable
as they can be, take an average over multiple iterations (at
least 10) before reporting.
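One possible starting point is a two-process timing loop along the lines sketched below. This is only an illustrative sketch, not the required solution; the iteration count, buffer sizes, and output format are assumptions you should adapt.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 10            /* average over at least 10 iterations */
#define MAX_BYTES (1 << 20) /* go up to 1MB */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(MAX_BYTES);

    for (int m = 1; m <= MAX_BYTES; m *= 2) {   /* double m each step */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int it = 0; it < ITERS; it++) {
            if (rank == 0)
                MPI_Send(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
        double t = (MPI_Wtime() - t0) / ITERS;  /* avg time per message */
        if (rank < 2)
            printf("rank %d, m = %7d bytes: %g s\n", rank, m, t);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Run with two processes (e.g., mpirun -np 2). Here rank 0's times give the send curve and rank 1's times give the receive curve; the message size at which the send curve's behavior changes abruptly is what hints at the network buffer size.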
Deliverables (zipped into one zip file - with your names on it):
Note: for teams of size 2, both members should submit, but only one
of you should submit the full assignment along with the report and
cover page stating who your partner was; the other person simply
submits the cover page.
i) Source code with timing
functions,
ii) Report in PDF that
shows your tables and charts, followed by your derivation of the network
parameter estimates. Make sure to add a justification/explanation of your
results. Don't dump the raw data or results; present your results
in a professional manner.
(As an alternative to MPI_Send and MPI_Recv, you are also allowed to use
MPI_Sendrecv for additional testing purposes, but for the main results,
time your MPI_Recv. Please look at the MPI API documentation for further
help.)
- Programming Project #3: Conway Game
of Life: Due October 11, 2018
- Programming Project #4: Parallel Random Number Generation:
Due October 30, 2018
Total points: 20
Assignment type: Team of size up to 2 encouraged
In this project you will implement a parallel random number series
generator, using the Linear Congruential Generator (LCG) model we
discussed in class.
Here, the ith random number, denoted by x_i, is given by:
x_i = (a * x_{i-1} + b) mod P
where a and b are some positive constants and P is a big constant
(typically a large prime); all three parameters {a, b, P} are
user-defined.
Your goal is to implement an MPI program for generating the random
series up to the nth random number of the linear congruential series
(i.e., we need all n numbers of the series, and not just the nth
number).
We discussed an algorithm that uses parallel prefix in class; you
are expected to implement this algorithm. Refer to the lecture notes
website for more references if needed.
Operate under the assumption that n >> p. Your code should contain
your own explicit implementation of the parallel prefix operation. Your
code should also get the parameter values {a, b, P} and the random seed
to use (same as x_0) from the user.
All the logic in your code for doing parallel prefix should be written
from scratch. Use of MPI_Scan is *not* allowed for doing parallel
prefix. Write your own parallel prefix function. I have already
provided a lot of tips in class.
It should be easy to test your code by writing your own simple serial
code and comparing outputs. If your parallel implementation
is right, its output should be identical to that of the serial
code (for the same parameter settings).
Performance analysis:
a) Generate speedup charts (speedups calculated over your serial
implementation), fixing n to a large number such as a million and
varying the number of processors from 2 to 64.
b) Study the total runtime as a function of n. Vary n from a small
number such as 16 and keep doubling it until it reaches a large value
(e.g., 1 million).
Compile a brief report that presents and discusses your
results/findings. The quality of the presentation in your report is
important.
Deliverables (zipped into one zip file - with your names on it):
Note: for teams of size 2, both members should submit, but only one
of you should submit the full assignment along with the report and
cover page stating who your partner was; the other person simply
submits the cover page.
i) Cover page
ii) Source code,
iii) Report in PDF
Name your zip folders as Project4_MemberNames.zip. No whitespace
allowed in the folder or file names. Follow the naming convention
strictly. Otherwise you stand to lose points.
Rough grading rubric: code (30%), testing (40%), reporting
(30%)
Special note on submission deadline:
The assignment is due October 30, 11:59pm PDT, and there will be a
24-hour grace period with a 10% late penalty. I will *not* accept any
late submissions beyond the 24-hour grace period (even if it is a
minute late). So please don't wait till the last minute to submit and
then tell me the website crashed. You can always submit a copy early
and update it through the deadline.
- Programming Project #5: Pi Estimator (using OpenMP
multithreading): Due November 13,
2018
Total points: 10
Assignment type: Individual
In this project you will implement an OpenMP multithreaded PI value
estimator using the algorithm discussed in class. This algorithm
essentially throws a dart n times into a unit square and
computes the fraction of times the dart falls inside the embedded
circle (equivalently, a quarter of the unit circle). This fraction,
multiplied by 4, gives an estimate for PI.
Here is the generation approach that you need to implement: PDF
Your code should expect two arguments: <n> <number of threads>.
Your code should output the estimated PI value at the end. Note
that the precision of the PI estimate could potentially vary with the
number of threads (assuming n is fixed).
Your code should also output the total time (calculated using the
omp_get_wtime function).
Experiment with different values of n (starting from 1024 and going all
the way up to a billion or more) and p (1, 2, 4, ...).
Please do two sets of experiments as instructed below:
1) For speedup - keep n fixed and increase p (1, 2, 4, 8). You may have
to do this for large values of n to observe meaningful speedup.
Calculate relative speedup. Note that the Pleiades nodes
have 8 cores per node, so there is no point in increasing the number of
threads beyond 8. In your report, show the run-time table for this
testing and also the speedup chart.
PS: If you are using Pleiades (which is what I recommend), you should
still use the queue system (SLURM) to make sure you get exclusive access
to a node. For this, submit your job with the "sbatch -N1" option
(i.e., run the code on a single node).
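For example, a job script along these lines (the file and executable names are made up for illustration) could be submitted with "sbatch -N1 pi.sh":

```shell
#!/bin/bash
#SBATCH -N 1                 # one node, for exclusive access
#SBATCH -J pi_estimator      # job name (illustrative)

export OMP_NUM_THREADS=8     # at most 8 threads: 8 cores per node
./pi 1000000000 8            # <n> <number of threads>
```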
2) For precision testing - keep n/p fixed and increase p (1, 2, ... up
to 16 or 32). For this you will have to start with a good granularity
(n/p) value that gave you some meaningful speedup in experiment set 1.
The goal of this testing is to see whether the PI value estimated by
your code improves in precision as n increases. Therefore, in your
report, make a table that shows the PI values estimated (up to, say,
20-odd decimal places) for each value of n tested.
Deliverables (zipped into one zip file - with your name on it):
i) Cover page
ii) Source code,
iii) Report in PDF
Name your zip folders as Project5_YourName.zip. No whitespace allowed
in the folder or file names. Follow the naming convention strictly.
Otherwise you stand to lose points.
Rough grading rubric: code (40%), testing (20%), reporting
(40%)
Special note on submission deadline:
The assignment is due Nov. 13, 11:59pm PST, and there will be a
24-hour grace period with a 10% late penalty. I will *not* accept any
late submissions beyond the 24-hour grace period (even if it is a
minute late). So please don't wait till the last minute to submit and
then tell me the website crashed. You can always submit a copy early
and update it through the deadline.