Programming Projects and Related Materials
Experimental platforms:
The main experimental platform for this class is the Pleiades cluster.
The Pleiades cluster is an 8-node, 64-core cluster. Each compute
node has 8 Intel Xeon cores and 4GB of shared RAM.
Pleiades cluster: Please
follow these instructions while
using the Pleiades cluster.
Recommended usage: Please use the Pleiades cluster as your first
preference. If you want to use a different cluster that you already have
access to, that is fine as long as: a) the cluster is comparable (if not
larger) in its configuration to Pleiades; and b) you record, in your
project report, the system configuration of that cluster (under the
experimental setup section).
Examples:
PS: To compile an OpenMP code, use:
gcc -o {execname} -fopenmp {source file names}
COVER PAGE: PDF
Word
(please include this with every project and homework submission)
Programming Projects:
- Programming Project #1: Due
September 6, 2018
- Programming Project #2: Due September 25, 2018
Assignment type: Team of size up to 2 encouraged
The goal of this project is to empirically estimate the network
parameters latency and bandwidth,
and the network buffer size, for the network
connecting the nodes of the compute cluster.
Note that these were described in detail in class (refer to the
lecture on September 13th).
You are expected to use the Pleiades cluster for this project unless you
have access to a comparable or bigger/better cluster.
To derive the estimates, write a simple MPI send/receive program
involving only two processes (one sends and the other receives). Each
MPI_Send should send a message of size m bytes to the other
process. By increasing the message size m (1, 2, 4, 8, and so on),
you are expected to plot two runtime curves, one for send and
another for receive. Plot the communication time on the
Y-axis and the message size (m) on the X-axis. For the message size, you may
have to go up to 1MB or so to observe a meaningful trend. Make sure
you double m at each step.
From the curves, derive the values for latency, bandwidth, and network
buffer size. To ensure that your estimates are as precise and reliable
as they can be, take an average over multiple iterations (at
least 10) before reporting.
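One possible starting point is a two-process timing loop along the lines sketched below. This is only an illustrative sketch, not the required solution; the iteration count, buffer sizes, and output format are assumptions you should adapt.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 10            /* average over at least 10 iterations */
#define MAX_BYTES (1 << 20) /* go up to 1MB */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    char *buf = malloc(MAX_BYTES);

    for (int m = 1; m <= MAX_BYTES; m *= 2) {   /* double m each step */
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int it = 0; it < ITERS; it++) {
            if (rank == 0)
                MPI_Send(buf, m, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(buf, m, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }
        double t = (MPI_Wtime() - t0) / ITERS;  /* avg time per message */
        if (rank < 2)
            printf("rank %d, m = %7d bytes: %g s\n", rank, m, t);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Run with two processes (e.g., mpirun -np 2). Here rank 0's times give the send curve and rank 1's times give the receive curve; the message size at which the send curve's behavior changes abruptly is what hints at the network buffer size.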
Deliverables (zipped into one zip file - with your names on it):
Note: for teams of size 2, both members should submit, but only one
of you should submit the full assignment along with the report and
cover page stating who your partner was; the other person simply
submits the cover page.
i) Source code with timing
functions,
ii) Report in PDF that
shows your tables and charts, followed by your derivation of the network
parameter estimates. Make sure to add a justification/explanation of your
results. Don't dump the raw data or results; present your results
in a professional manner.
(As an alternative to MPI_Send and MPI_Recv, you are also allowed to use
MPI_Sendrecv for additional testing purposes, but for the main results,
time your MPI_Recv. Please look at the MPI API documentation for further
help.)
- Programming Project #3: Conway Game
of Life: Due October 11, 2018
- Programming Project #4: Parallel Random Number Generation:
Due October 30, 2018
Total points: 20
Assignment type: Team of size up to 2 encouraged
In this project you will implement a parallel random number series
generator, using the Linear Congruential Generator (LCG) model we
discussed in class.
Here, the ith random number, denoted by x_i, is given by:
x_i = (a * x_{i-1} + b) mod P
where a and b are some positive constants and P is a big constant
(typically a large prime); all three parameters {a, b, P} are
user-defined.
Your goal is to implement an MPI program for generating the random
series up to the nth random number of the linear congruential series
(i.e., we need all n numbers of the series, and not just the nth
number).
We discussed an algorithm that uses parallel prefix in class; you
are expected to implement this algorithm. Refer to the lecture notes
website for more references if needed.
Operate under the assumption that n >> p. Your code should contain
your own explicit implementation of the parallel prefix operation. Your
code should also get the parameter values {a, b, P} and the random seed
to use (same as x_0) from the user.
All the logic in your code for doing parallel prefix should be written
from scratch. Use of MPI_Scan is *not* allowed for doing parallel
prefix. Write your own parallel prefix function. I have already
provided a lot of tips in class.
It should be easy to test your code by writing your own simple serial
code and comparing outputs. If your parallel implementation
is right, its output should be identical to that of the serial
code (for the same parameter settings).
Performance analysis:
a) Generate speedup charts (speedups calculated over your serial
implementation), fixing n to a large number such as a million and
varying the number of processors from 2 to 64.
b) Study the total runtime as a function of n. Vary n from a small
number such as 16 and keep doubling it until it reaches a large value
(e.g., 1 million).
Compile a brief report that presents and discusses your
results/findings. The quality of the presentation in your report is
important.
Deliverables (zipped into one zip file - with your names on it):
Note: for teams of size 2, both members should submit, but only one
of you should submit the full assignment along with the report and
cover page stating who your partner was; the other person simply
submits the cover page.
i) Cover page
ii) Source code,
iii) Report in PDF
Name your zip folders as Project4_MemberNames.zip. No whitespace
allowed in the folder or file names. Follow the naming convention
strictly. Otherwise you stand to lose points.
Rough grading rubric: code (30%), testing (40%), reporting
(30%)
Special note on submission deadline:
The assignment is due October 30, 11:59pm PDT, and there will be a
24-hour grace period with a 10% late penalty. I will *not* accept any
late submissions beyond the 24-hour grace period (even if it is a
minute late). So please don't wait till the last minute to submit and
then tell me the website crashed. You can always submit a copy early
and update it through the deadline.
- Programming Project #5: Pi Estimator (using OpenMP
multithreading): Due November 13,
2018
Total points: 10
Assignment type: Individual
In this project you will implement an OpenMP multithreaded PI value
estimator using the algorithm discussed in class. This algorithm
essentially throws a dart n times into a unit square and
computes the fraction of times the dart falls inside the embedded
circle (equivalently, a quarter of the unit circle). This fraction,
multiplied by 4, gives an estimate for PI.
Here is the generation approach that you need to implement: PDF
Your code should expect two arguments: <n> <number of threads>.
Your code should output the estimated PI value at the end. Note
that the precision of the PI estimate could potentially vary with the
number of threads (assuming n is fixed).
Your code should also output the total time (calculated using the
omp_get_wtime function).
Experiment with different values of n (starting from 1024 and going all
the way up to a billion or more) and p (1, 2, 4, ...).
Please do two sets of experiments as instructed below:
1) For speedup - keep n fixed and increase p (1, 2, 4, 8). You may have
to do this for large values of n to observe meaningful speedup.
Calculate relative speedup. Note that the Pleiades nodes
have 8 cores per node, so there is no point in increasing the number of
threads beyond 8. In your report, show the run-time table for this
testing and also the speedup chart.
PS: If you are using Pleiades (which is what I recommend), you should
still use the queue system (SLURM) to make sure you get exclusive access
to a node. For this, submit your job with the "sbatch -N1" option
(i.e., run the code on a single node).
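For example, a job script along these lines (the file and executable names are made up for illustration) could be submitted with "sbatch -N1 pi.sh":

```shell
#!/bin/bash
#SBATCH -N 1                 # one node, for exclusive access
#SBATCH -J pi_estimator      # job name (illustrative)

export OMP_NUM_THREADS=8     # at most 8 threads: 8 cores per node
./pi 1000000000 8            # <n> <number of threads>
```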
2) For precision testing - keep n/p fixed and increase p (1, 2, ... up
to 16 or 32). For this you will have to start with a good granularity
(n/p) value that gave you some meaningful speedup in experiment set 1.
The goal of this testing is to see whether the PI value estimated by
your code improves in precision as n increases. Therefore, in your
report, make a table that shows the PI values estimated (up to, say,
20-odd decimal places) for each value of n tested.
Deliverables (zipped into one zip file - with your name on it):
i) Cover page
ii) Source code,
iii) Report in PDF
Name your zip folders as Project5_YourName.zip. No whitespace allowed
in the folder or file names. Follow the naming convention strictly.
Otherwise you stand to lose points.
Rough grading rubric: code (40%), testing (20%), reporting
(40%)
Special note on submission deadline:
The assignment is due Nov. 13, 11:59pm PST, and there will be a
24-hour grace period with a 10% late penalty. I will *not* accept any
late submissions beyond the 24-hour grace period (even if it is a
minute late). So please don't wait till the last minute to submit and
then tell me the website crashed. You can always submit a copy early
and update it through the deadline.