Program #2
For this assignment, you will be implementing some of the graph algorithms
that were described during the lectures. The graph we are using is a graph
description of a web domain.
Graph Generator:
The graph generator is graph1.pl, which requires
the code in that should run without trouble on omega.
This is a Perl script that builds a graph representation of a specified
web domain. The graph1.pl program will write the graph to the "graph1" file.
Parameters to the program include a limitation on the number of URLs to be
examined (I strongly recommend using this limit, because the graphs can get
very big) and a domain. For example,
perl graph1.pl -m 10 http://cygnus.uta.edu
builds a graph corresponding to the cygnus.uta.edu domain for no more
than 10 distinct URLs. An extra file, url_file is built that you can ignore.
The representation for these graphs is a set of vertex descriptions and edge
descriptions. Each vertex is specified by
v id label
where "id" is a unique integer id, and "label" is a string node label.
Each directed edge is specified by
d from to label
where "from" and "to" are the source and destination vertices, and "label"
is a string edge label.
The graph1.pl program generates a graph where vertices are used to represent
each page in the domain (the vertex label is the page URL) as well as keywords
found in document pages (the word is the vertex label). Edges connect linked
pages and words with the corresponding document page.
Task 1:
Generate a minimum spanning tree of a specified web graph (in the format of
graph1 or graph2) using Kruskal's algorithm. The input should be a graph file
name and the output should be a description (a description of vertices and
edges, one per line, is fine) of the minimum spanning tree. For this task,
assign weights of 3 to "hyperlink" edges and weights of 1 to "word" edges.
Task 2:
Implement the Floyd-Warshall algorithm to find shortest paths between
all URL vertices in the graph.
Task 3:
Implement a simulation of a parallel Floyd-Warshall algorithm. For this
implementation, assume there are as many simulated processors as there are
elements in the matrix (number of processors = (number of vertices)^2).
To simulate a parallel implementation, you will need an outer loop in the
algorithm that performs the work of processor 1, then processor 2 (using
values of the data before processor 1 made any updates), then processor 3,
and so forth.
Output the run times of all three tasks, and output the "pseudo-speedup"
of the parallel algorithm with respect to the serial shortest paths algorithm
(you can assume the run time of the parallel algorithm is the actual run time
divided by the number of processors).
Your program score will be graded based on the following criteria:
* Accurate implementation of Task 1 (30 points)
* Accurate implementation of Task 2 (30 points)
* Accurate implementation of Task 3 (30 points)
* Quality of code, accuracy of output, and comments (10 points)
You may not borrow code written by a fellow student in this class or
provided online. That would be cheating. If you have questions about the
program, contact the TA by email or in person during office hours.
Turn in your program by emailing a single zip file containing the code,
directions for compiling and running, and sample output directly to the TA
by midnight on the due date. Your program can be written in any language,
but should be able to run on omega using the compile and execution
instructions you provide. Late submissions will not be accepted. Have fun!