PROGRAMMING ASSIGNMENT #1
Cpt S 471/571, Spring 2021
Due: February 19, 2021 (Friday), by 11:59pm Pacific Time @Blackboard (WSU) Assignments dropbox
General Guidelines:
Assignment:
Implement two algorithms:
i) Needleman-Wunsch algorithm for computing OPTIMAL GLOBAL ALIGNMENT, and
ii) Smith-Waterman algorithm for computing OPTIMAL LOCAL ALIGNMENT,
both using affine gap penalty function between two input DNA sequences, s1 and s2, of lengths m and n respectively.
Each cell of your Dynamic Programming table ("DP table") should have the following structure:
struct DP_cell {
int Sscore;
// field for Substitution (S) score
int
Dscore; // field for Deletion
(D) score
int Iscore;
// field for Insertion (I) score
... // add any other field(s)
that you may need for the implementation
}
At the start of the program, you should read the alignment score parameters from a user-specified input file (optional). The default name of the file, if the user does not specify one, should be "parameters.config" in the present working directory. The parameters.config file should allow the user to specify one scoring parameter in each line (space or tab delimited). For example:
match 1
mismatch -2
h -5
g -1
The command prompt usage for your program should look as follows:
$ <executable name> <input file containing both s1 and s2> <0: global, 1: local> <optional: path to parameters config file>
Input File Formats:
The two input sequences should be given as input in one text file. The text file should be in what is called the "multi-sequence FASTA format", which is as follows:
For example:
An input file called "input.fasta" could look like:
>s1 sequence
acatgctacacgtactccgataccccgtaaccgataacgatacacagacct
cgtacgcttgctacaacgtactctataaccgagaacgattgaca
tgcctcgtacacatgctacacgtactccgatgaccccgt
>s2 sequence
acattctacgaacctctcgataaccccataaccgataacgattgacacctcgt
acgctttctacaacttactctctcgataaccccataaccgataacgattgacacctc
gtacacatggtacatacgtactctcgataccccgt
At completion, the program should output/print the following information:
All output should be to the standard output.
As shown above, the alignment is wrapped after every 60 alignment positions, and on both sides the starting and ending indices of the aligning positions in the respective strings should be displayed. Then, a pipe symbol "|" is shown in columns wherever there is a match. (Other columns contain a whitespace character).
After implementing and testing, run your program on the following two input files, redirect the standard output into a global alignment and a local alignment output text files and attach them as part of your submission:
Opsin1_colorblindness_gene.fasta:
Contains
two sequences - Opsin1 gene in human vs. Opsin1 gene in mouse (this is one
of the genes responsible for colorblindness)
Human-Mouse-BRCA2-cds.fasta:
Contains two sequences - BRCA2 gene in human vs. BRCA2 gene in mouse (this
is one of the breast cancer genes)
CHECKLIST FOR SUBMISSION:
___ source code
___ output files for
the above two real world inputs
___ All the above
zipped as a folder into an archive .zip