Data-intensive bio-computing

DC: SMALL: EFFICIENT ALGORITHMS FOR DATA-INTENSIVE BIO-COMPUTING


NSF Award# IIS 0916463

This project is funded by the National Science Foundation.

RESEARCH SYNOPSIS

The field of bioinformatics and computational biology is experiencing a data revolution unlike any other scientific computing field. Experimental techniques for procuring data have increased in throughput, improved in accuracy, and decreased in cost. The growing volume of data has exposed the limited scalability of existing software tools. To understand the complexities and challenges of designing algorithms for data-intensive biocomputing, this project is developing new approaches for two major problems in protein bioinformatics:
    i) identification of protein families and homology clusters; and
    ii) peptide identification from large-scale mass spectrometry data.
The former requires large-scale graph analysis, and the latter requires large-scale database search. The project is investigating a multi-faceted approach that involves designing space-efficient algorithms for massively parallel machines, developing algorithmic heuristics to reduce the time to solution, evaluating the MapReduce paradigm as an alternative computing model, and deploying multicore architectures for fine-grain parallelism.

PUBLICATIONS (by topic)
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Parallel protein family detection, sequence analysis and graph clustering

I. Rytsareva, Q. Le, E. Conner, A. Kalyanaraman, J. Panchal. Evaluating socio-technical coordination in open-source communities: A cluster-based approach. Proc. ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (IDETC/CIE), Accepted, August 12-15, Chicago, IL, 2012.
PDF

I. Rytsareva, A. Kalyanaraman. An efficient MapReduce algorithm for parallelizing large-scale graph clustering, Proc. ParGraph - Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC'11, Bengaluru, India, 2011.
PDF

T. Chapman, A. Kalyanaraman. An OpenMP algorithm and implementation for clustering biological graphs, Proc. IA3 - Workshop on Irregular Applications: Architectures & Algorithms, Held in conjunction with SC'11, 2011, pp. 3-10.
PDF

C. Wu, A. Kalyanaraman, W.R. Cannon. pGraph: Efficient parallel construction of large-scale protein sequence homology graphs, IEEE Transactions on Parallel and Distributed Systems (TPDS), Preprint, 2012, doi 10.1109/TPDS.2012.19.
PDF

A.O.T. Lau, A. Kalyanaraman, I. Echaide, G.H. Palmer, R. Bock, M.J. Pedroni, M. Rameshkumar, M.B. Ferreira, T.I. Fletcher, T.F. McElwain. Attenuation of virulence in an Apicomplexan hemoparasite results in reduced genome diversity at the population level. BMC Genomics, 2011, 12:410, doi 10.1186/1471-2164-12-410.
PDF

T. Chapman, A. Kalyanaraman. Enabling large-scale metagenomic protein family identification on the NSF TeraGrid. Abstract and undergraduate student poster, TeraGrid 2011, Salt Lake City, UT, July 18-21, 2011.

A. Kalyanaraman. Algorithms for genome assembly. Encyclopedia of Parallel Computing, D. Padua (ed.), Springer Science+Business Media LLC, 2011, pp. 755-768. doi:10.1007/978-0-387-09766-4.
PDF

C. Wu, A. Kalyanaraman, W.R. Cannon. A scalable parallel algorithm for large-scale protein sequence homology detection. Proc. International Conference on Parallel Processing (ICPP), pp. 333-342, 2010, doi: 10.1109/ICPP.2010.41.
PDF

C. Wu, A. Kalyanaraman. An efficient parallel approach for identifying protein families in large-scale metagenomic data sets. Proc. ACM/IEEE conference on Supercomputing (SC|08), Austin, TX, November 15-21, pp. 1-10, 2008, ISBN 978-1-4244-2835-9, IEEE Press, Piscataway, NJ, USA.
PDF

Mass spectrometry based peptide identification

A. Kalyanaraman, W.R. Cannon, B. Latt, D.J. Baxter. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-scale Peptide Identification, Bioinformatics, 2011, 27(21):3072-3073. doi:10.1093/bioinformatics/btr523.
PDF

G. Kulkarni, A. Kalyanaraman, W.R. Cannon, D. Baxter. A scalable parallel approach for peptide identification from large-scale mass spectrometry data. Proc. International Conference on Parallel Processing Workshops (ICPP-W), pp. 423-430, Vienna, Austria, September 22-25, 2009, doi: 10.1109/ICPPW.2009.41.
PDF

A. Kalyanaraman, D. Baxter, W.R. Cannon. Using clouds for data-intensive computing in proteomics, Proc. Workshop on Using Clouds for Parallel Computations in Systems Biology, held in conjunction with SC'09, Portland, OR, November 16, 2009.
PDF

Multicore and hardware acceleration

T. Majumder, M. Borgens, P.P. Pande, A. Kalyanaraman. On-chip network-enabled multicore platforms targeting maximum likelihood phylogeny reconstruction, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2012, 31(7):1061-1073.
PDF

T. Majumder, S. Sarkar, P. Pande, A. Kalyanaraman. NoC-based hardware accelerator for breakpoint phylogeny. IEEE Transactions on Computers, 2012, 61(6):857-869.
PDF

T. Majumder, P. Pande, A. Kalyanaraman. Accelerating Maximum Likelihood based phylogenetic kernels using Network-on-chip. Proc. International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2011, pp. 17-24. http://doi.ieeecomputersociety.org/10.1109/SBAC-PAD.2011.17.
PDF

T. Majumder, S. Sarkar, P. Pande, A. Kalyanaraman. An optimized NoC architecture for accelerating TSP kernels in breakpoint median problem. Proc. IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2010, pp. 89-96.
PDF

S. Sarkar, T. Majumder, A. Kalyanaraman, P. Pande. Hardware accelerators for biocomputing: A survey. Proc. IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 3789-3792.
PDF

S. Sarkar, G. Kulkarni, P. Pande, A. Kalyanaraman. Network-on-chip hardware accelerators for biological sequence alignments. IEEE Transactions on Computers, 2010, 59(1):29-41.
PDF

PEOPLE

FACULTY

    Ananth Kalyanaraman, WSU (PI)

    Partha Pande, WSU (Co-PI)

    William Cannon, PNNL (Co-PI)

OTHER COLLABORATORS

Jitesh Panchal, School of Mechanical and Materials Engineering, WSU

Sriram Krishnamoorthy, PNNL

Audrey Lau, Department of Veterinary Microbiology and Pathology, WSU

CURRENT PROJECT STUDENTS

                       Turbo Majumder, PhD
                       Inna Rytsareva, PhD
                       Hao Lu, MS
                       Daryl Deford, Undergraduate research
                       Joseph Taylor, Undergraduate research
                       Lydia Paradiso, Undergraduate research

ALUMNI

                       Changjun (Andy) Wu, PhD
                       Meenakshi Rameshkumar, MS
                       Souradip Sarkar, PhD
                       Gaurav Kulkarni, MS
                       Michael Borgens, Undergraduate research
                       Emma Conner, Undergraduate research
                       Timothy Chapman, Undergraduate research



SOFTWARE
(downloads will be updated with ongoing development)

pGraph (download):    Parallel construction of large-scale protein sequence homology graphs   (Wu, Kalyanaraman and Cannon, ICPP 2010, TPDS'12 journal version)

pClust (download):    Parallel identification of dense protein clusters (Wu and Kalyanaraman, SC|08)

MR-MSPolygraph (download):    A MapReduce implementation of a hybrid spectral library-database search method for peptide identification (Kalyanaraman et al., Bioinformatics, 2011)

pClust-sm (download):    Parallel identification of dense protein clusters on shared memory multicore machines using OpenMP (Chapman and Kalyanaraman, IA³ 2011)

CONTACT

Ananth Kalyanaraman
Assistant Professor
School of Electrical Engineering and Computer Science
Washington State University
PO Box 642752
Pullman WA 99164-2752

EMAIL: ananth@eecs.wsu.edu
PHONE: (509) 335-6760
FAX: (509) 335-3818 (departmental)

CAMPUS ADDRESS: EME 237
