DC: SMALL: EFFICIENT ALGORITHMS FOR DATA-INTENSIVE BIO-COMPUTING
NSF Award# IIS 0916463
This project is funded by the National Science Foundation.
The field of bioinformatics and computational biology is experiencing a data revolution unlike any other scientific computing field. Experimental techniques for acquiring data have increased in throughput, improved in accuracy, and fallen in cost. The resulting volume of data has outstripped the scalability of existing software tools. To understand the complexities and challenges of designing algorithms for data-intensive biocomputing, this project is developing new approaches for two major problems in protein bioinformatics:
i) identification of protein families and homology clusters; and
ii) peptide identification from large-scale mass spectrometry data.
The former requires large-scale graph analysis and the latter requires large-scale database search. The project is investigating a multi-faceted approach which involves designing space-efficient algorithms for massively parallel machines, developing algorithmic heuristics for reducing the time to solution, evaluating the MapReduce paradigm as an alternate computing model, and deploying multicore architectures for fine-grain parallelism.
PUBLICATIONS (by topic)
*These materials are presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Parallel protein family detection, sequence analysis and graph clustering
I. Rytsareva, Q. Le, E. Conner, A. Kalyanaraman, J. Panchal. Evaluating socio-technical coordination in open-source communities: A cluster-based approach. Proc. ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference (IDETC/CIE), Accepted, August 12-15, Chicago, IL, 2012.
I. Rytsareva, A. Kalyanaraman. An efficient MapReduce algorithm for parallelizing large-scale graph clustering, Proc. ParGraph - Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC'11, Bengaluru, India, 2011.
T. Chapman, A. Kalyanaraman. An OpenMP algorithm and implementation for clustering biological graphs, Proc. IA3 - Workshop on Irregular Applications: Architectures & Algorithms, Held in conjunction with SC'11, 2011, pp. 3-10.
C. Wu, A. Kalyanaraman, W.R. Cannon. pGraph: Efficient parallel construction of large-scale protein sequence homology graphs, IEEE Transactions on Parallel and Distributed Systems (TPDS), Preprint, 2012, doi 10.1109/TPDS.2012.19.
A.O.T. Lau, A. Kalyanaraman, I. Echaide, G.H. Palmer, R. Bock, M.J. Pedroni, M. Rameshkumar, M.B. Ferreira, T.I. Fletcher, T.F. McElwain. Attenuation of virulence in an Apicomplexan hemoparasite results in reduced genome diversity at the population level. BMC Genomics, 2011, 12:410, doi 10.1186/1471-2164-12-410.
T. Chapman, A. Kalyanaraman. Enabling large-scale metagenomic protein family identification on the NSF TeraGrid. Abstract and undergraduate student poster, TeraGrid 2011, Salt Lake City, UT, July 18-21, 2011.
A. Kalyanaraman. Algorithms for genome assembly. Encyclopedia of Parallel Computing, D. Padua (ed.), Springer Science+Business Media LLC, 2011, pp. 755-768. doi:10.1007/978-0-387-09766-4.
C. Wu, A. Kalyanaraman, W.R. Cannon. A scalable parallel algorithm for large-scale protein sequence homology detection. Proc. International Conference on Parallel Processing (ICPP), pp. 333-342, 2010, doi: 10.1109/ICPP.2010.41.
C. Wu, A. Kalyanaraman. An efficient parallel approach for identifying protein families in large-scale metagenomic data sets. Proc. ACM/IEEE conference on Supercomputing (SC|08), Austin, TX, November 15-21, pp. 1-10, 2008, ISBN 978-1-4244-2835-9, IEEE Press, Piscataway, NJ, USA.
Mass spectrometry based peptide identification
A. Kalyanaraman, W.R. Cannon, B. Latt, D.J. Baxter. MapReduce Implementation of a Hybrid Spectral Library-Database Search Method for Large-scale Peptide Identification, Bioinformatics, 2011, 27(21):3072-3073. doi:10.1093/bioinformatics/btr523.
G. Kulkarni, A. Kalyanaraman, W.R. Cannon, D. Baxter. A scalable parallel approach for peptide identification from large-scale mass spectrometry data. Proc. International Conference on Parallel Processing Workshops (ICPP-W), pp. 423-430, Vienna, Austria, September 22-25, 2009, DOI 10.1109/ICPPW.2009.41.
A. Kalyanaraman, D. Baxter, W.R. Cannon. Using clouds for data-intensive computing in proteomics, Proc. Workshop on Using Clouds for Parallel Computations in Systems Biology, held in conjunction with SC'09, Portland, OR, November 16, 2009.
Multicore and hardware acceleration
T. Majumder, M. Borgens, P.P. Pande, A. Kalyanaraman. On-chip network-enabled multicore platforms targeting maximum likelihood phylogeny reconstruction, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2012, 31(7):1061-1073.
T. Majumder, S. Sarkar, P. Pande, A. Kalyanaraman. NoC-based hardware accelerator for breakpoint phylogeny. IEEE Transactions on Computers, 2012, 61(6):857-869.
T. Majumder, P. Pande, A. Kalyanaraman. Accelerating Maximum Likelihood based phylogenetic kernels using Network-on-chip. Proc. International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2011, pp. 17-24. http://doi.ieeecomputersociety.org/10.1109/SBAC-PAD.2011.17.
T. Majumder, S. Sarkar, P. Pande, A. Kalyanaraman. An optimized NoC architecture for accelerating TSP kernels in breakpoint median problem. Proc. IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2010, pp. 89-96.
S. Sarkar, T. Majumder, A. Kalyanaraman, P. Pande. Hardware accelerators for biocomputing: A survey. Proc. IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 3789-3792.
S. Sarkar, G. Kulkarni, P. Pande, A. Kalyanaraman. Network-on-chip hardware accelerators for biological sequence alignments. IEEE Transactions on Computers, 2010, 59(1):29-41.
FACULTY
Ananth Kalyanaraman, WSU (PI)
Partha Pande, WSU (Co-PI)
William Cannon, PNNL (Co-PI)
Other Collaborators
Jitesh Panchal, School of Mechanical and Materials Engineering, WSU
Sriram Krishnamurthy, PNNL
Audrey Lau, Dept. Veterinary Microbiology and Pathology, WSU
CURRENT PROJECT STUDENTS
Turbo Majumder, PhD
Inna Rytsareva, PhD
Hao Lu, MS
Daryl Deford, Undergraduate research
Joseph Taylor, Undergraduate research
Lydia Paradiso, Undergraduate research
ALUMNI
Changjun (Andy) Wu, PhD
Meenakshi Rameshkumar, MS
Souradip Sarkar, PhD
Gaurav Kulkarni, MS
Michael Borgens, Undergraduate research
Emma Conner, Undergraduate research
Timothy Chapman, Undergraduate research
SOFTWARE
(downloads will be updated with ongoing development)
pGraph (download): Parallel construction of large-scale protein sequence homology graphs (Wu, Kalyanaraman and Cannon, ICPP 2010, TPDS'12 journal version)
pClust (download): Parallel identification of dense protein clusters (Wu and Kalyanaraman, SC|08)
MR-MSPolygraph (download): A MapReduce implementation of a hybrid spectral library-database search method for peptide identification (Kalyanaraman et al., Bioinformatics, 2011)
pClust-sm (download): Parallel identification of dense protein clusters on shared memory multicore machines using OpenMP (Chapman and Kalyanaraman, IA3 2011)
Ananth Kalyanaraman
Assistant Professor
School of Electrical Engineering and Computer Science
Washington State University
PO Box 642752
Pullman WA 99164-2752
EMAIL: ananth@eecs.wsu.edu
PHONE: (509) 335-6760
FAX: (509) 335-3818 (departmental)
CAMPUS ADDRESS: EME 237