Collaborative Research: Privacy-Sensitive Data Mining from
Multi-Party Distributed Data
Project Award Number
National Science Foundation Grant
IIS-0350533
Collaborative award (UMBC)
IIS-0329143
Principal Investigator
Krishnamoorthy
Sivakumar
School of Electrical Engineering and Computer Science
Washington State University
Pullman, WA 99164-2752
Voice: (509) 335-4969, Fax: (509) 335-3818, Email:
siva @ eecs. wsu. edu
http://www.eecs.wsu.edu/~siva
Co-PI
Hillol Kargupta
Department of Computer Science and Electrical Engineering
University of Maryland, Baltimore County
1000 Hilltop Circle, Baltimore, MD 21250
Voice: (410) 455-3972, Fax : (410) 455-3969, Email: hillol @ cs. umbc. edu
http://www.cs.umbc.edu/~hillol
Students
WSU: Da Meng, Jianjie Ma, Wang Qi, Taikun Cheng
UMBC: Kun Liu, Souptik Dutta.
Post Docs (UMBC): Ran Wolff, Chris
Giannella
Keyword
Privacy preserving
distributed data mining
Project Summary
A growing number of data mining applications
need to deal with data sources that are distributed, possibly proprietary, and
sensitive to privacy. Financial transactions, health-care records, and network
communication traffic are a few examples. Privacy is also becoming an
increasingly important issue in data mining applications for counter-terrorism
and homeland defense that may require creating profiles, constructing social
network models, detecting terrorist communications from distributed privacy
sensitive multi-party data. Combining such diverse data sets belonging to
different parties may violate the privacy laws. Therefore we need algorithms
that can mine the data in a distributed fashion while guaranteeing that the
privacy of the data is not compromised.
This has resulted in the development of several privacy-preserving data mining
techniques. Many of these techniques work using randomized techniques to
perturb the data and preserve the data privacy while still guaranteeing the
invariance of the underlying patterns. This research pointed out that the
popular naive randomization of the data (more specifically, additive random
matrices) may preserve little data privacy in many cases. It proposes a
framework to understand these data masking techniques using the theory of
random matrices to shows the problems of some existing privacy-preserving data
mining techniques and potential research directions for solving the problems.
Current research effort includes development of a systematic framework for
designing privacy-preserving data mining algorithms. A post randomization
based scheme has been proposed towards that end.
This proposal also suggests the development of a collection of randomized
techniques that are provably correct in supporting privacy-sensitive
applications. It proposes a randomized projection-based technique to compute
statistical aggregates from multi-party distributed data. It also suggests
research on a privacy-sensitive distributed Bayesian network learning
algorithm for similar applications. The proposed algorithms will be developed,
tested, and evaluated using a collection of testbeds
that PIs have been developing for the last several years.
This research will be a collaborative effort
between University of Maryland Baltimore County and Washington State
University. Professor Kargupta will take the lead on the random matrices and
random projection research. Professor Sivakumar
will take the lead on the privacy-sensitive Bayesian network-based distributed
data mining.
Publications and Products
- K. Das, K. Bhaduri, K. Liu, H. Kargupta. (2008). Distributed
Identification of Top-l Inner Product Elements and its Application
in a Peer-to-Peer Network. IEEE Transactions on Knowledge and Data
Engineering. Volume 19, Number 3.
- K. Das, H. Kargupta, K. Liu, (2008). A Game Theoretic Framework for
Practical Privacy Preserving Distributed Data Mining. (In communication)
- H. Kargupta, K. Das, and K. Liu. (2007). A Game Theoretic Approach
toward Multi-Party Privacy-Preserving Distributed Data Mining. In the 11th
European Conference on Principles and Practice of Knowledge Discovery in
Databases, Warsaw, Poland, September 2007.
- K. Liu, C. Giannella, H. Kargupta. (2007). On the Privacy of Euclidean
Distance Preserving Data Perturbation. (In communication).
- K. Liu, K Bhaduri, K. Das, P. Nguyen, H. Kargupta (2006). Client-side
Web Mining for Community Formation in Peer-to-Peer Environments. SIGKDD
Explorations. Volume 8, Issue 2, Pages 11-20.
- K. Liu, H. Kargupta, and J. Ryan. (2006) Random projection-based
multiplicative data perturbation for privacy preserving distributed data
mining. IEEE Transactions on Knowledge and Data Engineering (TKDE), volume
18, number 1, pp. 92-106, January 2006.
- K. Liu, C. Giannella, and H. Kargupta. (2006). An attacker's view of
distance preserving maps for privacy preserving data mining. In Proceedings
of the 10th European Conference on Principles and Practice of Knowledge
Discovery in Databases (PKDD'06), Berlin, Germany, September 2006.
- J. da Silva, C. Giannella, R. Bhargava, H. Kargupta, and M. Klusch.
(2005) Distributed Data Mining and Agents, Invited submission, Engineering
Applications of Artificial Intelligence Journal. volume 18, pp. 791--807.
- H. Kargupta, K. Liu, S.
Datta, J. Ryan, K. Sivakumar(2003) Link Analysis, Privacy Preservation, and
Random Perturbations. Proceedings of the KDD Workshop on Link
Analysis for Detecting Complex Behavior (LinkKDD’03), Washington D.C., July
2003. pdf
- H. Kargupta, K. Liu, S. Datta, J. Ryan, and K. Sivakumar. (2003).
Homeland security and privacy sensitive data mining from multi-party
distributed resources. In Proceedings of the 12th IEEE International
Conference on Fuzzy Systems, volume 2, pp. 1257-1260, St. Louis, MO, May
2003.
pdf
- H. Kargupta, H. Dutta, S. Datta , K. Sivakumar (2003) Privacy Preserving
Data Mining and Random Perturbation. Proceedings of the Workshop on Privacy
in the Electronic Society (WPES'03), Washington DC,October,2003.
pdf
- S. Datta, H. Kargupta and K. Sivakumar. (2003) .Homeland Defense,
Privacy-Sensitive Data Mining, and Random Value Distortion. Proceedings of
the SIAM Workshop on Data Mining for Counter Terrorism and Security
(SDM'03),San Francisco, CA, May, 2003. pdf
- D. Meng, K. Sivakumar, and H. Kargupta. (2004). Privacy Sensitive
Bayesian Network Parameter Learning. Proceedings of the Fourth IEEE
International Conference on Data Mining. Brighton, UK, pages 427--430. pdf
- H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. (2004). Random Data
Perturbation Techniques and Privacy Preserving Data Mining. Knowledge and
Information Systems Journal, volume 7, number 4, pages 387--414.
pdf
- H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. (2003). On the Privacy
Preserving Properties of Random Data Perturbation Techniques. Proceedings of
the IEEE International Conference on Data Mining. Melbourne, Florida, USA,
pages 99--106. (Winner of 2003 IEEE Data Mining Conference Best Paper
Award.) pdf
- J. Ma and K. Sivakumar, A Framework of Privacy-Preserving Bayesian
Network Parameter Learning using Post Randomization, In WSEAS Transaction of
Information Technologies, January, 2006. pdf
- J. Ma and K. Sivakumar, Privacy-preserving Bayesian Network Learning
from Heterogeneously Distributed data, In the proceeding of the 2nd
International conference on Data Mining, Las Vegas, NV, (DMIN06), 2006.
pdf
- H. Kargupta and K. Sivakumar, (2004) Existential Pleasures of
Distributed Data Mining. Data Mining: Next Generation Challenges and Future
Directions. Editors: H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha. AAAI/MIT
Press.
- J. Ma, Learning from perturbed data for privacy-preserving data mining,
PhD Thesis, Washington State University, 2006.
pdf
- H. Kargupta, K. Liu , and J. Ryan. (2003). Privacy Sensitive Distributed
Data Mining from Multi-party Data. Proceedings of Intelligence and Security
Informatics, First NSF/NIJ Symposium 2003:336-342, Tucson, AZ. K. Liu, H.
Kargupta, and J. Ryan. (2003).
- Random projection and privacy preserving correlation computation from
distributed data. Technical Report TR-CS-03-24, Computer Science and
Electrical Engineering Department, University of Maryland, Baltimore County.
In communication.
Some of the above publications can be downloaded from this
link....