In today's online, connected world of information, data streams are generated in a variety of domains, including news feeds, social media, and network traffic. Representing these data streams as graphs has proven effective for uncovering patterns in their respective domains. However, graph-based approaches to knowledge discovery are computationally challenging, particularly when analyzing large amounts of data, or data that is represented as a stream. In addition, a better understanding of patterns associated with a person, place, or activity, cannot be realized through a single graph stream. For instance, in social media, one can discover interesting patterns of behavior about an individual through one account, but better insight into their overall behavior is realized by examining their activity on multiple accounts simultaneously. Thus, while graphs make for a logical choice of representing such data, there are many challenges associated with graph mining methods in regards to designing scalable algorithms that can operate in real-time on multiple, heterogeneous graph streams. In this project we investigate a new approach capable of scalable knowledge discovery in multiple graph streams, or what we call multi-stream graph mining. The approach is novel and untested, but potentially transformative in how we discover knowledge from multiple data streams. Specifically, we investigate novel methods to multi-stream graph mining in which different amounts and types of individual-stream pre-processing are used to reduce the size and arrival-rate of the data. These methods range in complexity from sampling the data stream, to compressing the data stream based on known patterns, to actually mining the individual streams and only passing the results on to the next phase of fusing the streams. We evaluate these methods using both artificial and real-world multi-stream graph data.
This material is based upon work supported by the National Science Foundation under Grant No. IIS-1646640.