Online Filtering, Smoothing and Probabilistic Modeling of Streaming data

Title: Online Filtering, Smoothing and Probabilistic Modeling of Streaming data
Publication Type: Conference Papers
Year of Publication: 2008
Authors: Kanagal B, Deshpande A
Conference Name: IEEE 24th International Conference on Data Engineering, 2008. ICDE 2008
Date Published: 2008/04/07/12
ISBN Number: 978-1-4244-1836-7
Keywords: Data analysis, data streaming, declarative query, dynamic probabilistic model, Filtering, Global Positioning System, hidden Markov models, Monitoring, Monte Carlo methods, Noise generators, Noise measurement, online filtering, particle filter, particle filtering (numerical methods), probabilistic database view, probability, Real time systems, real-time application, relational database system, Relational databases, sequential Monte Carlo algorithm, Smoothing methods, SQL

In this paper, we address the problem of extending a relational database system to facilitate efficient real-time application of dynamic probabilistic models to streaming data. We use the recently proposed abstraction of model-based views for this purpose, allowing users to declaratively specify the model to be applied and presenting the output of the model to the user as a probabilistic database view. We support declarative querying over such views using an extended version of SQL that allows for querying probabilistic data. Underneath, we use particle filters, a class of sequential Monte Carlo algorithms, to represent the present and historical states of the model as sets of weighted samples (particles) that are kept up to date as new data arrives. We develop novel techniques to convert queries on the model-based view directly into queries over particle tables, enabling highly efficient query processing. Finally, we present an experimental evaluation of our prototype implementation over several synthetic and real datasets that demonstrates the feasibility of online modeling of streaming data using our system and establishes the advantages of tight integration between dynamic probabilistic models and databases.
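
As a rough illustration of the particle-based representation the abstract describes, the sketch below runs a generic bootstrap particle filter over a one-dimensional stream and stores the weighted samples for each time step as rows of a "particle table". It is not the paper's implementation: the random-walk transition model, the Gaussian observation noise, the (time, state, weight) row layout, and the names `particle_filter_step`, `expected_state`, and `run_demo` are all assumptions made for this example. It computes filtered estimates only and does not attempt smoothing.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles, observation, rng,
                         process_std=1.0, obs_std=2.0):
    """One bootstrap-filter update over a list of (state, weight) pairs.

    Assumed model (hypothetical, for illustration only):
      state_t = state_{t-1} + N(0, process_std^2)
      obs_t   = state_t     + N(0, obs_std^2)
    """
    # 1. Propagate every particle through the assumed transition model.
    moved = [(x + rng.gauss(0.0, process_std), w) for x, w in particles]

    # 2. Reweight each particle by the likelihood of the new observation.
    weights = [w * gaussian_pdf(observation, x, obs_std) for x, w in moved]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(moved)] * len(moved)
    else:
        weights = [w / total for w in weights]

    # 3. Resample in proportion to the weights, then reset to uniform weights.
    states = [x for x, _ in moved]
    resampled = rng.choices(states, weights=weights, k=len(states))
    uniform = 1.0 / len(resampled)
    return [(x, uniform) for x in resampled]


def expected_state(particle_table, t):
    """Weighted mean of the state at time t, computed from particle rows."""
    return sum(x * w for (ts, x, w) in particle_table if ts == t)


def run_demo(num_particles=500, num_steps=30, true_state=10.0, seed=0):
    """Filter a synthetic noisy stream and return the full particle table."""
    rng = random.Random(seed)
    particles = [(rng.uniform(0.0, 20.0), 1.0 / num_particles)
                 for _ in range(num_particles)]
    particle_table = []  # rows of (time, state, weight)
    for t in range(num_steps):
        observation = true_state + rng.gauss(0.0, 2.0)
        particles = particle_filter_step(particles, observation, rng)
        # Keep every time step so historical states remain queryable.
        particle_table.extend((t, x, w) for x, w in particles)
    return particle_table
```

Calling `expected_state(run_demo(), 29)` returns the filtered estimate of the hidden value at the last time step. In a relational setting, the same aggregate would correspond to something like a weighted sum grouped by time over the particle rows, which is the general flavor of rewriting a query on the view into a query over particle tables; the paper's actual rewriting techniques are not reproduced here.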