Dept. of Computer Science and Engineering
Oregon Health & Science University
As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or unbounded data streams. Unbounded stateful operators maintain state with no upper bound on its size, and so eventually run out of memory. Blocking operators read the entire input before emitting a single output, and so might never produce a result. We believe that a priori semantic knowledge of a data stream can permit the use of such operators in some cases. We explore a kind of stream semantics called punctuated streams. Punctuations in a stream mark the end of substreams, allowing us to view a non-terminating stream as a mixture of terminating streams. We introduce three kinds of invariants to specify the proper behavior of query operators in the presence of punctuation. Pass invariants unblock blocking operators by defining when such an operator can pass results on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when an operator can pass punctuation on. We then present a strategy for proving that implementations of these invariants are faithful to their finite table counterparts. In practice, it is important to answer the following question: "How much additional overhead is required when using punctuations?" We use the scenario of a monitoring system for an online auction. Streams of bids, new items, and new users are sent to an online auction system. There are many interesting queries that can be posed over these auction streams. We define queries for this scenario, and execute them with different kinds and amounts of punctuations embedded in the input streams. We show that, for a reasonable ratio of punctuations to data items, the overhead is minimal. Additionally, we compare the behavior of a query using punctuations with the behavior of the same query using slack over data streams with disorder. Clearly, not all punctuations are useful to a particular query, and it would be useful to make a determination of when they are. That is, we would like to answer the question "Can stream query Q benefit from a particular set of punctuations?" To that end, we first define punctuation schemes to specify the collection of punctuations that will be presented to a query on a particular data stream. We show how both punctuations and query operators induce groupings over the items in the domain of the input(s). We show that a query benefits from an input punctuation scheme (in terms of being able to produce a given output scheme), if each set in the groupings induced by the operators of the query is covered by a finite number of punctuations in the scheme -- a kind of compactness. We conclude with discussion on possible future directions of research related to punctuations and data streams. These directions focus on a variety of questions, ranging from issues in query optimization to other possible semantics that can be expressed using punctuations.
OGI School of Science and Engineering
Tucker, Peter, "Punctuated data streams" (2005). Scholar Archive. 149.