
Being able to update the (query) output with new input data, rather than processing the whole input again even when the change is very small, is indeed a very useful feature. Suppose you have one huge input table and you have computed a result consisting of a few rows. Now you add one record to the input. A traditional data processing system will process all the input records again, while a differential system will update the existing output.

Implementing such systems involves the following difficulties:

o (Small) changes in the input have to be incrementally propagated to the output as updates rather than as new results. This changes the data processing paradigm, because every operator now has to be "update-aware".

o Only simple operators can easily be made "update-aware". For more complex operators like aggregation or rolling aggregation, it is frequently not clear how this can be done conceptually, let alone efficiently.

o Differential updates have to be propagated through a graph of operations (a topology), which makes the task more difficult.

o Currently popular data processing approaches (SQL or map-reduce) were not designed for this scenario, so some adaptation may be needed.
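To make the first point concrete, here is a minimal sketch of what an "update-aware" aggregate looks like: instead of recomputing a mean over the whole input, the operator keeps internal state and absorbs deltas. The class name and API are hypothetical, purely for illustration:

```python
class IncrementalMean:
    """Maintains a mean that absorbs inserts, deletes, and updates
    without rescanning the input."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def insert(self, value):
        self.total += value
        self.count += 1

    def delete(self, value):
        self.total -= value
        self.count -= 1

    def update(self, old, new):
        # An in-place update is just a delta; the count is unchanged.
        self.total += new - old

    @property
    def value(self):
        return self.total / self.count if self.count else None
```

This works because sum and count are invertible; as the next point notes, operators without such an inverse are much harder to make update-aware.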

Another system that implements such an approach, called incremental evaluation, is Lambdo:

https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last!

Yet this Python library relies on a different, novel data processing paradigm where operations are applied to columns. Mathematically, it uses two types of operations, set operations and function operations, as opposed to traditional approaches based on set operations only.

A new implementation is here:

https://github.com/asavinov/prosto - Functions matter! No join-groupby, No map-reduce.

Yet currently incremental evaluation is implemented only for simple operations (calculated columns).



How is a rolling aggregate hard to update? If the value at index i is changed, just update everything from i-n to i+n (where n is the rolling window size).
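That update rule can be sketched directly. For trailing windows of size n, the windows affected by a change at index i are exactly those ending at j in [i, i+n-1], and each can absorb the delta without recomputation (the function name is illustrative):

```python
def update_rolling_sums(rolling, values, i, new_value, n):
    """Update precomputed trailing rolling sums after values[i] changes.

    Invariant: rolling[j] == sum(values[max(0, j - n + 1) : j + 1]).
    Only windows containing index i are affected, i.e. j in [i, i+n-1],
    and each changes by the same delta.
    """
    delta = new_value - values[i]
    values[i] = new_value
    for j in range(i, min(i + n, len(values))):
        rolling[j] += delta
    return rolling
```

This is O(n) per change instead of O(n^2) for recomputing every affected window, but it only works for invertible aggregates like sum; a rolling max cannot absorb a delta this way.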


Yes, this is the basic logic: for any incremental aggregation we need to detect the groups that can be influenced by a new or updated record. If we do row-based rolling aggregation, then indeed we need to update the records in (i-n, i+n). Yet the following difficulties may arise:

o Generally, we do not want to re-compute aggregates from scratch - the aggregates themselves should be updated, particularly if n is very large.

o In real applications, rolling aggregation is performed with partitioning by some key. For example, we append new events from many different devices to one table and want to compute rolling aggregates for each individual device. Hence this (i-n, i+n) range will no longer work.

o Rolling aggregation using absolute time windows will also work differently. Although if records are ordered (as in stream processing) and there are no partitions, then it is easy.
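For the append-only partitioned case, a minimal sketch (class name and API hypothetical): when an event for one device arrives, only that device's window state is touched, so there is no global (i-n, i+n) range at all:

```python
from collections import defaultdict, deque


class PartitionedRollingSum:
    """Trailing rolling sum over the last n events, maintained per key.

    Appending an event for one device updates only that device's
    partition; all other partitions are left untouched.
    """

    def __init__(self, n):
        self.n = n
        self.windows = defaultdict(deque)   # key -> last n values
        self.sums = defaultdict(float)      # key -> current window sum

    def append(self, key, value):
        w = self.windows[key]
        w.append(value)
        self.sums[key] += value
        if len(w) > self.n:
            # Evict the oldest value once the window is full.
            self.sums[key] -= w.popleft()
        return self.sums[key]
```

Handling updates or deletions of old records, or time-based instead of count-based windows, would require more state than this sketch keeps.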


A few others and I have done a lot of research on updating sliding-window aggregations without recomputing everything. Our code is on GitHub, and the README has links to the papers: https://github.com/IBM/sliding-window-aggregators
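One classic technique in this space (not specific to that repository) is the two-stacks trick, which handles non-invertible operators like max, where the simple delta approach above does not apply. Each stack entry carries the value together with the aggregate of everything below it; a minimal sketch:

```python
class TwoStacksMax:
    """Sliding-window max via the classic two-stacks trick.

    Amortized O(1) per push/pop/query; works for any associative
    operator, not just invertible ones like sum.
    """

    def __init__(self):
        # Each entry is (value, max of this value and all entries below).
        self.front = []  # holds older elements, popped in FIFO order
        self.back = []   # holds newer elements

    def push(self, v):
        m = v if not self.back else max(v, self.back[-1][1])
        self.back.append((v, m))

    def pop(self):
        if not self.front:
            # Flip the back stack onto the front, recomputing running maxes.
            while self.back:
                v, _ = self.back.pop()
                m = v if not self.front else max(v, self.front[-1][1])
                self.front.append((v, m))
        return self.front.pop()[0]

    def query(self):
        if self.front and self.back:
            return max(self.front[-1][1], self.back[-1][1])
        if self.front:
            return self.front[-1][1]
        return self.back[-1][1]
```

The window is slid by pushing the new element and popping the oldest one; the flip is what amortizes the cost, since every element is moved from back to front at most once.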



