Communication Efficient Checking of Big Data Operations

Author(s):

Lorenz Hübschle-Schneider and Peter Sanders

Links:
Source:

arXiv:1710.08255

Date: October 2017

We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis.
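To give a feel for what a low-communication probabilistic checker can look like, the following is a minimal sketch of a sort checker. It is not taken from the paper and not the authors' algorithm: it uses a standard polynomial multiset fingerprint, and the names `fingerprint`, `check_sort`, `input_parts`, and `output_parts` are hypothetical. It assumes the elements are non-negative integers smaller than the modulus `P` and simulates the workers as a list of local partitions. Each worker would only need to contribute one fingerprint and its boundary elements, independent of the size of its partition.

```python
import random

# Assumption: elements are integers in [0, P). P is the Mersenne prime 2^61 - 1.
P = (1 << 61) - 1


def fingerprint(values, z):
    """Multiset fingerprint: product of (z - x) mod P over all x in values."""
    acc = 1
    for x in values:
        acc = acc * ((z - x) % P) % P
    return acc


def check_sort(input_parts, output_parts, seed=None):
    """Probabilistically check that output_parts is a sorted permutation of input_parts.

    input_parts and output_parts are lists of per-worker partitions.
    A correct result is always accepted; a wrong multiset is accepted
    with probability at most n / P for n elements (one-sided error).
    """
    rng = random.Random(seed)
    z = rng.randrange(P)  # shared random evaluation point

    # Each worker fingerprints its local data; only one number per worker
    # has to be combined (here: multiplied) across workers.
    fp_in = 1
    for part in input_parts:
        fp_in = fp_in * fingerprint(part, z) % P
    fp_out = 1
    for part in output_parts:
        fp_out = fp_out * fingerprint(part, z) % P
    if fp_in != fp_out:
        return False

    # Sortedness: every partition is locally sorted, and partition
    # boundaries are in order (only boundary elements cross workers).
    prev = None
    for part in output_parts:
        for x in part:
            if prev is not None and x < prev:
                return False
            prev = x
    return True


if __name__ == "__main__":
    data = [[5, 3, 9], [1, 7], [4, 4]]
    good = [[1, 3, 4], [4, 5], [7, 9]]
    print(check_sort(data, good, seed=1))
```

The check is one-sided: a correctly sorted permutation always passes, while an output whose multiset differs from the input slips through only if the random point `z` happens to be a root of the difference of the two fingerprint polynomials.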