Home | english  | Impressum | Datenschutz | Sitemap | KIT

[reserviert] Distributed String Sorting (with Thrill or MPI)

[reserviert] Distributed String Sorting (with Thrill or MPI)
Forschungsthema:Distributed Algorithms

Timo Bingmann

Links:Aushang PDF


The goal of this master thesis is to investigate string sorting on distributed systems. Sorting strings or vectors is a basic algorithmic challenge different from integer sorting because it is important to access components of the keys to avoid repeated operations on the entire string.
In a distributed scenario, each machine can first sort its local set of strings. But then the global order has to be determined by exchanging prefixes or hashes of prefixes of the remaining distinguishing characters. Multiple algorithms are conceivable to solve this problem.
For example, the complete remaining strings could be exchanged with compression of common prefixes. Alternatively, a prefix doubling approach with order-preserving hashing and distributed duplication detection [1] can be applied.
The newly designed algorithms should be analyzed theoretically if possible, and implemented and evaluated on a cluster system. Depending on the algorithm and setting, either Thrill [2] or MPI should be used as communication framework.
This master thesis builds on previous work on sequential string sorting and parallel string sorting on multi-core systems [3, 4].

• Interest in string algorithms and distributed big data processing
• Excellent programming skills in C++
• Ability to think for yourself

Info: Other distributed algorithm topics are available.

[1] Peter Sanders, Sebastian Schlag, and Ingo Müller. “Communication Efficient Algorithms for Fundamental Big Data Problems”. In: IEEE International Conference on Big Data, pages 15–23. ISBN : 978-1-4799-1292-6. IEEE, Oct. 2013.
[2] Timo Bingmann, Michael Axtmann, Emanuel Jöbstl, et al. “Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++”. In: IEEE International Conference on Big Data. preprint arXiv:1608.05634, pages 172–183. ISBN : 978-1-4673-9005-7. IEEE. Dec. 2016.
[3] Andreas Eberle. “Parallel Multiway LCP-Mergesort”. Bachelor Thesis. Karlsruhe Institute of Technology, Germany. Apr. 2014.
[4] Timo Bingmann, Andreas Eberle, and Peter Sanders. “Engineering Parallel String Sorting”. In: Algorithmica 77.1 (2017). preprint arXiv:1403.2056, pages 235–286. ISSN : 0178-4617. Springer.