Most sorting algorithms use insertion sort as base case sorter for small sets of elements. The goal of this thesis is investigate if this can be improved in practice.
For example, optimal sorting functions with the minimum number of comparisons needed are known for up to around 16 elements, but these may not be very fast in practice due to branch mispredictions. Alternatively, sorting networks have a fixed pattern of swap operations, which may be pipelined with conditional moves by processors if the elements are independent. Another approach would be to use explicit vectorization using SIMD instruction.
Goal is to design, implement in C or C++, and evaluate sorting algorithms for small sets of elements. The algorithms should be profiled on modern processors (Intel and ARM) and designed to utilize the hardware accelerations available on these platforms. The new sorters for small inputs should ultimately be integrated in the currently fastest parallel in-place sorting implementation  and other algorithms.
This master thesis builds on previous work on sequential (string) sorting and parallel (string) sorting on multi-core systems [1, 2], and on parallel memory bandwidth experiments .
• Interest in sorting algorithms and characteristics of modern CPUs
• Excellent programming skills in C++, interest in reading assembly code.
• Ability to think for yourself
 Michael Axtmann, Sascha Witt, Daniel Ferizovic, and Peter Sanders. “In-Place Parallel Super Scalar Samplesort (IPS 4 o)”. In: 25th European Symposium on Algorithms (ESA). Volume 87. LIPIcs. preprint arXiv:1705.02257, 9:1–9:14. ISBN : 978-3-95977-049-1. ISSN : 1868-8969. Schloss Dagstuhl, 2017.
 Timo Bingmann, Andreas Eberle, and Peter Sanders. “Engineering Parallel String Sorting”. In: Algorithmica 77.1 (2017). preprint arXiv:1403.2056, pages 235–286. ISSN : 0178-4617. Springer.
 Timo Bingmann. Parallel Memory Bandwidth Measurement and Benchmark. 2013. URL : http://panthema.net/2013/pmbw/.