Fast Minimal Perfect Hash Functions

A Perfect Hash Function is a mapping from n keys each to a number in {1,...,m}, such that there are no collisions, i.e., each key maps to a different number. The perfect hash function is constructed once (the set of keys is static) and used for queries. Parameters to optimize are memory consumption and "fill degree", i.e., n/m. Current methods need about 1.5 to 2.5 bits per key in practice and there is a theoretical lower bound of 1.44 bits per key. Common values for the fill degree are n/m=0.98 or n/m=0.81. A perfect hash function that achieves a fill degree of n/m=1 (n=m) is called Minimal Perfect Hash Function.

Applications of Perfect Hash Functions are manifold. For example, a space-efficient static key-value store can be implemented by interpreting the hash value as a memory address. By storing a signature of objects, an AMQ filter can be developed that can answer membership queries ("is x in set S") with an adjustable error rate (see also: Bloom Filter). Often, one would like to have small, unique representatives of complex objects in order to handle them more efficiently - Perfect Hash Functions can provide that mapping. In the Google PageRank algorithm, a Perfect Hash Function can handle the mapping from URLs to node indices.

We recently developed a very space efficient Minimal Perfect Hash Function, but it has slow queries. In this HiWi position, we want to accelerate another Minimal Perfect Hash Function that has its focus on fast queries.

Task

Design, develop, and benchmark a parallel construction algorithm for Perfect Hash Functions. For additional speedups, use SIMD instructions (single instruction, multiple data) where possible.

Requirements

- Interest in space efficient data structures
- Knowledge of C++ and parallel algorithms
- Interested in low-level programming with SIMD instructions

References

- "Hash, displace, and compress" (Belazzougui, Botelho, Dietzfelbinger)
- "Retrieval and Perfect Hashing using Fingerprinting" (Müller, Sanders, Schulze, Zhou)

 

Note: This position is also available as a bachelor's thesis.