We revisit tree compression with top trees (Bille et al, ICALP'13) and present several improvements to the compressor and its analysis. By significantly reducing the amount of information stored and guiding the compression step using a RePair-inspired heuristic, we obtain a fast compressor achieving good compression ratios, addressing an open problem posed by Bille et al. We show how, with relatively small overhead, the compressed file can be converted into an in-memory representation that supports basic navigation operations in worst-case logarithmic time without decompression. We also show a much improved worst-case bound on the size of the output of top-tree compression (answering an open question posed in a talk on this algorithm by Weimann in 2012).
Tree Compression with Top Trees Revisited
Lorenz Hübschle-Schneider und Rajeev Raman
Technical Report Juni 2015, arXiv:1506.04499