STXXL is a layered library; it consists of three layers (see Figure 1). The lowest layer, the Asynchronous I/O primitives layer (AIO layer) abstracts away the details of how asynchronous I/O is performed on a particular operating system. Other existing external memory algorithm libraries rely only on synchronous I/O APIs [11] or allow reading ahead sequences stored in a file using the POSIX asynchronous I/O API [22]. These libraries also rely on uncontrolled operating system I/O caching and buffering in order to overlap I/O and computation in some way. However, this approach has significant performance penalties for accesses without locality. Unfortunately, asynchronous I/O APIs are very different on different operating systems (e.g. POSIX AIO and Win32 Overlapped I/O). Therefore, we have introduced the AIO layer to make porting STXXL easy. Porting the whole library to a different platform (for example Windows) requires only reimplementing the AIO layer using native file access methods and/or native multithreading mechanisms. STXXL has already several implementations of the layer which use different file access methods under POSIX/UNIX systems.
The Block management layer (BM layer) provides a programming interface simulating the parallel disk model. The BM layer provides abstraction for a fundamental concept in the external memory algorithm design - block of elements. The block manager implements block allocation/deallocation allowing several block-to-disk assignment strategies: striping, randomized striping, randomized cycling, etc. The block management layer provides implementation of parallel disk buffered writing [21], optimal prefetching [21], and block caching. The implementations are fully asynchronous and designed to explicitly support overlapping between I/O and computation.
The top of STXXL consists of two modules.
The STL-user layer provides external memory sorting, external memory
stack, external memory priority queue, etc. which have
(almost) the same interfaces (including syntax and semantics) as their
STL counterparts.
The Streaming layer provides efficient support for
pipelining external memory algorithms.
Many external memory algorithms, implemented using this layer, can save
factor of 2-3 in I/Os.
For example, the algorithms for external
memory suffix
array construction implemented with this module [15]
require only of
I/Os which must be performed by implementations that use conventional
data structures and algorithms (either from STXXL STL-user layer, or LEDA-SM,
or TPIE).
The win is due to an efficient interface, that couples the
input
and the output of the algorithm-components (scans, sorts,
etc.). The
output from an algorithm is directly fed into another algorithm as
input, without needing to store it on the disk in between.
This generic pipelining interface is the first of this kind
for external memory algorithms.