
Containers

We go over the containers currently available in STXXL.

The most universal STXXL container is stxxl::vector. A vector is an array whose size can vary dynamically. The implementation of stxxl::vector is similar to that of the LEDA-SM array [11]. The content of a vector is striped block-wise over the disks, using an assignment strategy given as a template parameter. Some of the blocks are cached in a vector cache of fixed size (also a template parameter). The replacement of cache blocks is controlled by a specified page-replacement strategy; STXXL has implementations of LRU and random replacement strategies, and the user can provide his/her own strategy as well. STXXL vector has STL-compatible Random Access Iterators. One random access costs $\mathcal{O}\!\left( 1\right)$ I/Os in the worst case, while sequential scanning of the vector costs $\mathcal{O}\!\left( 1/DB\right)$ amortized I/Os per vector element. Throughout the paper we use the classical notation from [38], where $M$ is the size of the main memory, $B$ is the block size in bytes, $D$ is the number of disks, and $N$ is the input size measured in bytes.
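The LRU page replacement used by the vector cache can be illustrated by a minimal sketch. The class below is our own standard-library stand-in, not STXXL's actual pager: it only tracks which cached page should be evicted on a cache miss.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Minimal LRU page-replacement sketch (a stand-in for illustration, not the
// STXXL implementation): records page accesses and reports which page to
// evict when a new page must be brought into a full cache.
class lru_pager {
    std::size_t capacity_;
    std::list<std::size_t> order_;  // front = most recently used page
    std::unordered_map<std::size_t, std::list<std::size_t>::iterator> pos_;
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit lru_pager(std::size_t capacity) : capacity_(capacity) {}

    // Record an access to `page`. On a hit (or a miss with free room) nothing
    // is evicted and npos is returned; on a miss with a full cache the least
    // recently used page is evicted and returned.
    std::size_t access(std::size_t page) {
        auto it = pos_.find(page);
        if (it != pos_.end()) {  // cache hit: move page to the front
            order_.splice(order_.begin(), order_, it->second);
            return npos;
        }
        std::size_t evicted = npos;
        if (order_.size() == capacity_) {  // miss on a full cache: evict LRU
            evicted = order_.back();
            pos_.erase(evicted);
            order_.pop_back();
        }
        order_.push_front(page);
        pos_[page] = order_.begin();
        return evicted;
    }
};
```

A random replacement strategy would replace the eviction choice by a uniformly random cached page; the template-parameter design lets the user plug in either, or a custom policy.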

The I/O-efficient stxxl::stack is perhaps the simplest external memory data structure. Four different stack implementations are available in STXXL. Some of them are optimized to prefetch data ahead and to queue writes, efficiently overlapping I/O and computation. The amortized I/O complexity of the push and pop stack operations is $\mathcal{O}\!\left( 1/DB\right)$.
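The idea behind the $\mathcal{O}\!\left( 1/B\right)$ amortized bound (for one disk) can be sketched with a toy stack, again a stand-in rather than STXXL code: only the topmost block resides in memory, full blocks are spilled to "disk" (simulated here by a std::vector of blocks), so each element causes at most one block write and one block read over its lifetime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy external stack sketch (illustration only, not STXXL's implementation):
// the top B elements live in an in-memory block; when it fills up it is
// written to simulated external memory as one block, giving O(1/B) amortized
// block transfers per push/pop on a single disk.
template <typename T, std::size_t B>
class toy_external_stack {
    std::vector<std::array<T, B>> disk_;  // simulated external memory
    std::array<T, B> top_block_{};        // in-memory top block
    std::size_t top_size_ = 0;
public:
    void push(const T& x) {
        if (top_size_ == B) {      // top block full: one block "write"
            disk_.push_back(top_block_);
            top_size_ = 0;
        }
        top_block_[top_size_++] = x;
    }
    T pop() {
        if (top_size_ == 0) {      // top block empty: one block "read"
            top_block_ = disk_.back();
            disk_.pop_back();
            top_size_ = B;
        }
        return top_block_[--top_size_];
    }
    bool empty() const { return top_size_ == 0 && disk_.empty(); }
};
```

A real implementation keeps two blocks in memory so that alternating push/pop at a block boundary does not thrash, and — as in STXXL — adds prefetching and write queues to overlap I/O with computation across $D$ disks.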

External memory priority queues are the central data structures for many I/O-efficient graph algorithms [39,9,27]. The main technique in these algorithms is time-forward processing [9,3], which is easily realized with an I/O-efficient priority queue. I/O-efficient priority queues also find application in large-scale discrete event simulation and online sorting. The STXXL implementation of priority_queue is based on [31]. This queue needs less than a third of the I/Os used by other similar cache (I/O) efficient priority queues (e.g. [7,18]). The implementation supports parallel disks and overlaps I/O and computation.
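Time-forward processing can be sketched in a few lines; the function below uses std::priority_queue as an in-memory stand-in for the I/O-efficient queue (the function name and the longest-path example are ours). Nodes of a DAG are processed in topological order, and each node "sends" its computed value forward to its successors through the queue, keyed by the receiver's position, so every message is popped exactly when its receiver is processed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Time-forward processing sketch (std::priority_queue standing in for an
// I/O-efficient priority queue). Nodes are assumed to be numbered 0,1,2,...
// in topological order; succ[v] lists the successors of v (all > v).
// Computes, for each node, the maximum weight of a path ending at that node.
std::vector<int> longest_path_weights(
    const std::vector<int>& weight,
    const std::vector<std::vector<int>>& succ)
{
    using msg = std::pair<int, int>;  // (destination node, forwarded value)
    std::priority_queue<msg, std::vector<msg>, std::greater<msg>> pq;
    std::vector<int> result(weight.size());
    for (std::size_t v = 0; v < weight.size(); ++v) {
        int best = 0;  // best value received from any predecessor
        while (!pq.empty() && pq.top().first == static_cast<int>(v)) {
            best = std::max(best, pq.top().second);
            pq.pop();
        }
        result[v] = best + weight[v];
        for (int w : succ[v])          // forward the result in "time"
            pq.push({w, result[v]});
    }
    return result;
}
```

Since the messages are accessed in nondecreasing key order, replacing the in-memory queue by an external one makes the whole computation I/O-efficient even when the DAG and the message traffic far exceed main memory.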

The current version of STXXL also has implementations of an external memory map (based on a B$^+$-tree) and an external memory FIFO queue.

As in other external memory algorithm libraries [11,22], STXXL has the restriction that the data types stored in the containers cannot have C/C++ pointers or references to other elements of external memory containers. The reason is that these pointers and references are invalidated when the blocks containing the elements they point or refer to are written to disk. To get around this problem, the links can be kept in the form of external memory iterators (e.g. stxxl::vector::iterator). The iterators remain valid while being stored to and loaded from external memory. When an external memory iterator is dereferenced, the object it points to is loaded from external memory by the library on demand (if the object is not already in the cache of the data structure).
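The link-by-position idea can be illustrated with std::vector as an in-memory stand-in (the struct and function names below are ours, not STXXL's): a record stores the position of the element it refers to rather than its address, and the element is fetched on demand when the link is followed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of keeping links as positions instead of raw pointers. In STXXL the
// analogous link would be an external memory iterator such as
// stxxl::vector::iterator; here a plain index into a std::vector plays that
// role. A raw pointer would dangle when the vector reallocates (the in-memory
// analogue of a block being written to disk); the position stays valid.
struct node_ref {
    std::size_t pos;  // position of the target element, not its address
};

int deref(const std::vector<int>& container, node_ref r) {
    return container[r.pos];  // "load on demand" when the link is followed
}

bool links_survive_reallocation() {
    std::vector<int> values = {10, 20, 30};
    node_ref link{2};        // refer to values[2] by position
    values.reserve(1000);    // may reallocate: pointers dangle, positions don't
    return deref(values, link) == 30;
}
```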

STXXL containers differ from STL containers in how they treat allocation and in how they distinguish between uninitialized and initialized memory. STXXL containers assume that the data types they store are plain old data types (PODs). The constructors and destructors of the contained data types are not called when a container changes its size. Supporting constructors and destructors would imply a significant I/O cost penalty: e.g., on the deallocation of a non-empty container, one would have to load all contained objects and call their destructors. This restriction sounds more severe than it is, since external memory data structures cannot cope with custom dynamic memory management anyway, which is the most common use of custom constructors/destructors. However, we plan to implement special versions of STXXL containers that support not only PODs and handle construction/destruction appropriately.
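The POD restriction can be made concrete with the standard type traits; the trait name and the example structs below are our own illustration, not part of STXXL's interface. A type is safe to store only if it can be written to and read from disk bytewise, with no constructor or destructor calls.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Sketch of the POD restriction (the names here are ours, not STXXL's):
// element types must be copyable bytewise to and from disk.
struct triangle {            // fine: fixed size, no pointers, trivially copyable
    std::uint32_t vertex[3];
};

// A type like std::string is NOT storable: it owns heap memory through an
// internal pointer, and its constructor/destructor would never be called
// when the container grows or shrinks.
template <typename T>
constexpr bool storable_in_ext_memory =
    std::is_trivially_copyable<T>::value && std::is_standard_layout<T>::value;

static_assert(storable_in_ext_memory<triangle>,
              "triangle can be stored in an external memory container");
```

Such a compile-time check documents the restriction at the point of use; a real container could trigger it from its element type parameter.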


Roman Dementiev 2005-08-09