Massive data sets arise naturally in many domains: geographic information systems, computer graphics, database systems, telecommunication billing systems, network analysis, and scientific computing. Applications in these domains have to process terabytes of data. However, the internal memories of computers can hold only a small fraction of these huge data sets, so during processing the applications need to access external storage (e.g. hard disks). A single such access can be several orders of magnitude slower than a main memory access, but once the disk head is positioned, accesses to the immediately following elements in external memory are much cheaper. In order to amortize the high cost of a random access, one reads or writes large contiguous chunks of data (blocks). I/O thus becomes the main bottleneck for applications dealing with large data sets, and one therefore tries to minimize the number of I/O operations performed. To increase the I/O bandwidth, applications use multiple disks in parallel: in each I/O step the algorithm tries to transfer one block from each disk between main memory and the disks. This model has been formalized by Vitter and Shriver as the Parallel Disk Model (PDM) and is the standard theoretical model for designing and analyzing I/O-efficient algorithms.
Theoretically, I/O-efficient algorithms and data structures have been developed for many problem domains: graph algorithms, string processing, computational geometry, etc. Some of them have been implemented: sorting, matrix multiplication, search trees [8,30,4,1], priority queues, and text processing. However, there is an ever increasing gap between the theoretical achievements of external memory algorithms and their practical usage. Several external memory software library projects (LEDA-SM and TPIE) have been started to reduce this gap. They offer frameworks that aim to speed up the process of implementing I/O-efficient algorithms by abstracting away the details of how I/O is performed.
The TPIE and LEDA-SM projects are excellent proofs of concept of the external memory paradigm, but they have some drawbacks that impede their practical usage. This led us to start the development of STXXL, a performance-oriented library of external memory algorithms and data structures that tries to avoid those obstacles. The following are some key features of STXXL:
The STXXL library is open source and available under the Boost Software License 1.0 (http://www.boost.org/LICENSE_1_0.txt). The latest version of the library, a user tutorial, and programmer documentation can be downloaded at http://stxxl.sourceforge.net. Currently the library comprises about 15 000 lines of code.
The remainder of this paper is organized as follows. Section 2 discusses the design of STXXL. We explain how the generic interfaces of the library support high performance external memory computations using parallel disks, overlapping of I/O and computation, and pipelining. The section provides several examples. In Section 3 we implement a short benchmark that computes a maximal independent set of a graph and use it to study the performance of STXXL. Section 4 gives a short overview of the projects using STXXL. We make some concluding remarks and point out directions of future work in Section 5.
A shortened version of this paper has also been published.