next up previous
Next: Conclusions Up: STXXL : Standard Template Previous: Performance


Applications

STXXL has been successfully applied in implementation projects that studied various I/O-efficient algorithms from the practical point of view. The fast algorithmic components of STXXL library gave the implementations an opportunity to solve problems of very large size on a low-cost hardware in a record time.

The performance of external memory suffix array construction algorithms was investigated in [15]. The experimentation with pipelined STXXL implementations of the algorithms has shown that computing suffix arrays in external memory is feasible even on a low-cost machine. Suffix arrays for long strings up to 4 billion characters could be computed in hours.

The project [2] has compared experimentally two external memory breadth-first search (BFS) algorithms [29,24]. The pipelining technique of STXXL has helped to save a factor of $2$-$3$ in I/O volume of the BFS implementations. Using STXXL, it became possible to compute BFS decomposition of node-set of large grid graphs with 128 million edges in less than a day, and for random sparse graph class within an hour.

Simple algorithms for computing minimum spanning trees (MST), connected components, and spanning forests were developed in [17,32]. Their implementations were built using STL-user-level algorithms and data structures of STXXL. The largest solved MST problem had $2^{32}$ nodes, the input graph edges occupied 96 GBytes. The computation on a PC took only 8h 40min.

The number of triangles in a graph is a very important metric in social network analysis [19]. We have designed and implemented an external memory algorithm that counts and lists all triangles in a graph. Using our implementation we have counted the number of triangles of a web crawl graph from the WebBase project 9. In this graph the nodes are web pages and edges are hyperlinks between them. For the computation we ignored the direction of the links. Our crawl graph had $135$ million nodes and $1.2$ billion edges. During computation on an Opteron SMP which took only 4h 46min we have detected $10.6$ billion triangles. Total volume of $851$ GB was transferred between 1GB of main memory and seven hard disks. The details about the algorithm and the source code are available under http://i10www.ira.uka.de/dementiev/tria/algorithm.shtml.


next up previous
Next: Conclusions Up: STXXL : Standard Template Previous: Performance
Roman Dementiev 2005-08-09