STXXL has been successfully applied in implementation projects that studied various I/O-efficient algorithms from the practical point of view. The fast algorithmic components of STXXL library gave the implementations an opportunity to solve problems of very large size on a low-cost hardware in a record time.
The performance of external memory suffix array construction algorithms was investigated in [15]. The experimentation with pipelined STXXL implementations of the algorithms has shown that computing suffix arrays in external memory is feasible even on a low-cost machine. Suffix arrays for long strings up to 4 billion characters could be computed in hours.
The project [2] has compared experimentally two
external memory breadth-first search (BFS) algorithms [29,24].
The pipelining technique of STXXL has helped to save a factor of -
in I/O volume of the BFS implementations. Using STXXL, it
became possible to compute BFS decomposition of node-set of large grid
graphs with 128 million edges in less than a day, and for
random sparse graph class within an hour.
Simple algorithms for computing minimum spanning trees (MST), connected
components, and spanning forests were developed in [17,32]. Their implementations
were built using STL-user-level algorithms and
data structures of STXXL. The largest solved MST problem
had nodes, the input graph edges occupied 96 GBytes.
The computation on a PC took only 8h 40min.
The number of triangles in a graph is a very important metric
in social network analysis [19].
We have designed and implemented an external
memory algorithm that counts and lists all
triangles in a graph.
Using our implementation we have counted the number of
triangles of a web crawl graph from the WebBase project 9.
In this graph the nodes are web pages and edges are hyperlinks
between them.
For the computation we ignored the direction
of the links.
Our crawl graph had million nodes and
billion edges.
During computation on an Opteron SMP which took only
4h 46min we have detected
billion triangles.
Total volume of
GB was transferred between 1GB of main
memory and seven hard disks.
The details
about the algorithm and the source code
are available under
http://i10www.ira.uka.de/dementiev/tria/algorithm.shtml.