GitPedia

Webgraph

WebGraph is a framework for graph compression.

From vigna·Updated June 23, 2026·View on GitHub·

WebGraph is a framework for graph compression aimed at studying web graphs. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of: The project is written primarily in Java, distributed under the GNU Lesser General Public License v2.1 license, first published in 2021. Key topics include: centrality-measures, compression-library, graph, graph-compression, java.

Welcome to WebGraph!

Maven Central
javadoc

Introduction

WebGraph is a framework for graph compression aimed at studying web
graphs. It provides simple ways to manage very large graphs, exploiting
modern compression techniques. More precisely, it is currently made of:

  1. A set of flat codes, called ζ codes, which are particularly
    suitable for storing web graphs (or, in general, integers with power-law
    distribution in a certain exponent range). The fact that these codes work
    well can be easily tested empirically, but we also try to provide a
    detailed mathematical
    analysis
    .

  2. Algorithms for compressing web graphs that exploit gap compression and
    referentiation (à la LINK),
    intervalization and ζ codes to provide a high compression ratio (see our
    datasets
    ). The algorithms are
    controlled by several parameters, which provide different tradeoffs
    between access speed and compression ratio.

  3. Algorithms for accessing a compressed graph without actually
    decompressing it, using lazy techniques that delay the decompression until
    it is actually necessary.

  4. Algorithms for analysing very large graphs, such as
    HyperBall
    which has been used to show that Facebook has just four degrees of
    separation
    .

  5. An implementation of the algorithms above in Java and
    Rust distributed under either the GNU
    Lesser General Public License
    2.1+
    or the
    Apache Software License
    2.0
    . Besides a clearly
    defined API, we also provide several classes tha modify (e.g., transpose)
    or recompress a graph, so to experiment with various settings.

  6. Datasets for large graph. These are either
    gathered from public sources (such as
    WebBase), or
    produced by UbiCrawler and BUbiNG.

In the end, with WebGraph you can access and analyse very large web
graphs. Using WebGraph is as easy as installing a few jar files and
downloading a dataset. This makes studying phenomena such as PageRank,
distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! If you find our software
useful for your research, please quote this
paper
.

This version of WebGraph is limited to graphs with at most 2³¹ nodes. For
larger graphs, have a look at the big
version
.

Users of WebGraph

<a href="https://www.softwareheritage.org/"><img src="svg/SWH.svg" width="200"></a>
        
<a href="https://www.commoncrawl.org/"><img src="svg/CC.svg" width="200"></a>

JGraphT

<a href="https://jgrapht.org/">JGraphT</a> has a
few <a href="https://jgrapht.org/guide/WebGraphAdapters">WebGraph adapters</a>.

Python

There are Python bindings for WebGraph.

Papers

  • A detailed description of
    the compression algorithms used in WebGraph, published in the proceedings
    of the Thirteenth International World–Wide Web
    Conference
    .

  • A mathematical analysis
    of the performance of γ, δ and ζ codes against power-law distributions.

  • Some quite surprising
    experiments
    showing that the
    transpose graph reacts very peculiarly to compression after
    lexicographical or Gray-code sorting.

  • A paper about
    HyperBall
    (then named HyperANF), our tool for computing an approximation of the
    neighbourhood function, reachable nodes and geometric centralities of
    massive graphs. More information can be found in this
    preprint.

  • HyperBall was used to
    find out that on average there are just four degrees of
    separation
    on
    Facebook, and the experiment was reported by the
    New York
    Times
    .
    Alas, the degrees were actually 3.74 (one less than the average
    distance
    ), but the off-by-one
    between graph theory (“distance”) and sociology (“degrees of separation”)
    generated a lot of confusion.

  • A paper were we
    describe our efforts to compress one of the largest social graphs
    available—the graph of commits of the Software Heritage archive.

  • A paper about our effort
    to bring WebGraph to Rust.

Contributors

Showing top 2 contributors by commit count.

View all contributors on GitHub →

This article is auto-generated from vigna/webgraph via the GitHub API.Last fetched: 6/27/2026