GitPedia

Cc webgraph statistics

Statistics of Common Crawl monthly Web Graphs

From commoncrawl · Updated June 24, 2026

Web page showing statistics and plots derived from Common Crawl's monthly Web Graphs, and generation tools. The project is written primarily in JavaScript, is distributed under the Apache License 2.0, and was first published in 2024.

cc-webgraph-statistics

Web page showing statistics and plots derived from Common Crawl's monthly Web Graphs, and generation tools.

Setup

cd src
make

(For more details when running on Mac, see below.)

Updating

cd src
make update

Notes

The built web page can be found in docs/index.html.

Dependencies

You may need to install JSON::XS via cpanm. You may also wish to use a Python virtual environment so that the Makefile's pip install step can succeed.

[!TIP]
Recommended: Create a virtual environment before running make, especially on macOS where the system Python may not allow pip install:

cd src
python3 -m venv .venv
source .venv/bin/activate
make
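
Whether the virtual environment is actually active can be checked before running make. The sketch below is not part of the repository; `in_virtualenv` is a hypothetical helper name, and it relies on the standard comparison of `sys.prefix` and `sys.base_prefix`, which differ inside environments created with `python3 -m venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; pip install may be refused.")
```

Running this with the same `python3` that make will use gives a quick indication of whether the pip install step is likely to target the virtual environment.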

[!TIP]
If you encounter the message:

Can't verify SSL peers without knowing which Certificate Authorities to trust

This is likely to be fixed by running:

cpanm LWP::Protocol::https IO::Socket::SSL Mozilla::CA

[!TIP]
macOS users: You may need to install Perl dependencies via Homebrew:

brew install ca-certificates cpanminus
cpanm LWP::Protocol::https IO::Socket::SSL Mozilla::CA
export PERL_LWP_SSL_CA_FILE="$(brew --prefix)/etc/ca-certificates/cert.pem"
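
To see which of the Perl modules named in this section are still missing, one option is to ask Perl to load each of them. The following is a minimal sketch, not a tool shipped with the repository: `perl_module_available` and `missing_perl_modules` are hypothetical helper names, and the check assumes that `perl -M<module> -e 1` exits with a non-zero status when the module cannot be loaded.

```python
import shutil
import subprocess

# Module names taken from this section; adjust the list as needed.
PERL_MODULES = ["JSON::XS", "LWP::Protocol::https", "IO::Socket::SSL", "Mozilla::CA"]


def perl_module_available(module: str, perl: str = "perl") -> bool:
    """Return True if `perl -M<module> -e 1` exits successfully."""
    if shutil.which(perl) is None:
        raise RuntimeError(f"{perl!r} not found on PATH")
    # -M loads the module before running the trivial program `1`.
    result = subprocess.run([perl, f"-M{module}", "-e", "1"], capture_output=True)
    return result.returncode == 0


def missing_perl_modules(modules, perl: str = "perl"):
    """Return the subset of `modules` that Perl could not load."""
    return [m for m in modules if not perl_module_available(m, perl)]
```

Calling `missing_perl_modules(PERL_MODULES)` returns a list that can be passed to cpanm; an empty list suggests the Perl side of the setup is complete.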

Local development

If you are running this locally, you may see "No data available" on the rank tables unless you serve the site with a local HTTP server.

[!TIP]
fetch() will fail silently when the page is opened from a file:// URL, because browsers block local file access for security reasons. To fix this, serve the docs directory over HTTP:

cd docs && python3 -m http.server 8000

Then open http://localhost:8000.
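
The same local server can also be started from a script, which may be convenient when the port or directory needs to vary. This is a sketch under stated assumptions rather than part of the project: `make_docs_server` is a hypothetical helper name, and unlike `python3 -m http.server`, which listens on all interfaces by default, it binds to the loopback address only.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_docs_server(directory: str = "docs", port: int = 8000) -> ThreadingHTTPServer:
    """Create an HTTP server that serves static files from `directory`."""
    root = Path(directory).resolve()
    if not root.is_dir():
        raise FileNotFoundError(f"No such directory: {root}")
    # Pin the handler to the docs directory instead of the current directory.
    handler = partial(SimpleHTTPRequestHandler, directory=str(root))
    # Bind to loopback only so the site is not exposed to the network.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

From the repository root, `make_docs_server().serve_forever()` serves the built page until interrupted; the page is then reachable at http://127.0.0.1:8000.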

Contact

Please feel free to contact us if you have any questions or need any assistance.

This article is auto-generated from commoncrawl/cc-webgraph-statistics via the GitHub API. Last fetched: 6/26/2026