Cc webgraph statistics
Statistics of Common Crawl monthly Web Graphs
Web page showing statistics and plots derived from Common Crawl's monthly Web Graphs, and generation tools. The project is written primarily in JavaScript, is distributed under the Apache License 2.0, and was first published in 2024.
cc-webgraph-statistics

Web page showing statistics and plots derived from Common Crawl's monthly Web Graphs, and generation tools.
Setup
cd src
make
(For more details when running on Mac, see below.)
Updating
cd src
make update
Notes
The built web page can be found in docs/index.html.
Dependencies
You may need to install JSON::XS via cpanm. You may also wish to use a Python virtual environment so that the pip install step in the Makefile can succeed.
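Before running make, it can help to confirm that Perl can load JSON::XS. The following is a minimal sketch; `have_perl_module` is a hypothetical helper introduced here for illustration and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository):
# succeeds if Perl can load the named module, fails otherwise
# (including when perl itself is not installed).
have_perl_module() {
  perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module JSON::XS; then
  echo "JSON::XS is available"
else
  echo "JSON::XS is missing; try: cpanm JSON::XS"
fi
```

The same helper can be called with any other module name, for example `have_perl_module Mozilla::CA`.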
[!TIP]
Recommended: Create a virtual environment before running make, especially on macOS, where the system Python may not allow pip install:
cd src
python3 -m venv .venv
source .venv/bin/activate
make
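To check whether a virtual environment is active in the current shell before running make, something like the following may be useful. It is a sketch that assumes the environment was activated with the standard activate script, which exports the VIRTUAL_ENV variable; `in_venv` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when a virtual environment is active.
# Assumes activation via the standard activate script, which exports VIRTUAL_ENV.
in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
  echo "Using virtual environment: $VIRTUAL_ENV"
else
  echo "No virtual environment active; pip install may target the system Python"
fi
```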
[!TIP]
If you encounter the message:
Can't verify SSL peers without knowing which Certificate Authorities to trust
This is likely to be fixed by running:
cpanm LWP::Protocol::https IO::Socket::SSL Mozilla::CA
[!TIP]
macOS users: You may need to install Perl dependencies via Homebrew:
brew install ca-certificates cpanminus
cpanm LWP::Protocol::https IO::Socket::SSL Mozilla::CA
export PERL_LWP_SSL_CA_FILE="$(brew --prefix)/etc/ca-certificates/cert.pem"
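If the SSL error persists after setting PERL_LWP_SSL_CA_FILE, one thing worth checking is whether the variable points to a file that exists. A minimal sketch follows; `ca_file_ok` is a hypothetical helper, and it only checks that the file is readable and non-empty, not that it holds valid certificates.

```shell
# Hypothetical helper: succeeds if PERL_LWP_SSL_CA_FILE is set and
# names a readable, non-empty file. It does not validate the certificates.
ca_file_ok() {
  [ -n "${PERL_LWP_SSL_CA_FILE:-}" ] &&
    [ -r "$PERL_LWP_SSL_CA_FILE" ] &&
    [ -s "$PERL_LWP_SSL_CA_FILE" ]
}

if ca_file_ok; then
  echo "CA bundle found: $PERL_LWP_SSL_CA_FILE"
else
  echo "PERL_LWP_SSL_CA_FILE is unset or does not point to a readable file"
fi
```

Because export only affects the current shell session, the variable has to be set again in each new terminal, or added to the shell's startup file.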
Local development
If you are running this locally, you may see "No data available" on the rank tables unless you serve the site with a local HTTP server.
[!TIP]
fetch() will fail silently when viewing the page as a file:// URL because browsers block local file access for security reasons. To fix this:
cd docs && python3 -m http.server 8000
Then open http://localhost:8000.
Contact
Please feel free to contact us if you have any questions or need any assistance.
