Repositories tagged with "webarchiving"
awesome-web-archiving
iipc
“An Awesome List for getting started with web archiving”
waybackpy
akamhy
“Wayback Machine API interface & a command-line tool”
warc-gpt
harvard-lil
“WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.”
Squidwarc
N0taN3rd
“Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head”
wget-lua
ArchiveTeam
“Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.”
awesome-memento
machawk1
“A list of things related to software, literature, and other content for Memento”
node-warc
“Parse And Create Web ARChive (WARC) files with node.js”
cc-notebooks
commoncrawl
“Various Jupyter notebooks about Common Crawl data”
warcworker
peterk
“A dockerized, queued high fidelity web archiver based on Squidwarc”