GitPedia

Datacrawl

A simple, easy-to-use web crawler for Python

From DataCrawl-AI · Updated June 15, 2026

The project is written primarily in Python, distributed under the MIT License, and was first published in 2018. Key topics include: crawler, crawling, python, python-package, python-web-crawler.

Latest release: v0.5.0
July 11, 2024

Tiny Web Crawler

A simple and efficient web crawler for Python.

Features

  • Crawl web pages and extract links starting from a root URL recursively
  • Concurrent workers and custom delay
  • Handle relative and absolute URLs
  • Designed with simplicity in mind, making it easy to use and extend for various web crawling tasks
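To illustrate what handling relative and absolute URLs involves, here is a minimal sketch built on the standard library's `urllib.parse`. It is not taken from the crawler's source: the helper names `resolve_link` and `is_http_url` and the sample `hrefs` are hypothetical, chosen only for this example.

```python
from urllib.parse import urljoin, urlparse


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, href)


def is_http_url(url: str) -> bool:
    """Keep only http(s) URLs that include a host (hypothetical helper)."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Sample page and hrefs, made up for illustration.
page = "https://github.com/solutions/ci-cd/"
hrefs = ["/features", "../pricing", "https://githubuniverse.com/", "mailto:hi@example.com"]

# Relative hrefs are resolved against the page; absolute ones pass through unchanged.
resolved = [resolve_link(page, h) for h in hrefs]

# Non-web schemes such as mailto: are filtered out before crawling.
crawlable = [u for u in resolved if is_http_url(u)]
print(crawlable)
```

Resolving every link against the page it was found on gives the crawler a single absolute form to deduplicate and queue.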

Installation

Install using pip:

sh
pip install tiny-web-crawler

Usage

python
from tiny_web_crawler import Spider
from tiny_web_crawler import SpiderSettings

settings = SpiderSettings(
    root_url = 'http://github.com',
    max_links = 2
)

spider = Spider(settings)
spider.start()


# Set workers and delay (default: delay is 0.5 sec and verbose is True)
# If you do not want delay, set delay=0

settings = SpiderSettings(
    root_url = 'https://github.com',
    max_links = 5,
    max_workers = 5,
    delay = 1,
    verbose = False
)

spider = Spider(settings)
spider.start()
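For intuition about what the `max_workers` and `delay` settings control, the sketch below shows one plausible way to combine a thread pool with a per-request pause. It is an assumption for illustration, not the library's implementation: `fetch_links` is a hypothetical stub that makes no network request, and `crawl_batch` is a made-up name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_links(url: str) -> list[str]:
    """Hypothetical stand-in for downloading a page and extracting its links."""
    return [f"{url}/a", f"{url}/b"]


def crawl_batch(urls: list[str], max_workers: int = 5, delay: float = 0.5) -> dict[str, list[str]]:
    """Fetch several pages concurrently, pausing `delay` seconds before each fetch."""

    def task(url: str) -> tuple[str, list[str]]:
        time.sleep(delay)  # politeness pause; delay=0 disables it
        return url, fetch_links(url)

    # At most max_workers fetches run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(task, urls))


result = crawl_batch(
    ["http://example.com/x", "http://example.com/y"],
    max_workers=2,
    delay=0,
)
print(result)
```

More workers shorten the total crawl time, while a larger delay reduces the load placed on the target site.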

Output Format

Crawled output sample for https://github.com:

json
{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            "..."
        ]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            "..."
        ]
    }
}
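A short sketch of how output in this shape could be consumed, assuming each crawled page maps to an object with a `urls` list. The JSON string below is a hand-written sample modeled on the one above, and the helper names `link_counts` and `unique_links` are hypothetical; how the crawler actually hands back or saves its results is not covered here.

```python
import json

# Hand-written sample in the assumed shape: page URL -> {"urls": [...]}.
sample = """
{
    "http://github.com": {
        "urls": ["http://github.com/", "https://githubuniverse.com/"]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": ["https://github.com/solutions/ci-cd/", "https://githubuniverse.com/"]
    }
}
"""


def link_counts(crawl: dict) -> dict[str, int]:
    """Number of links found on each crawled page."""
    return {page: len(data["urls"]) for page, data in crawl.items()}


def unique_links(crawl: dict) -> list[str]:
    """All distinct links across every crawled page, sorted."""
    return sorted({url for data in crawl.values() for url in data["urls"]})


crawl = json.loads(sample)
print(link_counts(crawl))
print(unique_links(crawl))
```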

Contributing

Thank you for considering contributing.

  • If you are a first-time contributor, you can pick a good-first-issue and get started.
  • Please feel free to ask questions.
  • Before starting work on an issue, please get it assigned to you so that we can avoid multiple people working on the same issue.
  • We are working toward our first major release. Please check this issue and see if anything interests you.

Dev setup

  • Install poetry on your system: pipx install poetry
  • Clone the repo you forked
  • Create a venv or use poetry shell
  • Run poetry install --with dev
  • Run pre-commit install
  • Run pre-commit install --hook-type pre-push

Before raising a PR, please make sure you have these checks covered:

  • An issue exists, or has been created, that the PR addresses
  • Tests are written for the changes
  • All lint checks and tests pass

This article is auto-generated from DataCrawl-AI/datacrawl via the GitHub API. Last fetched: 6/28/2026