GitPedia

Crawler

https://spatie.be/docs/crawler

From spatie · Updated June 29, 2026

This package provides a powerful, easy-to-use class to crawl links on a website. Under the hood, Guzzle promises are used to [crawl multiple URLs concurrently](http://docs.guzzlephp.org/en/latest/quickstart.html?highlight=pool#concurrent-requests). The project is written primarily in PHP, is distributed under the MIT License, and was first published in 2015. It has 2,829 stars and 367 forks on GitHub. Key topics include concurrency, crawler, guzzle, and php.

Latest release: 9.3.2
June 12, 2026
<div align="left"> <a href="https://spatie.be/open-source?utm_source=github&utm_medium=banner&utm_campaign=crawler"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://spatie.be/packages/header/crawler/html/dark.webp?"> <img alt="Logo for crawler" src="https://spatie.be/packages/header/crawler/html/light.webp"> </picture> </a> <h1>Crawl the web using PHP</h1>

</div>

This package provides a powerful, easy-to-use class to crawl links on a website. Under the hood, Guzzle promises are used to crawl multiple URLs concurrently.
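To give a feel for what concurrent requests with Guzzle promises look like, here is a minimal sketch using Guzzle's own `Pool` class. It is not the crawler's internal code. The URL list, the concurrency value of 2, and the `$statuses` array are assumptions chosen for illustration. The sketch answers from Guzzle's `MockHandler` so that it runs offline, and it assumes `guzzlehttp/guzzle` has been installed with Composer.

```php
<?php

declare(strict_types=1);

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;
use Psr\Http\Message\ResponseInterface;

// Assumes Composer's autoloader lives next to this script.
require __DIR__ . '/vendor/autoload.php';

// Canned responses, handed out in the order the requests are sent.
$mock = new MockHandler([
    new Response(200, [], '<html>Home</html>'),
    new Response(200, [], '<html>About</html>'),
    new Response(404),
]);

$client = new Client([
    'handler' => HandlerStack::create($mock),
    // Deliver 4xx/5xx responses to "fulfilled" instead of throwing.
    'http_errors' => false,
]);

// Illustrative URLs only.
$urls = [
    'https://example.com',
    'https://example.com/about',
    'https://example.com/missing',
];

// Yield requests lazily, keyed by URL so each callback knows which one finished.
$requests = static function (array $urls): Generator {
    foreach ($urls as $url) {
        yield $url => new Request('GET', $url);
    }
};

$statuses = [];

$pool = new Pool($client, $requests($urls), [
    // At most two requests are in flight at any time.
    'concurrency' => 2,
    'fulfilled' => function (ResponseInterface $response, string $url) use (&$statuses): void {
        $statuses[$url] = $response->getStatusCode();
    },
    'rejected' => function ($reason, string $url) use (&$statuses): void {
        // Connection-level failures end up here; record them as null.
        $statuses[$url] = null;
    },
]);

// Start the transfers and block until every request has settled.
$pool->promise()->wait();

foreach ($statuses as $url => $status) {
    echo "{$url}: " . ($status ?? 'failed') . "\n";
}
```

Dropping the `handler` option would make the same pool send real HTTP requests.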

Because the crawler can execute JavaScript, it can crawl JavaScript-rendered sites. Chrome and Puppeteer power this feature.

Here's a quick example:

php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        echo "{$url}: {$response->status()}\n";
    })
    ->start();

Or collect all URLs on a site:

php
$urls = Crawler::create('https://example.com')
    ->internalOnly()
    ->depth(3)
    ->foundUrls();

You can also test your crawl logic without making real HTTP requests:

php
Crawler::create('https://example.com')
    ->fake([
        'https://example.com' => '<html><a href="/about">About</a></html>',
        'https://example.com/about' => '<html>About page</html>',
    ])
    ->foundUrls();

If you need to stop a crawl based on external state, you can register a callback that receives the current crawler instance and is checked before scheduling each next request:

php
use Spatie\Crawler\Crawler;

$shouldStop = false;

Crawler::create('https://example.com')
    ->shouldStopCallback(function (Crawler $crawler) use (&$shouldStop) {
        return $shouldStop;
    })
    ->onCrawled(function (string $url) use (&$shouldStop) {
        $shouldStop = true;
    })
    ->start();
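A common kind of external state is a time budget. The sketch below builds a callback that returns `true` once a given number of seconds has elapsed. `stopAfterSeconds()` and its injectable `$clock` parameter are hypothetical helpers written for this illustration and are not part of the package. The code is plain PHP 8 and does not depend on the crawler.

```php
<?php

declare(strict_types=1);

/**
 * Build a callback that returns true once the time budget is used up.
 * Hypothetical helper for illustration; not part of spatie/crawler.
 */
function stopAfterSeconds(float $seconds, ?callable $clock = null): Closure
{
    // The clock is injectable so the helper can be exercised without sleeping.
    $clock ??= static fn (): float => microtime(true);
    $deadline = $clock() + $seconds;

    // Arguments passed by the caller (such as the crawler instance) are ignored.
    return static fn (mixed ...$args): bool => $clock() >= $deadline;
}

// Demonstration with a fake clock that is advanced by hand.
$now = 100.0;
$stop = stopAfterSeconds(30.0, function () use (&$now): float {
    return $now;
});

var_dump($stop()); // bool(false): 100.0 is before the 130.0 deadline

$now = 131.0;
var_dump($stop()); // bool(true): the deadline has passed
```

Assuming the callback contract described above, the returned closure could be passed as `->shouldStopCallback(stopAfterSeconds(30.0))` in place of the inline function, so the crawl would stop scheduling requests after roughly 30 seconds.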

Support us

<img src="https://github-ads.s3.eu-central-1.amazonaws.com/crawler.jpg?t=1" width="419px" />

We invest a lot of resources into creating best-in-class open source packages. You can support us by buying one of our paid products.

We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on our contact page. We publish all received postcards on our virtual postcard wall.

Documentation

All documentation is available on our documentation site.

Testing

bash
composer test

Changelog

Please see CHANGELOG for more information on what has changed recently.

Contributing

Please see CONTRIBUTING for details.

Security Vulnerabilities

Please review our security policy on how to report security vulnerabilities.

Credits

License

The MIT License (MIT). Please see License File for more information.


This article is auto-generated from spatie/crawler via the GitHub API. Last fetched: 6/29/2026