GitPedia

BrowserPilot

Open‑source alternative to Perplexity Comet, director.ai and firecrawl combined

From ai-naymul · Updated June 16, 2026

**Tell a browser what you want in plain English. It scrapes any website, even the ones that block everyone else.** The project is written primarily in Python, distributed under the MIT License, and was first published in 2025. Key topics include: ai, ai-agent, ai-agents, browser, browser-agent.

Latest release: v1.1.0 (Ghost Mode + Production Bulk Scraping)


License: MIT
Python 3.8+
Tests: 236 passed
PRs Welcome

<p align="center"> <img src="docs/stealth-benchmarks/browserpilot_demo.gif" alt="BrowserPilot — type a prompt, AI navigates and extracts data" width="700"> <br> <em>Type what you want → AI navigates any website → get structured data back.</em> </p>

Why BrowserPilot?

Most scraping tools break the moment a site has Cloudflare, DataDome, or Akamai. BrowserPilot doesn't.

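| Capability | BrowserPilot | Playwright | Selenium | Browserbase | Scrapy |
|---|---|---|---|---|---|
| Bypasses DataDome/Akamai | Yes | No | No | Partial | No |
| AI vision (works on any site) | Yes | No | No | No | No |
| Bulk scraping with stealth | Yes | No | No | Yes ($$$) | Yes (no JS) |
| Self-hosted & free | Yes | Yes | Yes | No ($30/mo) | Yes |
| Human-like behavior | Yes | No | No | No | N/A |
| Pixelscan score | 105/105 | ~60/105 | ~40/105 | Unknown | N/A |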

What You Can Do

```bash
# Single page: just describe what you want, e.g.
#   "Go to Amazon and extract all laptop prices under $1000 as JSON"

# Bulk scrape: hit hundreds of pages across protected sites
curl -X POST http://localhost:8000/bulk \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://nike.com", "https://wayfair.com", "https://footlocker.com"],
    "prompt": "Extract product data",
    "format": "json",
    "max_workers": 3
  }'

# Watch it work: live browser stream in your browser
# Open http://localhost:8000 and watch the AI navigate in real time
```

Output formats: JSON, CSV, PDF, HTML, Markdown, plain text — just ask.
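The same bulk request can be sent from a Python script using only the standard library. This is a minimal sketch, assuming the server is running locally on port 8000 and accepts the fields shown in the curl example; `build_bulk_payload`, `build_bulk_request`, and `submit_bulk_job` are hypothetical helper names, and the response is returned as raw JSON because its schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local server address


def build_bulk_payload(urls, prompt, fmt="json", max_workers=3):
    """Build the JSON body for POST /bulk, mirroring the curl example."""
    if not urls:
        raise ValueError("at least one URL is required")
    return {
        "urls": list(urls),
        "prompt": prompt,
        "format": fmt,
        "max_workers": max_workers,
    }


def build_bulk_request(payload, base_url=BASE_URL):
    """Wrap the payload in a POST request with a JSON content type."""
    return urllib.request.Request(
        f"{base_url}/bulk",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_bulk_job(payload, base_url=BASE_URL, timeout=30):
    """Send the request and return the decoded JSON response as-is."""
    req = build_bulk_request(payload, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `submit_bulk_job(build_bulk_payload(["https://nike.com"], "Extract product data"))` would submit a one-URL job.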


See It Work

Reddit's new React frontend — navigates feeds, clicks posts, scrolls comments. No selectors, no DOM parsing.

<p align="center"> <img src="docs/reddit_demo.gif" alt="BrowserPilot navigating Reddit's new frontend" width="700"> </p>

StatMuse: scraped 191 rows of La Liga stats in seconds. JS-rendered data, no API needed.

<p align="center"> <img src="docs/statmuse_scrape.gif" alt="BrowserPilot scraping StatMuse" width="700"> </p>

10 protected sites in 60 seconds — DataDome, Akamai, Cloudflare, PerimeterX. Zero blocks.

<p align="center"> <img src="docs/stealth-benchmarks/bypass_showcase.gif" alt="Bypassing 10 protected sites" width="700"> </p>

Stealth That Actually Works

We don't just claim stealth; we test it. Here is how BrowserPilot scores on the major bot-detection benchmarks:

| Benchmark | Score |
|---|---|
| Pixelscan | 105/105 Clear |
| Sannysoft | 29/29 Passed |
| Rebrowser | 9/10 Pass |
| BrowserScan | All Normal |
| DeviceAndBrowserInfo | "You are human!" |
| BrowserLeaks WebRTC | No IP leak |
<details> <summary><b>See benchmark screenshots</b></summary>
<table>
<tr><th>Sannysoft</th><th>Pixelscan</th><th>DeviceInfo</th></tr>
<tr><td><img src="docs/stealth-benchmarks/sannysoft_section1.png" width="280"></td><td><img src="docs/stealth-benchmarks/pixelscan_section1.png" width="280"></td><td><img src="docs/stealth-benchmarks/deviceinfo_section1.png" width="280"></td></tr>
<tr><th>Rebrowser</th><th>BrowserScan</th><th>BrowserLeaks</th></tr>
<tr><td><img src="docs/stealth-benchmarks/rebrowser_section1.png" width="280"></td><td><img src="docs/stealth-benchmarks/browserscan_section1.png" width="280"></td><td><img src="docs/stealth-benchmarks/browserleaks-webrtc_section1.png" width="280"></td></tr>
</table>
</details>

Tested Against Real Anti-Bot Systems

These anti-bot systems stop most automation tools. BrowserPilot loaded 11 of the 14 protected sites tested; the 11 that loaded are listed below:

| Site | Anti-Bot | Result |
|---|---|---|
| Foot Locker | DataDome (Tier S) | Loaded |
| Leboncoin | DataDome (Tier S) | Loaded |
| Vinted | DataDome (Tier S) | Loaded |
| Booking.com | DataDome + custom (Tier S) | Loaded |
| Nike | Akamai (Tier A) | Loaded |
| New Balance | Akamai (Tier A) | Loaded |
| Zalando | Akamai (Tier A) | Loaded |
| Wayfair | PerimeterX (Tier A) | Loaded |
| Ticketmaster | Multiple (Tier A) | Loaded |
| Stake.com | Cloudflare Enterprise | Loaded |
| LinkedIn | Cloudflare + custom | Loaded |
<details> <summary><b>See anti-bot bypass screenshots</b></summary>
<table>
<tr><th>Foot Locker (DataDome)</th><th>Leboncoin (DataDome)</th><th>Vinted (DataDome)</th></tr>
<tr><td><img src="docs/stealth-benchmarks/tiertest_footlocker.com.png" width="280"></td><td><img src="docs/stealth-benchmarks/tiertest_leboncoin.png" width="280"></td><td><img src="docs/stealth-benchmarks/tiertest_vinted.com.png" width="280"></td></tr>
<tr><th>Nike (Akamai)</th><th>Wayfair (PerimeterX)</th><th>Ticketmaster</th></tr>
<tr><td><img src="docs/stealth-benchmarks/tiertest_nike.com.png" width="280"></td><td><img src="docs/stealth-benchmarks/tiertest_wayfair.com.png" width="280"></td><td><img src="docs/stealth-benchmarks/tiertest_ticketmaster.com.png" width="280"></td></tr>
<tr><th>New Balance (Akamai)</th><th>Stake.com (CF Enterprise)</th><th>Booking.com</th></tr>
<tr><td><img src="docs/stealth-benchmarks/tiertest_newbalance.com.png" width="280"></td><td><img src="docs/stealth-benchmarks/tiertest_stake.png" width="280"></td><td><img src="docs/stealth-benchmarks/realworld_booking.png" width="280"></td></tr>
</table>
</details>

How the stealth works

  • Patchright — Playwright fork that never calls Runtime.enable (defeats CDP detection)
  • Full Chromium + xvfb — real browser window, real GPU, real WebGL fingerprints
  • Fingerprint rotation — each session gets a unique viewport, UA, DPR, locale, timezone
  • Human behavior — Bezier mouse curves, variable typing speed, natural scroll patterns
  • Geo-matching — proxy country auto-maps to correct timezone + locale
  • WebRTC blocked — local IP never leaks
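The "Bezier mouse curves" idea can be illustrated by generating cursor positions along a randomly bowed cubic Bezier curve instead of a straight line. This is a generic sketch, not BrowserPilot's implementation; `human_mouse_path`, its `jitter` parameter, and the cosine easing are hypothetical choices. Each returned point could be passed to Playwright's `page.mouse.move(x, y)`, which Patchright, as a Playwright fork, is intended to keep.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Point on a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def human_mouse_path(start: Point, end: Point, steps: int = 25,
                     jitter: float = 0.3,
                     rng: Optional[random.Random] = None) -> List[Point]:
    """Points from start to end along a curved, eased path.

    The two control points are pushed sideways (perpendicular to the
    straight line) by a random fraction of the distance, so the cursor
    arcs instead of moving in a straight segment.
    """
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy) or 1.0
    nx, ny = -dy / dist, dx / dist          # unit normal to the straight line
    off1 = rng.uniform(-jitter, jitter) * dist
    off2 = rng.uniform(-jitter, jitter) * dist
    c1 = (x0 + dx * 0.3 + nx * off1, y0 + dy * 0.3 + ny * off1)
    c2 = (x0 + dx * 0.7 + nx * off2, y0 + dy * 0.7 + ny * off2)
    points = []
    for i in range(steps + 1):
        t = i / steps
        eased = (1 - math.cos(math.pi * t)) / 2  # slow start, fast middle, slow end
        points.append(cubic_bezier(start, c1, c2, end, eased))
    return points
```

The easing makes the cursor accelerate and decelerate, which a constant-speed straight line does not.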

No noise injection. Anti-bot systems detect injected canvas/WebGL noise by rendering content whose expected output is known and checking for deviations. Real fingerprints from real hardware, varied through configuration, are much harder to flag.
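One way to picture "varied through configuration" together with geo-matching is a per-session set of browser-context options. This sketch assumes Playwright-style `browser.new_context()` keyword arguments (`viewport`, `user_agent`, `device_scale_factor`, `locale`, `timezone_id`); the `PROFILES` and `GEO` tables and the `Fingerprint`/`new_fingerprint` names are hypothetical, not BrowserPilot's actual configuration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

WIN_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
MAC_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

# Hypothetical device profiles. A whole profile is picked at once so the UA,
# screen size and pixel ratio stay mutually consistent.
PROFILES = [
    {"user_agent": WIN_UA, "viewport": (1920, 1080), "device_scale_factor": 1.0},
    {"user_agent": WIN_UA, "viewport": (1536, 864), "device_scale_factor": 1.25},
    {"user_agent": MAC_UA, "viewport": (1440, 900), "device_scale_factor": 2.0},
]
# Hypothetical geo table: proxy country code -> (IANA timezone, locale).
GEO = {
    "US": ("America/New_York", "en-US"),
    "GB": ("Europe/London", "en-GB"),
    "DE": ("Europe/Berlin", "de-DE"),
    "FR": ("Europe/Paris", "fr-FR"),
}


@dataclass(frozen=True)
class Fingerprint:
    viewport: Tuple[int, int]
    user_agent: str
    device_scale_factor: float
    locale: str
    timezone_id: str

    def context_options(self) -> dict:
        """Keyword arguments in the shape browser.new_context() accepts."""
        width, height = self.viewport
        return {
            "viewport": {"width": width, "height": height},
            "user_agent": self.user_agent,
            "device_scale_factor": self.device_scale_factor,
            "locale": self.locale,
            "timezone_id": self.timezone_id,
        }


def new_fingerprint(proxy_location: str = "US",
                    rng: Optional[random.Random] = None) -> Fingerprint:
    """Pick a fresh identity; locale and timezone follow the proxy country."""
    rng = rng or random.Random()
    profile = rng.choice(PROFILES)
    timezone_id, locale = GEO.get(proxy_location, GEO["US"])  # unknown -> US
    return Fingerprint(locale=locale, timezone_id=timezone_id, **profile)
```

A new session could then be opened with `browser.new_context(**new_fingerprint("DE").context_options())`.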


Bulk Scraping at Production Scale

Not a demo — a production bulk engine that scrapes hundreds of pages concurrently without getting blocked.

<p align="center"> <img src="docs/stealth-benchmarks/bulk_demo.gif" alt="Bulk scraping 10 protected sites" width="640"> </p>
| Feature | How |
|---|---|
| 10 parallel workers | Each with unique fingerprints |
| Context rotation | New identity every N pages, no browser restart |
| Resource blocking | Skip images/fonts/CSS (3-5x faster) |
| Adaptive throttle | Backs off on 429s, speeds up on success |
| Checkpoint/resume | Crash? Resume from where you left off |
| Shared intelligence | One worker blocked = all workers skip that combo |
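The adaptive-throttle row can be illustrated with a delay controller that multiplies its delay on HTTP 429 and shrinks it by a fixed step on success, an AIMD-style scheme expressed in terms of delay rather than rate. This is a sketch under assumed defaults; `AdaptiveThrottle` and its parameters are hypothetical, and BrowserPilot's own policy may differ.

```python
import time


class AdaptiveThrottle:
    """Multiply the delay on HTTP 429, shrink it by a fixed step on 2xx,
    and keep it within [min_delay, max_delay] seconds."""

    def __init__(self, min_delay=0.5, max_delay=30.0, backoff=2.0, step=0.25):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.backoff = backoff
        self.step = step
        self.delay = min_delay

    def record(self, status_code):
        """Update the delay from one response status and return it."""
        if status_code == 429:
            self.delay = min(self.delay * self.backoff, self.max_delay)
        elif 200 <= status_code < 300:
            self.delay = max(self.delay - self.step, self.min_delay)
        # Other statuses (e.g. 404, 500) leave the delay unchanged.
        return self.delay

    def wait(self):
        """Sleep for the current delay before the next request."""
        time.sleep(self.delay)
```

A worker would call `record(status)` after each page and `wait()` before the next one. Backing off multiplicatively but recovering linearly means a burst of 429s is never answered with full-speed retries.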
```bash
# Start a bulk job
curl -X POST http://localhost:8000/bulk \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://site1.com/page1", "https://site2.com/page2", "..."],
    "prompt": "Extract product names and prices",
    "format": "json",
    "max_workers": 5,
    "block_resources": true
  }'

# Check progress
curl http://localhost:8000/bulk/{job_id}

# Resume after crash
curl -X POST http://localhost:8000/bulk/{job_id}/resume
```
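Resuming after a crash requires the job to remember which URLs already finished. A minimal sketch of that idea, assuming a JSON file of completed URLs; the `Checkpoint` class and its file layout are hypothetical, not BrowserPilot's actual on-disk format.

```python
import json
import os
import tempfile


class Checkpoint:
    """Record finished URLs on disk so a restarted job skips them.

    Hypothetical file layout: {"done": ["url1", "url2", ...]}.
    """

    def __init__(self, path):
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.done = set(json.load(f)["done"])

    def mark_done(self, url):
        """Add a URL and rewrite the checkpoint atomically."""
        self.done.add(url)
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temp file in the same directory, then swap it in with
        # os.replace, so a crash mid-write never leaves a truncated file.
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"done": sorted(self.done)}, f)
        os.replace(tmp_path, self.path)

    def pending(self, urls):
        """URLs from the original job list that have not finished yet."""
        return [u for u in urls if u not in self.done]
```

Rewriting the whole file after every page keeps the sketch simple; a very large job might prefer an append-only log.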
| Benchmark | Pages | Speed | Blocked |
|---|---|---|---|
| Hacker News | 15/15 | 37.8 pages/min | 0 |
| DataDome + Akamai + PerimeterX + Cloudflare | 10/10 | 33.7 pages/min | 0 |

Quick Start

```bash
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
echo 'GOOGLE_API_KEY=your_key_here' > .env
docker-compose up -d
```

Open http://localhost:8000 — done.

Manual

```bash
git clone https://github.com/ai-naymul/BrowserPilot.git && cd BrowserPilot
pip install -r requirements.txt
echo 'GOOGLE_API_KEY=your_key_here' > .env
python -m uvicorn backend.main:app --reload
```

Configuration

```bash
# Required
GOOGLE_API_KEY=your_gemini_api_key

# Optional: proxies for heavy scraping
SCRAPER_PROXIES=[{"server": "http://proxy:port", "username": "user", "password": "pass", "location": "US"}]
```
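`SCRAPER_PROXIES` holds a JSON list of proxy objects. A minimal sketch of reading and checking it, assuming the four fields shown above; `load_proxies` and `to_playwright_proxy` are hypothetical helpers. Playwright's `proxy` option takes `server`, `username`, `password` (and `bypass`), so the sketch leaves out `location`, which would instead drive geo-matching.

```python
import json
import os
from typing import List, Mapping, Optional


def load_proxies(env: Optional[Mapping[str, str]] = None) -> List[dict]:
    """Parse SCRAPER_PROXIES (a JSON list of objects); empty if unset."""
    env = os.environ if env is None else env
    raw = env.get("SCRAPER_PROXIES", "").strip()
    if not raw:
        return []
    proxies = json.loads(raw)
    if not isinstance(proxies, list):
        raise ValueError("SCRAPER_PROXIES must be a JSON list")
    for i, proxy in enumerate(proxies):
        if not isinstance(proxy, dict) or "server" not in proxy:
            raise ValueError(f"proxy #{i} is missing a 'server' field")
    return proxies


def to_playwright_proxy(proxy: dict) -> dict:
    """Keep only connection keys; 'location' is used for geo-matching."""
    return {k: proxy[k] for k in ("server", "username", "password") if k in proxy}
```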

Use Cases

Price monitoring — Track competitor pricing across Amazon, Walmart, Best Buy. Get structured JSON, schedule with cron.

Lead generation — Extract company data from LinkedIn, G2, Crunchbase. BrowserPilot handles login walls and infinite scroll.

Real estate data — Pull listings from Zillow, Realtor.com, Redfin. Export as CSV for analysis.

Market research — Monitor product launches on Product Hunt, reviews on Trustpilot, job postings on Indeed.

Academic research — Collect data from government portals, research databases, news sites that block standard scrapers.


How It Works

You type: "Extract laptop prices from Best Buy"
    |
    v
AI Vision (Gemini 2.5 Flash) sees the page like you do
    |
    v
Decides: click search, type query, scroll, extract data
    |
    v
Ghost Mode stealth keeps it undetected
    |
    v
Structured output: JSON / CSV / PDF / whatever you asked for

The AI doesn't rely on CSS selectors or DOM structure; it looks at a screenshot and decides what to do. Because there are no hard-coded selectors to go stale, a site redesign doesn't break BrowserPilot the way it breaks selector-based scrapers.
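The screenshot, decide, act cycle above can be sketched independently of any model or browser. Everything here is hypothetical: `Action`, `run_agent`, and the `screenshot`/`decide`/`execute` callables stand in for the Gemini call and the browser actions, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    kind: str                    # hypothetical kinds: "click", "type", "scroll", "done"
    x: int = 0                   # screen coordinates for clicks
    y: int = 0
    text: str = ""               # text to type
    data: Optional[dict] = None  # extracted result when kind == "done"


def run_agent(
    goal: str,
    screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> Optional[dict]:
    """Observe-decide-act loop: screenshot, ask the model, act, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        image = screenshot()                   # observe pixels, not the DOM
        action = decide(goal, image, history)  # model picks the next step
        history.append(action)
        if action.kind == "done":
            return action.data                 # structured output
        execute(action)                        # click / type / scroll
    raise RuntimeError(f"goal not reached within {max_steps} steps")
```

The step cap keeps a confused model from looping forever on a page it cannot make sense of.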


Roadmap

| Version | What | Status |
|---|---|---|
| v1.0 | Foundation: tests, CI, Docker, community | Done |
| v1.1 | Ghost Mode: stealth, bulk scraping, human behavior | Done |
| v1.2 | Universal Proxy: SOCKS4/5, file input, geo-routing | Next |
| v1.3 | Crawl Anything: pagination, sitemaps, full-site crawl | Planned |
| v2.0 | Generative UI: natural language to live visual dashboards | Planned |

Contributing

PRs welcome. Read the contributing guide or just:

  1. Fork it
  2. Create a branch (git checkout -b my-feature)
  3. Make changes + add tests
  4. Open a PR

Acknowledgments

Patchright | Playwright | Google Gemini | FastAPI


<p align="center"> <b>If BrowserPilot saves you time, drop a star. It helps more people find it.</b> </p>


This article is auto-generated from ai-naymul/BrowserPilot via the GitHub API. Last fetched: 6/21/2026