Papercast
A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, then listen to them as a podcast. Customize your own pipelines.
An extensible pipeline tool and plugin ecosystem for processing technical documents. The project is written primarily in Python, distributed under the MIT License, and was first published in 2023. Key topics include: arxiv, dag, document-parser, document-parsing, grobid.
Latest release: v0.1.0
April 8, 2023
Features
| Feature | Examples |
|---|---|
| Add documents in multiple formats, from popular sources | PDF <br /> LaTeX <br /> ArXiv <br /> SemanticScholar |
| Flexible Text Extraction | GROBID <br /> More coming soon! <br /> Write your own! |
| Flexible Text Narration | macOS `say` command <br /> More coming soon! <br /> Write your own! |
| Publish to multiple endpoints | Self-hosted RSS podcast using GitHub Pages <br /> Any other endpoint you can think of |
| Run anywhere | Local machine <br /> Remote server <br /> Cloud (AWS, GCP, Azure, etc.) |
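
The features above describe a flow of collecting a document, extracting its text, and narrating it. As a rough illustration of that idea only, here is a minimal sketch in plain Python. Every name in it (`Pipeline`, `collect_pdf`, `extract_text`, `narrate`) is hypothetical and is not Papercast's actual API; the stages are stand-ins that fake their output rather than calling GROBID or a speech engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT Papercast's API, only an
# illustration of a collect -> extract -> narrate flow.
Document = Dict[str, str]
Stage = Callable[[Document], Document]


def collect_pdf(doc: Document) -> Document:
    # Stand-in for a collector (PDF, LaTeX, arXiv, SemanticScholar).
    return {**doc, "source": "pdf"}


def extract_text(doc: Document) -> Document:
    # Stand-in for a text extractor such as GROBID; the text is faked here.
    return {**doc, "text": f"Extracted text of {doc['path']}"}


def narrate(doc: Document) -> Document:
    # Stand-in for a narrator; a real one would write an audio file.
    return {**doc, "audio": doc["path"].rsplit(".", 1)[0] + ".mp3"}


@dataclass
class Pipeline:
    """Runs a list of stages in order, passing the document along."""

    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self  # returning self allows chained .add() calls

    def run(self, doc: Document) -> Document:
        for stage in self.stages:
            doc = stage(doc)
        return doc


pipeline = Pipeline().add(collect_pdf).add(extract_text).add(narrate)
result = pipeline.run({"path": "paper.pdf"})
print(result["audio"])
```

Because each stage is just a function from document to document, swapping the extractor or narrator means replacing one entry in the list, which is the kind of "write your own" extensibility the table points to. A linear list is the simplest case; a DAG of stages would generalize it.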
This article is auto-generated from papercast-dev/papercast via the GitHub API. Last fetched: 6/21/2026
