Papercast

A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, then listen to them as a podcast. Customize your own pipelines.

From papercast-dev · Updated May 30, 2026

An extensible pipeline tool and plugin ecosystem for processing technical documents. The project is written primarily in Python, distributed under the MIT License, and first published in 2023. Key topics include: arxiv, dag, document-parser, document-parsing, grobid.

Latest release: v0.1.0 (April 8, 2023)

Features

| Feature | Examples |
| --- | --- |
| Add documents in multiple formats, from popular sources | PDF, LaTeX, ArXiv, SemanticScholar |
| Flexible Text Extraction | GROBID. More coming soon! Write your own! |
| Flexible Text Narration | OSX say command. More coming soon! Write your own! |
| Publish to multiple endpoints | Self-hosted RSS podcast using GitHub Pages, or any other endpoint you can think of |
| Run anywhere | Local machine, remote server, or cloud (AWS, GCP, Azure, etc.) |
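
The features above describe a chain of stages: collect a document, extract its text, narrate it, and publish the result. The following is a minimal sketch of that idea in plain Python. It does not use Papercast's API: `SketchPipeline` and the stage functions `collect`, `extract_text`, `narrate`, and `publish` are hypothetical names invented for illustration, and each stage is a stand-in that only records what a real source, extractor, narrator, or publisher might produce. The sketch also runs stages as a simple linear chain rather than a general DAG.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types and names throughout; this is NOT Papercast's API.
Document = Dict[str, str]
Stage = Callable[[Document], Document]


@dataclass
class SketchPipeline:
    """Runs a list of stages in order, passing each stage's output to the next."""

    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "SketchPipeline":
        self.stages.append(stage)
        return self  # return self so calls can be chained

    def run(self, doc: Document) -> Document:
        for stage in self.stages:
            doc = stage(doc)
        return doc


def collect(doc: Document) -> Document:
    # Stand-in for a source such as arXiv or a local PDF.
    return {**doc, "raw": f"contents of {doc['source']}"}


def extract_text(doc: Document) -> Document:
    # Stand-in for a text extractor such as GROBID.
    return {**doc, "text": doc["raw"].upper()}


def narrate(doc: Document) -> Document:
    # Stand-in for a narrator; it only records an output file name.
    return {**doc, "audio": doc["source"] + ".mp3"}


def publish(doc: Document) -> Document:
    # Stand-in for a publisher such as an RSS feed.
    return {**doc, "published": "yes"}


pipeline = SketchPipeline().add(collect).add(extract_text).add(narrate).add(publish)
result = pipeline.run({"source": "paper.pdf"})
print(result["audio"])
```

Each stage takes a document and returns an updated copy, so swapping in a different extractor or narrator only means replacing one function in the chain.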

This article is auto-generated from papercast-dev/papercast via the GitHub API. Last fetched: 6/21/2026