PDF to Markdown Collection

Repositories tagged with "pdf-to-markdown"

UNCOMMON

⭐ 4.3k HP

💎 Aqua

★★

llama_cloud_services

run-llama

TypeScript, document, document-parser

“Knowledge Agents and Management in the Cloud”

★ 4.3k

471 forks

GitPedia #284

2/5

UNCOMMON

⭐ 1.5k HP

🔮 Psychic

★★

docstrange

NanoNets

Python, ai, document-parser

“Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.”

★ 1.5k

135 forks

GitPedia #798

2/5

UNCOMMON

⭐ 1.3k HP

📦 Normal

★★

e2m

wisupai

Jupyter Notebook, doc2x, e2m

“E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.”

★ 1.3k

73 forks

GitPedia #815

2/5

UNCOMMON

⭐ 895 HP

🔮 Psychic

★★

api-llm-ocr

yigitkonur

Python, document-ai, fastapi

“PDF to markdown using vision LLMs — tables, layouts, and structure preserved”

★ 895

60 forks

GitPedia #256

2/5

UNCOMMON

⭐ 844 HP

⚙️ Steel

★★

pdf_oxide

yfedoseev

Rust, data-extraction, document-processing

“The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.”

★ 844

92 forks

GitPedia #230

2/5

UNCOMMON

⭐ 765 HP

🔮 Psychic

★★

docling-api

drmingler

Python, api, fastapi

“Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is ideal for large-scale workflows and offers text/table extraction, OCR, and batch processing with sync/async endpoints.”

★ 765

80 forks

GitPedia #093

2/5

COMMON

⭐ 478 HP

🔮 Psychic

★

vision-parse

iamarunbrahma

Python, document-parser, pdf-parser

“Parse PDFs into markdown using Vision LLMs”

★ 478

68 forks

GitPedia #809

1/5

COMMON

⭐ 358 HP

🔮 Psychic

★

LaTeXSnipper

SakuraMathcraft

Python, autoencoder, deep-learning

“A math workspace for screenshot OCR, handwriting-to-LaTeX, editing, preview, and symbolic computation, powered by MathCraft OCR and MathLive.”

★ 358

21 forks

GitPedia #904

1/5

COMMON

⭐ 253 HP

🔮 Psychic

★

pdfmd

M1ck4

Python, cli-tool, gui-application

“Smart PDF to Markdown converter with intelligent heading detection, automatic header/footer removal, orphan fragment merging, and image export. Features a user-friendly GUI with preview mode, persistent settings, and per-page error recovery. Optimized for Obsidian and other Markdown-based note-taking workflows.”

★ 253

51 forks

GitPedia #340

1/5

COMMON

⭐ 209 HP

🔮 Psychic

★

markdrop

shoryasethia

Python, agents, docling

“A Python package for converting PDFs to markdown while extracting images and tables, and for generating text descriptions of the extracted tables and images using several LLM clients, among other features. Markdrop is available on PyPI.”

★ 209

18 forks

GitPedia #993

1/5

COMMON

⭐ 192 HP

⚡ Electric

★

pullmd

AeternaLabsHQ

JavaScript, claude, claude-code

“Self-hosted URL- and file-to-Markdown service for humans and AI agents - web pages, documents, images, audio, YouTube. PWA + REST + MCP + Claude Code skill, Reddit-aware, refreshable share links.”

★ 192

15 forks

GitPedia #218

1/5

COMMON

⭐ 179 HP

🔮 Psychic

★

pdf-to-markdown

iamarunbrahma

Python, document-conversion, document-processing

“Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.”

★ 179

24 forks

GitPedia #607

1/5

COMMON

⭐ 123 HP

🔮 Psychic

★

chunky

GiovanniPasq

Python, chonkie, chunk-validation

“Open-source toolkit for reliable RAG pipelines: convert PDFs to Markdown, clean documents, inspect chunks, compare chunking strategies, and enrich metadata for LLM applications.”

★ 123

8 forks

GitPedia #578

1/5

COMMON

⭐ 77 HP

🔮 Psychic

★

docwen

ZHYX91

Python, document-converter, docx

“A local document converter for Word/Markdown/Excel bidirectional conversion. Supports PDF, OCR, and 11 languages.”

6 forks

GitPedia #302

1/5

COMMON

⭐ 75 HP

🔮 Psychic

★

smart-llm-loader

drmingler

Python, chatbot, chunking

“smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.”

3 forks

GitPedia #471

1/5

COMMON

⭐ 70 HP

🔮 Psychic

★

pdfmux

NameetP

Python, ai-agent, docling

“PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.”

11 forks

GitPedia #547

1/5

COMMON

⭐ 62 HP

🔮 Psychic

★

MinerU-Skill

Nebutra

Python, ai-agents, ai-skill

“AI-Native document parser: PDF, Office & images → clean Markdown with LaTeX, tables & OCR. Zero-dependency CLI & skill for Claude Code, Cursor & AI agents.”

2 forks

GitPedia #345

1/5