PDF text data extractor
PDF text data extraction web app with OCR for scanned documents
PDF text data extraction app that takes a PDF document as input and returns either a single .txt file containing all pages or a compressed (ZIP) folder of .txt files, one per page. OCR can also be enabled for scanned documents. The project is written primarily in Python and was first published in 2022. Key topics include: ocr, ocr-python, ocr-text-reader, pdf, pdf-to-text.
PDF to Text
How does it work?
```mermaid
flowchart LR
    A[PDF] --> |text conversion / OCR| B(Text)
    B --> |Option 1| D[txt file]
    B --> |Option 2| E[ZIP folder of txt files for pages]
```
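The two output options in the flowchart can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: it assumes the extracted text is already available as one string per page, and the function name `build_output` and the file names `document.txt`, `document.zip`, and `page_N.txt` are hypothetical placeholders.

```python
import io
import zipfile


def build_output(pages: list[str], as_zip: bool = False) -> tuple[str, bytes]:
    """Package page texts as one .txt file (Option 1) or a ZIP of per-page .txt files (Option 2).

    Returns a (filename, payload) pair. The names and layout are hypothetical.
    """
    if not as_zip:
        # Option 1: a single text file. Pages are joined with a form feed,
        # a common page-break convention in plain-text PDF dumps (an assumption here).
        text = "\f".join(pages)
        return "document.txt", text.encode("utf-8")

    # Option 2: one text file per page, written into an in-memory ZIP archive.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for number, page_text in enumerate(pages, start=1):
            # writestr encodes str data as UTF-8.
            archive.writestr(f"page_{number}.txt", page_text)
    return "document.zip", buffer.getvalue()
```

For example, `build_output(["Page one", "Page two"], as_zip=True)` returns `"document.zip"` together with the archive bytes, which a web app could then send back as a download. Building the archive in memory avoids writing temporary files on the server.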
- Upload your PDF.
- Enable OCR (for scanned documents).
- Select the PDF language.
- Download your output file (zip/txt).
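The OCR and language options in the steps above could be combined in several ways; the README does not say whether OCR replaces or supplements a page's embedded text. The sketch below assumes one plausible design, an OCR fallback: each page's text layer is read first, and OCR runs only when OCR is enabled and that layer is nearly empty, as is typical for scanned pages. The extractor and OCR engine are passed in as hypothetical callables so the logic runs without any PDF library. In practice they might, for example, wrap pypdf's `PdfReader(path).pages[i].extract_text()` and `pytesseract.image_to_string(image, lang=lang)` on a rasterized page, but that is an illustration, not the project's confirmed stack. The `min_chars` threshold is likewise an assumed parameter.

```python
from typing import Callable

# Hypothetical signatures: the extractor returns a page's embedded text by index,
# and the OCR engine turns a page into text given a language code (e.g. "eng").
TextExtractor = Callable[[int], str]
OcrEngine = Callable[[int, str], str]


def extract_pages(
    page_count: int,
    extract_text: TextExtractor,
    run_ocr: OcrEngine,
    ocr_enabled: bool = False,
    lang: str = "eng",
    min_chars: int = 20,
) -> list[str]:
    """Return one string per page, using OCR only when enabled and the text layer is sparse."""
    pages = []
    for index in range(page_count):
        text = extract_text(index) or ""
        # Scanned pages usually carry little or no embedded text, so fall back
        # to OCR (in the selected language) when the user has enabled it.
        if ocr_enabled and len(text.strip()) < min_chars:
            text = run_ocr(index, lang)
        pages.append(text)
    return pages
```

Keeping the text-layer path as the default means born-digital PDFs are processed quickly and exactly, while the selected language is only needed for pages that go through OCR.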
How to support the project
You can help support the project by giving feedback and/or buying me a coffee.
