OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.
Updated Dec 2, 2022 - Jupyter Notebook
Convert scanned PDFs into searchable text locally using Vision LLMs (olmOCR). 100% private, offline, and free. Features a modern Web UI & CLI.
A powerful and user-friendly tool based on OCRmyPDF, offering a seamless GUI for conversion of image-based PDFs into searchable text.
Perform Optical Character Recognition (OCR) on a scanned PDF file containing Arabic text and output a searchable PDF
A Python script that runs Paddle OCR on a possibly unsearchable PDF to make it searchable.
This batch script creates a searchable PDF from a PDF with one or more scanned pages that contain images.
Extract tables from searchable as well as non-searchable PDF files
Create a searchable PDF with ALTO-XML and JP2 files.
NeuroScan-AI is an advanced document-understanding engine built with modern computer vision and OCR pipelines. It performs smart perspective correction, illumination normalization, and adaptive enhancement to transform raw camera captures into clean, searchable, professional-grade documents.
Convert scanned PDF documents into searchable, OCR-processed, and PDF-compliant files using ocrmypdf, powered by an interactive Streamlit interface. Supports parallel processing to handle large documents efficiently.
Quick proof of concept to perform OCR on images.
PySide6 app to perform batch image/PDF processing and OCR.
Tool for creating searchable PDFs
Lightweight bash script to convert scanned PDFs into searchable, copyable PDFs using Tesseract OCR with parallel processing.
A wrapper on top of Python OCR tools such as pytesseract and easyocr that recognizes and extracts text embedded in images. It also converts scanned PDFs to text-searchable PDFs.