Extract text from a scanned PDF using on-device OCR.
The PDF OCR tool runs optical character recognition on a scanned PDF so the text it contains becomes selectable, searchable, and copyable. Use it for digitising printed receipts, archived contracts, books, and historical documents.
Pick one or more languages: the recogniser is more accurate when given explicit hints about the scripts used in the document. For example, for an Indian invoice in English with Devanagari headers, select both English and Hindi.
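As a rough illustration of how a multi-language selection might be handed to the engine: Tesseract identifies languages by short codes (for example `eng` and `hin`) and accepts several at once joined with `+`. The sketch below assumes a hypothetical lookup table and helper name (`LANGUAGE_CODES`, `toTesseractLangs`); neither is taken from this tool's actual code.

```typescript
// Hypothetical table mapping display names to Tesseract language codes.
// The codes themselves are standard Tesseract traineddata names.
const LANGUAGE_CODES = new Map<string, string>([
  ["English", "eng"],
  ["Hindi", "hin"],
  ["German", "deu"],
  ["French", "fra"],
  ["Spanish", "spa"],
]);

// Hypothetical helper: turns the selected languages into a single
// Tesseract language string such as "eng+hin".
function toTesseractLangs(selected: string[]): string {
  const codes: string[] = [];
  for (const name of selected) {
    const code = LANGUAGE_CODES.get(name);
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    // Skip duplicates so each language is listed only once.
    if (!codes.includes(code)) {
      codes.push(code);
    }
  }
  if (codes.length === 0) {
    throw new Error("Select at least one language");
  }
  return codes.join("+");
}

// The invoice example above: English body text with Devanagari headers.
const invoiceLangs = toTesseractLangs(["English", "Hindi"]);
```

Selection order is preserved in the resulting string, so the sketch lists the document's main language first.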
Recognition runs in your browser using a WebAssembly build of the open-source Tesseract engine. Language data is downloaded once and cached, after which OCR runs completely offline. Your PDF and the extracted text never leave your device.
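The "downloaded once and cached" behaviour can be pictured with a minimal sketch. It makes several assumptions: the class name `LanguageDataCache` and the injected `fetcher` are hypothetical, and an in-memory `Map` stands in for whatever persistent browser storage the tool actually uses (the text above does not say which). A real offline cache would need to survive page reloads, which this sketch does not.

```typescript
// Hypothetical downloader signature: resolves to the raw language data.
type Fetcher = (lang: string) => Promise<Uint8Array>;

// Fetch-once cache: the first request for a language triggers a download,
// and every later request reuses the same stored result.
class LanguageDataCache {
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly fetcher: Fetcher;

  constructor(fetcher: Fetcher) {
    this.fetcher = fetcher;
  }

  get(lang: string): Promise<Uint8Array> {
    let entry = this.entries.get(lang);
    if (entry === undefined) {
      entry = this.fetcher(lang);
      this.entries.set(lang, entry);
      // If the download fails, forget the entry so a later call can retry.
      entry.catch(() => {
        this.entries.delete(lang);
      });
    }
    return entry;
  }

  // True once a download for this language has been started or completed.
  has(lang: string): boolean {
    return this.entries.has(lang);
  }
}
```

Storing the promise rather than the resolved bytes means two simultaneous requests for the same language share a single download.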