Download Tesseract OCR

Tesseract
Tesseract OCR is an optical character recognition engine. It converts images containing text into editable text and supports more than 100 languages, including complex scripts. It works with scanned images, document photographs, and screenshots in common image formats such as TIFF, JPEG, PNG, and BMP. Output is available as plain text, searchable PDF, and hOCR (HTML). The modular architecture allows training for new languages and fonts. Processing includes layout analysis as well as line and word detection.
Version: 5.4.0
Size: 47.9 MB
Systems: Windows, Linux
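
Usage sketch
Once installed, Tesseract is typically run from the command line as "tesseract IMAGE OUTPUTBASE -l LANG", optionally followed by an output config name such as pdf or hocr. The Python sketch below wraps that call, assuming the tesseract executable is on PATH and the English language data is installed. The function names and the file name scan.png are hypothetical and used only for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base="stdout", lang="eng", output_format=None):
    """Build the argument list for the tesseract command-line program.

    output_base "stdout" makes tesseract print the recognized text
    instead of writing a file. output_format, if given, is appended as
    a config name (for example "pdf" or "hocr").
    """
    command = ["tesseract", image_path, output_base, "-l", lang]
    if output_format is not None:
        command.append(output_format)
    return command


def recognize_text(image_path, lang="eng"):
    """Run tesseract on one image and return the recognized text."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang=lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.png" is a placeholder; replace it with a real image file.
    print(build_tesseract_command("scan.png", "scan", "eng", "pdf"))
```

Calling recognize_text("scan.png") would return the recognized text as a string. Passing "pdf" as output_format together with an output base such as "scan" asks Tesseract to write scan.pdf instead.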