TL;DR
Tesseract now builds with modern CMake (build system), improving compatibility and potentially speeding up compilation.
Fixes Worth Knowing
ALTO output (XML format for text recognition) now correctly includes version information.
TL;DR
Tesseract now includes a new PAGE XML renderer (for structured document output) and improved PDF rendering, alongside the ability to retrieve text angle/gradient information.
Breaking
- OpenCL Support Removed: Support for OpenCL (a GPU computing framework) and the related API functions have been removed.
New
- PAGE XML Renderer: Export OCR results in PAGE XML format for structured document analysis.
- Improved PDF Rendering: Fixes issues with grey results in indexed PNG images within PDF documents.
- Text Angle/Gradient Access: The API now allows retrieval of the detected text angle and gradient.
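A minimal sketch of requesting PAGE XML output from the command-line tool, driven from Python. It assumes the tesseract binary is on PATH and that the renderer is enabled by a parameter named tessedit_create_page_xml. That parameter name is an assumption here, so confirm it with `tesseract --print-parameters` on your build. The helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_page_xml_command(
    image_path: str, output_base: str, lang: Optional[str] = None
) -> List[str]:
    """Build the argv list for a PAGE XML run (hypothetical helper).

    CLI shape: tesseract IMAGE OUTPUTBASE [options...]
    """
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    # Assumed parameter name for the PAGE XML renderer; verify on your build.
    cmd += ["-c", "tessedit_create_page_xml=1"]
    return cmd


def run_page_xml(
    image_path: str, output_base: str, lang: Optional[str] = None
) -> subprocess.CompletedProcess:
    """Run tesseract with the command built above; raises if it is missing or fails."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    return subprocess.run(
        build_page_xml_command(image_path, output_base, lang),
        check=True,
        capture_output=True,
        text=True,
    )
```

Calling `run_page_xml("scan.png", "out")` would write the renderer's output next to the `out` base name. Keeping command construction separate from execution makes the argument list easy to inspect before anything is run.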
Fixes Worth Knowing
- Fixed a crash in Wordrec::angle_change, improving stability.
- Corrected installation issues with Conda environments (package/dependency manager).
- Resolved typos and minor performance issues reported by code scanning tools.
TL;DR
Tesseract 4.1.3 (optical character recognition software) resolves a build issue present in 4.1.2, ensuring consistent functionality across all build methods.
Fixes Worth Knowing
Training now supports line images with larger widths, improving compatibility with varied input data. The autoconf build process is now stable, resolving issues experienced in the previous release.
Before You Upgrade
If you previously attempted to build Tesseract from source using autoconf, upgrade to 4.1.3 to avoid build failures.
TL;DR
Tesseract now supports reading images from URLs (web addresses) using libcurl, expanding where you can use OCR.
New
- URL Image Support: Tesseract can now process images directly from web addresses.
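A minimal sketch of passing a web address in place of a local image path, driven from Python. It assumes a Tesseract build compiled with libcurl and a tesseract binary on PATH. The helper names and the example URL are hypothetical.

```python
import subprocess
from typing import List
from urllib.parse import urlparse


def build_url_ocr_command(image_url: str, lang: str = "eng") -> List[str]:
    """Build the argv list for OCR on a remote image (hypothetical helper)."""
    parsed = urlparse(image_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an http(s) URL, got: {image_url!r}")
    # Using "stdout" as the output base makes tesseract print the text
    # to standard output instead of writing a file.
    return ["tesseract", image_url, "stdout", "-l", lang]


def ocr_from_url(image_url: str, lang: str = "eng") -> str:
    """Run tesseract on the URL and return the recognized text.

    Fails if the binary is missing or was built without libcurl support.
    """
    result = subprocess.run(
        build_url_ocr_command(image_url, lang),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `ocr_from_url("https://example.com/scan.png")` would return the recognized text as a string. The URL check runs before any process is started, so a mistyped local path is rejected early rather than handed to the binary.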
Fixes Worth Knowing
- TIFF output files are now named more clearly with page numbers included.
TL;DR
This release improves the accuracy of Optical Character Recognition (OCR) by addressing reported bugs.
Fixes Worth Knowing
Several bugs impacting OCR accuracy have been resolved in this release, leading to more reliable text extraction.