Re: what software used for ocr on linux

Tony Baechler <tony@xxxxxxxxxxxx> · Fri, 11 Jul 2014 00:01:10 -0700

I vaguely recall Tesseract having an option for this, but it isn't
automatic.  ImageMagick's convert should be able to do it as well, but it
isn't automatic either.  The short answer, if memory serves, is trial and
error.  I remember thinking that the terrible OCR might be due to the pages
not being aligned, so I tried rotating the images, but I didn't get any
better results.  I haven't played with the other OCR engines.  I think
FineReader is better about this.  I'm possibly wrong here, but as I
understand it, the Windows software for the blind detects the page
misalignment during the scanning process and rotates the image before
passing it to the OCR engine.  The Internet Archive seems to use FineReader
and scans millions of books in all kinds of conditions, so perhaps it can
handle the rotation automatically.
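
For what it's worth, here is a rough, untested-on-real-scans sketch of how
the two tools might be chained.  It assumes a Tesseract build that accepts
"--psm 0" (orientation and script detection, which needs the osd trained
data; older 3.x releases spell it "-psm 0"), that the "Rotate:" line in its
report is the clockwise correction in degrees, and that ImageMagick's
convert is on the PATH.  The function names are my own, not part of either
tool.

```python
import re
import subprocess


def parse_osd_rotation(osd_text):
    """Pull the suggested correction (in degrees) out of Tesseract's OSD report.

    Assumes the report contains a line such as "Rotate: 90"; returns 0 if
    no such line is found.
    """
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    return int(match.group(1)) % 360 if match else 0


def build_convert_command(src, dst, rotate_degrees):
    """Build an ImageMagick command: coarse rotation first, then deskew."""
    cmd = ["convert", src]
    if rotate_degrees:
        # Coarse fix for a page laid on the glass sideways or upside down.
        cmd += ["-rotate", str(rotate_degrees)]
    # Fine fix for a slightly crooked page; 40% is the threshold the
    # ImageMagick documentation suggests as a starting point.
    cmd += ["-deskew", "40%", "+repage", dst]
    return cmd


def straighten(src, dst):
    """Detect the page orientation with Tesseract, then fix it with convert."""
    osd = subprocess.run(
        ["tesseract", src, "stdout", "--psm", "0"],
        capture_output=True, text=True, check=True,
    )
    # Depending on the Tesseract version the report may land on stdout or
    # stderr, so search both.
    rotation = parse_osd_rotation(osd.stdout + osd.stderr)
    subprocess.run(build_convert_command(src, dst, rotation), check=True)
    return rotation
```

Calling straighten("page.png", "page-fixed.png") and then running Tesseract
on the fixed file would be the idea.  Whether the deskew step actually
improves recognition is exactly the trial-and-error part; it didn't help in
my case.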

On 2014-07-10 06:49 AM, Sam Hartman wrote:
> Is there a way to get tesseract or openocr or anything open-source to
> deal with rotations?
> The commercial software along with anything targeted for the blind tends
> to 
> 
> 1) deal with 90 or 180-degree rotations--I put the book down on the glass
> in the wrong orientation
> 
> and
> 
> 2) Deal with small rotations (it wasn't perfectly aligned) relatively
> well.
> 
> I find these features really important when scanning things myself.
> Less so when OCRing images from the web etc.
> 
> _______________________________________________
> Blinux-list mailing list
> Blinux-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/blinux-list
> 

-- 
Have a good day,
Tony Baechler
tony@xxxxxxxxxxxx
