Re: A reliable ocr program for Fedora

Fred Smith <fredex@xxxxxxxxxxxxxxxxxxxxxx> · Tue, 15 Dec 2015 23:35:20 -0500

On Wed, Dec 16, 2015 at 01:36:40PM +1030, Tim wrote:
> Allegedly, on or about 15 December 2015, jd1008 sent:
> > Downloaded and tried tesseract and cuneiform, and both fail to
> > work on any of the pdf images I have. These images are NOT encrypted
> > as they are public documents like from the DMV, ... etc. 
> 
> But are they good quality images?  OCR needs a reasonable resolution,
> *and* clean character definition.

When I was using tesseract a few years ago (as mentioned earlier
in this thread) I was getting PDFs made of scanned legal documents
(from Groklaw, documents from the SCO v IBM case). These were pretty
awful quality, as if they had been scanned at some terribly low
resolution from what may have been poor quality originals (or copies
thereof). They were very messy to look at, but tesseract could read
most of it fairly well. Converting them to higher-resolution TIFF
files actually made the OCR work worse, odd as that may seem.
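
One rough way to check whether resolution helps or hurts on a given
set of documents is to render the same PDF at a couple of resolutions
and OCR each result. The sketch below assumes that pdftoppm (from
poppler-utils) and the tesseract command-line tool are installed. The
tool choice, the 150/300 dpi values, and the function names are
illustrative assumptions, not something taken from this thread.

```python
import subprocess
from pathlib import Path


def render_cmd(pdf, prefix, dpi):
    """Build a pdftoppm command that renders each PDF page to a TIFF at `dpi`."""
    return ["pdftoppm", "-r", str(dpi), "-tiff", str(pdf), str(prefix)]


def ocr_cmd(image, out_base):
    """Build a tesseract command; tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image), str(out_base)]


def ocr_at_resolutions(pdf, workdir, dpis=(150, 300)):
    """Render and OCR the PDF once per resolution; return {dpi: [text files]}."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    results = {}
    for dpi in dpis:
        # Hypothetical naming scheme: page-150dpi-1.tif, page-150dpi-2.tif, ...
        prefix = workdir / f"page-{dpi}dpi"
        subprocess.run(render_cmd(pdf, prefix, dpi), check=True)
        texts = []
        for image in sorted(workdir.glob(f"{prefix.name}-*.tif")):
            out_base = image.with_suffix("")
            subprocess.run(ocr_cmd(image, out_base), check=True)
            texts.append(Path(str(out_base) + ".txt"))
        results[dpi] = texts
    return results


if __name__ == "__main__":
    # "scan.pdf" and "ocr-out" are placeholder names.
    for dpi, files in ocr_at_resolutions("scan.pdf", "ocr-out").items():
        print(dpi, [str(f) for f in files])
```

Comparing the text files for each resolution side by side should show
fairly quickly whether the higher-resolution rendering is an
improvement for a particular batch of scans.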

-- 
---- Fred Smith -- fredex@xxxxxxxxxxxxxxxxxxxxxx -----------------------------
                       I can do all things through Christ 
                              who strengthens me.
------------------------------ Philippians 4:13 -------------------------------