Re: A reliable ocr program for Fedora

On 12/15/2015 11:35 PM, Fred Smith wrote:
> On Wed, Dec 16, 2015 at 01:36:40PM +1030, Tim wrote:
>> Allegedly, on or about 15 December 2015, jd1008 sent:
>>> Downloaded and tried tesseract and cuneiform, and both fail to
>>> work on any of the pdf images I have. These images are NOT encrypted
>>> as they are public documents like from the DMV, ... etc.
>> But are they good quality images?  OCR needs a reasonable resolution,
>> *and* clean character definition.
> When I was using tesseract a few years ago (as mentioned earlier
> in this thread) I was getting PDFs made of scanned legal documents
> (from Groklaw, documents from the SCO v IBM case). These were pretty
> awful quality, as if they had been scanned at some terribly low
> resolution from what may have been poor quality originals (or copies
> thereof). They were very messy to look at, but tesseract could read
> most of it fairly well. But converting to higher-resolution TIFF
> files actually made the OCR work more poorly, odd as that may seem.
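
For anyone who wants to try the PDF-to-image-to-OCR route discussed above, here is a minimal Python sketch. It assumes the pdftoppm (poppler-utils) and tesseract command-line tools are installed; the function names, the "page" prefix and the 300 dpi default are only illustrative choices, not something anyone in this thread used. Since a higher resolution did not help in the case described above, the dpi is left as a parameter to experiment with.

```python
import subprocess
from pathlib import Path


def pdftoppm_cmd(pdf, prefix, dpi=300):
    """Build the pdftoppm command that renders each PDF page to a grayscale PNG."""
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, out_base, lang="eng"):
    """Build the tesseract command; tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_pdf(pdf, workdir, dpi=300, lang="eng"):
    """Render the PDF pages into workdir, OCR each page, and return the joined text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    prefix = workdir / "page"  # hypothetical prefix; pages become page-1.png, ...
    subprocess.run(pdftoppm_cmd(pdf, prefix, dpi), check=True)

    texts = []
    for image in sorted(workdir.glob("page-*.png")):
        out_base = image.with_suffix("")  # strip ".png"
        subprocess.run(tesseract_cmd(image, out_base, lang), check=True)
        texts.append(Path(str(out_base) + ".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Something like ocr_pdf("scan.pdf", "ocr-tmp", dpi=150) would then let you compare the results at different resolutions on the same document.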

About 3 or so years ago, I tried tesseract and it was only about 80%
accurate on good-quality print. I tried the Windows program ABBYY and it
was virtually perfect. So if you have a long document, or a bunch
of documents, spend the money and find a Windows machine if
you don't have one.  You'd be surprised how much work it is to
go through and change "1"s to "l"s or vice versa, or "i"s. And that's
just one example. And I can almost guarantee you'll miss a couple!
(If the document is not too long, it might be worthwhile to have
someone read it to you while you type it in by hand!)
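
If you do end up proofreading OCR output by hand, a small script can at least point you at some of the likely 1/l mix-ups. The Python sketch below is only a rough heuristic of my own, not a feature of tesseract or ABBYY: it flags words that mix letters and digits (such as "tota1"). It will not catch an "l" that was misread as a lone "1", and it will also flag legitimate tokens like "F22", so treat the result as a list of places to look.

```python
import re

# A word containing at least one ASCII letter and at least one digit.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def flag_suspect_tokens(text):
    """Return (line_number, token) pairs for words that mix letters and digits."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MIXED.finditer(line):
            hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    sample = "The tota1 is 15 dollars\nall fine here"
    for lineno, token in flag_suspect_tokens(sample):
        print(f"line {lineno}: {token}")
```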

Searching Google, I found a free online service, but uploading a
scanned file might be a problem because of file size; check with your
ISP. The URL is: www.onlineocr.net

As they say on the net, YMMV!

--doug

