On 12/15/2015 11:35 PM, Fred Smith wrote:
> On Wed, Dec 16, 2015 at 01:36:40PM +1030, Tim wrote:
>> Allegedly, on or about 15 December 2015, jd1008 sent:
>>> Downloaded and tried tesseract and cuneiform, and both fail to work
>>> on any of the pdf images I have. These images are NOT encrypted as
>>> they are public documents like from the DMV, ... etc.
>>
>> But are they good quality images? OCR needs a reasonable resolution,
>> *and* clean character definition.
>
> When I was using tesseract a few years ago (as mentioned earlier in
> this thread) I was getting PDFs made of scanned legal documents (from
> Groklaw, documents from the SCO v IBM case). These were pretty awful
> quality, as if they had been scanned at some terribly low resolution
> from what may have been poor quality originals (or copies thereof).
> They were very messy to look at, but tesseract could read most of it
> fairly well. But converting to higher-resolution TIFF files actually
> made the OCR work more poorly, odd as that may seem.

About 3 or so years ago, I tried tesseract and it was only about 80%
accurate on good quality print. I tried the Windows program ABBYY and it
was virtually perfect. So if you have a long document, or a bunch of
documents, spend the money and find a Windows machine if you don't have
one. You'd be surprised how much work it is to go through and change
"1"s to "l"s or vice versa, or "i"s. And that's just one example. And I
can almost guarantee you'll miss a couple! (If the document is not too
long, it might be worthwhile to have someone read it to you and type it
in by hand!)

Looking at Google output, I find a free on-line service, but there might
be a problem sending a scanned file due to file size; check with your
ISP. The URL is: www.onlineocr.net

As they say on the net, YMMV!

--doug
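
For anyone who ends up proofreading OCR output by hand, here is a rough sketch of how the "1" vs "l" vs "i" hunt could be narrowed down. It is only a heuristic: the look-alike set and the function name suspicious_tokens are made up for illustration and are not part of tesseract or any other OCR tool. It flags words that mix letters and digits and contain a commonly confused character. It will not catch swaps inside all-letter words (such as "Iist" for "list"), so a read-through is still needed.

```python
import re

# Characters that OCR output commonly confuses with one another.
# This set is an illustrative assumption, not an exhaustive list.
LOOKALIKES = set("1lI|0O")

TOKEN = re.compile(r"\S+")


def suspicious_tokens(text):
    """Return (line_number, token) pairs worth a second look.

    A token is flagged when it mixes letters and digits and contains
    at least one look-alike character, e.g. "l5" or "bi11".
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            # Drop surrounding punctuation so "bi11." is reported as "bi11".
            token = match.group().strip(".,;:!?()\"'")
            has_digit = any(c.isdigit() for c in token)
            has_alpha = any(c.isalpha() for c in token)
            if has_digit and has_alpha and LOOKALIKES & set(token):
                hits.append((lineno, token))
    return hits


if __name__ == "__main__":
    sample = "The bi11 is due on 15 December.\nPay l5 dollars."
    for lineno, token in suspicious_tokens(sample):
        print(f"line {lineno}: {token}")
```

Feed it the text file your OCR program wrote and check each reported line against the scan. Ordinals such as "1st" will show up as false positives, which seems a fair price for a short list to review.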
-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org