I've been scanning in the D&D 5th Edition Player's Handbook. I tried
every open-source OCR program I could find, and Tesseract was easily the
best. On pages that are just prose, its accuracy is probably about 99%.
Even on pages with two columns of prose, it does really well if you
tell it to expect that layout. Somebody sent me a PDF of the same book
done with a professional OCR program for Windows, and the results are
approximately equal. Tesseract may lack the bells and whistles of
commercial products, but for accuracy, it's pretty good.
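The message doesn't say which option "tell it to expect that layout" refers to. Assuming it means Tesseract's page segmentation mode, here is a minimal Python sketch that picks a mode per page layout. The layout labels and the helper names `tesseract_command` and `ocr_page` are hypothetical. `--psm` is the Tesseract 4+ spelling (3.x used `-psm`), and `tesseract --help-psm` lists the modes your installed version supports.

```python
import subprocess
import sys
from pathlib import Path

# Tesseract page segmentation modes (run `tesseract --help-psm` for the full list).
# The layout labels on the left are hypothetical names chosen for this sketch.
PSM_BY_LAYOUT = {
    "auto": 3,           # fully automatic segmentation (the default); finds columns itself
    "single_column": 4,  # assume a single column of text of variable sizes
    "single_block": 6,   # assume a single uniform block of text
}


def tesseract_command(image, out_base, layout="auto", lang="eng"):
    """Build the argv for one tesseract run; tesseract appends '.txt' to out_base."""
    psm = PSM_BY_LAYOUT[layout]
    return ["tesseract", str(image), str(out_base), "-l", lang, "--psm", str(psm)]


def ocr_page(image, layout="auto", lang="eng"):
    """OCR one page image and return the recognized text (needs tesseract 4+ on PATH)."""
    image = Path(image)
    out_base = image.with_suffix("")  # page-01.jpg -> page-01
    subprocess.run(tesseract_command(image, out_base, layout, lang), check=True)
    text_file = out_base.parent / (out_base.name + ".txt")
    return text_file.read_text(encoding="utf-8")


if __name__ == "__main__":
    # usage: python ocr_page.py page-01.jpg [auto|single_column|single_block]
    print(ocr_page(sys.argv[1], *sys.argv[2:3]))
```

For two-column pages the sketch keeps the automatic mode, because modes 4 and 6 tell Tesseract the page has a single column or block, which would likely run lines together across the gutter.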
On 11/01/2015 11:24 PM, Tom Fowle wrote:
Am I the last to find this?
The command-line OCR program tesseract won't read .pdf files directly, but
pdftocairo produces .jpg (among other formats), which tesseract will read.
It may not do well with columns, but it's not too bad.
Is there anything better?
Thanks
Tom Fowle
_______________________________________________
Speakup mailing list
Speakup@xxxxxxxxxxxxxxxxx
http://linux-speakup.org/cgi-bin/mailman/listinfo/speakup
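The pdftocairo-then-tesseract pipeline from the quoted message can be scripted. Below is a minimal Python sketch, assuming poppler's `pdftocairo` and a Tesseract version that accepts `stdout` as the output base are on the PATH; the function names are hypothetical. It renders at 300 PPI (pdftocairo's default is 150) and writes PNG rather than JPEG, on the assumption that lossless images are safer for OCR; pass `fmt="jpeg"` to match the original recipe.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# pdftocairo flag -> file extension it writes (only the two formats this sketch uses)
EXTENSIONS = {"png": "png", "jpeg": "jpg"}


def pdf_to_images_command(pdf, out_root, dpi=300, fmt="png"):
    """argv for pdftocairo: renders every page to <out_root>-<N>.<ext>."""
    return ["pdftocairo", f"-{fmt}", "-r", str(dpi), str(pdf), str(out_root)]


def page_images(out_dir, fmt="png"):
    """List rendered pages sorted by their numeric suffix, so page-10 follows page-9."""
    ext = EXTENSIONS[fmt]
    return sorted(Path(out_dir).glob(f"page-*.{ext}"),
                  key=lambda p: int(p.stem.rsplit("-", 1)[1]))


def ocr_pdf(pdf, dpi=300, fmt="png", lang="eng"):
    """Render a PDF with pdftocairo, OCR each page with tesseract, return the joined text."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(pdf_to_images_command(pdf, Path(tmp) / "page", dpi, fmt), check=True)
        texts = []
        for image in page_images(tmp, fmt):
            # "stdout" as the output base makes tesseract print the text instead of writing a file
            result = subprocess.run(["tesseract", str(image), "stdout", "-l", lang],
                                    check=True, capture_output=True, text=True)
            texts.append(result.stdout)
        return "\n".join(texts)


if __name__ == "__main__":
    # usage: python ocr_pdf.py book.pdf > book.txt
    print(ocr_pdf(sys.argv[1]))
```

Sorting on the parsed page number avoids depending on whether pdftocairo zero-pads the file names.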
--
John Heim, jheim@xxxxxxxxxxxxx, 608-263-4189, skype:john.g.heim,
sip:jheim@xxxxxxxxxxxxxxxx