Re: reading pictures of text in pdf

Paul Merrell <marbux@xxxxxxxxx> · Thu, 12 Nov 2015 17:02:14 -0800

On Thu, Nov 12, 2015 at 4:10 PM, Brian Tew <montanalag@xxxxxxxxx> wrote:
> Is there anything in linux that can convert a pdf file that is a picture of text
> into real actual plain text?

Assuming there's no DRM involved, tesseract-OCR is probably your best
bet. <https://code.google.com/p/tesseract-ocr/>. The source code has
moved to <https://github.com/tesseract-ocr> but the documentation
seems to still be on code.google.com.
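If you want to script this, note that Tesseract works on images rather than PDFs, so an image-only PDF has to be rasterized first. Here is a minimal Python sketch. It assumes the poppler-utils `pdftoppm` tool and the `tesseract` command-line program are both installed and on PATH. The script name `pdf_ocr.py`, the helper function names, the 300 dpi resolution and the `eng` language default are illustrative choices only, not anything from the original message.

```python
# pdf_ocr.py (hypothetical name): OCR an image-only PDF into plain text.
# Assumes pdftoppm (poppler-utils) and tesseract are installed and on PATH.
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


def rasterize_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Command that renders every PDF page to <out_prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """Command that OCRs one image; the output base "stdout" prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def page_number(png: Path) -> int:
    """Page number from a name like 'page-12.png', so pages sort numerically."""
    return int(png.stem.rsplit("-", 1)[1])


def pdf_to_text(pdf_path: str, lang: str = "eng", dpi: int = 300) -> str:
    """Rasterize each page into a temp dir, OCR it, and join the page texts."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(rasterize_cmd(pdf_path, prefix, dpi), check=True)
        texts = []
        for png in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            result = subprocess.run(
                ocr_cmd(str(png), lang),
                check=True, capture_output=True, text=True,
            )
            texts.append(result.stdout)
        return "\n".join(texts)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python3 pdf_ocr.py FILE.pdf")
    print(pdf_to_text(sys.argv[1]))
```

Usage would be something like `python3 pdf_ocr.py scanned.pdf > scanned.txt`. For scans in another language, pass the matching Tesseract language code (for example `deu`) as `lang`. That language's data pack must be installed for this to work.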

Best regards,

Paul

-- 
[Notice not included in the above original message:  The U.S. National
Security Agency neither confirms nor denies that it intercepted this
message.]

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list