Anyone able to OCR a PDF file?

Willem van der Walt wrote:
> The different OCR engines require different image formats.
> Some of them are really dumb.

They probably derive from old code written without a
format-independent graphics library.

> I find that the best of the open-source engines is cuneiform.

Aha, interesting.  I've always used tesseract.  cuneiform is
in Debian wheezy (testing) but not yet in Debian stable.

Depending on how the PDF was produced, OCR may not be needed at all:
if it holds real text rather than scanned page images,
  ps2txt filename.pdf
(a.k.a. ps2ascii) might extract it; I think it comes with ghostscript.
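
A rough Python sketch of that order of attack, in case it helps: try
the text layer first, and only render the pages and OCR them if
nothing comes out. This is not something anyone in the thread wrote.
The function names are made up for illustration, and it assumes that
ps2ascii, gs and at least one of tesseract or cuneiform are on the
PATH. Which image formats a given engine build will read varies, so
the Ghostscript output device is left as a parameter.

```python
import subprocess
import tempfile
from pathlib import Path

# Ghostscript greyscale devices and the file extension each one produces.
IMAGE_EXT = {"pnggray": "png", "tiffgray": "tif", "bmpgray": "bmp"}


def text_layer_cmd(pdf):
    """ps2ascii prints any text it finds in the file to stdout."""
    return ["ps2ascii", str(pdf)]


def rasterize_cmd(pdf, out_dir, device="pnggray", dpi=300):
    """Ghostscript command that renders one image file per page."""
    pattern = str(Path(out_dir) / f"page-%03d.{IMAGE_EXT[device]}")
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", f"-sDEVICE={device}",
            f"-r{dpi}", f"-sOutputFile={pattern}", str(pdf)]


def ocr_cmd(image, engine="tesseract"):
    """Command that OCRs one page image into a .txt file next to it."""
    image = Path(image)
    if engine == "tesseract":
        # tesseract takes an output base name and appends ".txt" itself.
        return ["tesseract", str(image), str(image.with_suffix(""))]
    if engine == "cuneiform":
        return ["cuneiform", "-o", str(image.with_suffix(".txt")), str(image)]
    raise ValueError(f"unknown OCR engine: {engine}")


def pdf_to_text(pdf, engine="tesseract", device="pnggray"):
    """Return the text of a PDF, falling back to OCR if it has no text layer."""
    done = subprocess.run(text_layer_cmd(pdf), capture_output=True, text=True)
    if done.returncode == 0 and done.stdout.strip():
        return done.stdout
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(rasterize_cmd(pdf, tmp, device), check=True)
        pages = []
        for image in sorted(Path(tmp).glob(f"page-*.{IMAGE_EXT[device]}")):
            subprocess.run(ocr_cmd(image, engine), check=True)
            text_file = image.with_suffix(".txt")
            pages.append(text_file.read_text(encoding="utf-8", errors="replace"))
        return "\n".join(pages)
```

Something like pdf_to_text("filename.pdf", engine="cuneiform",
device="bmpgray") would be the call for an engine that only reads BMP.
The 300 dpi default is just a common starting point for OCR, not a
measured optimum.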

Regards,  Peter Billam

http://www.pjb.com.au       pj at pjb.com.au      (03) 6278 9410
"Was der Meister nicht kann,   vermöcht es der Knabe, hätt er
 ihm immer gehorcht?"   Siegfried to Mime, from Act 1 Scene 2



