new OCR project for Linux!

Marcel Oats  <speakup at braille.uwo.ca> wrote:
>I'd like a PDF converter for Linux.  Any ideas?

I would like to discuss the aspect of this issue that is on topic for this
thread.

Some PDF documents contain only scanned images of the printed pages; there is
no character-encoded text in such files.

Given an OCR system, it should be possible to convert such files to text by
extracting the page images with pdfimages (part of Xpdf), performing any
conversions that may be necessary (for example, to an image format the OCR
program accepts), then processing the image files with OCR.
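Here is a minimal Python sketch of that pipeline. It assumes Tesseract as the OCR engine (the post does not name one), run as `tesseract IMAGE OUTBASE`, which writes OUTBASE.txt. It also assumes one image per page. That is typical of scanned documents but not guaranteed, because pdfimages extracts every image in the file. The helper names (`extract_cmd`, `ocr_cmd`, `page_images`, `pdf_to_text`) are hypothetical. By default Xpdf's pdfimages writes PBM or PPM files, which Tesseract should be able to read directly. If your OCR program cannot, add a conversion at the marked point (netpbm's pnmtotiff is one option).

```python
"""Sketch: OCR an image-only PDF with pdfimages plus an OCR engine.

Assumption: the OCR engine is Tesseract, run as `tesseract IMAGE OUTBASE`,
which writes OUTBASE.txt. Helper names here are hypothetical.
"""
import subprocess
import tempfile
from pathlib import Path

# pdfimages writes PBM (bilevel) or PPM files by default; -j adds JPEG.
IMAGE_SUFFIXES = {".pbm", ".pgm", ".ppm", ".jpg", ".png", ".tif"}


def extract_cmd(pdf, root):
    """Command line for: pdfimages PDF-file image-root."""
    return ["pdfimages", str(pdf), str(root)]


def ocr_cmd(image):
    """Assumed Tesseract call; output base is the image path minus suffix."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def page_images(workdir, root_name="page"):
    """Extracted images sorted by their numeric suffix (root-NNN.ext)."""
    found = []
    for p in Path(workdir).glob(root_name + "-*"):
        num = p.stem[len(root_name) + 1:]
        if p.suffix.lower() in IMAGE_SUFFIXES and num.isdigit():
            found.append((int(num), p))
    return [p for _, p in sorted(found)]


def pdf_to_text(pdf):
    """Extract page images, OCR each one, return the text joined by form feeds."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(extract_cmd(pdf, Path(workdir) / "page"), check=True)
        texts = []
        for image in page_images(workdir):
            # Convert `image` here (e.g. to TIFF) if the OCR engine
            # cannot read the extracted format directly.
            subprocess.run(ocr_cmd(image), check=True)
            texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
        return "\n\f\n".join(texts)
```

Usage: `print(pdf_to_text("scan.pdf"))`. The script sorts on the numeric part of each file name rather than on the raw string, so page 1000 still comes after page 999.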

The quality of the output depends, of course, on the accuracy of the OCR
system and the characteristics of the page images.




