Anyone able to OCR a PDF file?

wvdwalt@xxxxxxxxxx (Willem van der Walt) · Wed, 4 Jan 2012 10:46:24 +0200 (SAST)

Hi,
Janina, I have written that script, sort of.
In the kies package, which I released a while ago, there is a set of scripts
to scan documents and run OCR on image files.
It is not rocket science, but it works well enough.
The different OCR engines require different image formats. Some of them
are really dumb.
My set of scripts tries to handle all that in the background, hiding it
from the user.
One can use different OCR engines like cuneiform, tesseract and now even
the commercial ABBYY FineReader engine, which is available for 149 euros.
I recently had to do a lot of OCR, and now have a license for that engine
at work.
I find that the best of the open-source engines is cuneiform.
The main script for scan/OCR stuff is called kies_p2t, for paper to text.
The kies tarball can be found at:
ftp://ftp.csir.co.za/NI/National_Accessibility_Portal/wvdwalt/kies-latest.tar.bz2
Regards, Willem
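Willem describes scripts that hide each engine's input-format quirks from the user. The POSIX shell sketch below illustrates that idea. It is not the kies_p2t code. The function names `format_for_engine` and `ocr_image` are hypothetical, and the engine-to-format table (TIFF for tesseract, BMP for cuneiform) is an assumption made for illustration. ImageMagick's `convert` handles the format conversion.

```shell
#!/bin/sh
# Hypothetical engine dispatcher, loosely in the spirit of kies_p2t.
# ASSUMPTION: the engine-to-format table is illustrative only and is not
# taken from the kies scripts. Requires ImageMagick's convert plus the engine.

# Print the image format an engine is assumed to need.
format_for_engine() {
    case "$1" in
        tesseract) echo tiff ;;  # the thread suggests tesseract wanted TIFF
        cuneiform) echo bmp ;;   # assumption: some cuneiform builds want BMP
        *) echo "unknown engine: $1" >&2; return 1 ;;
    esac
}

# ocr_image ENGINE IMAGE OUTBASE
# Converts IMAGE to the engine's assumed format, runs the engine, and leaves
# the recognised text in OUTBASE.txt. The intermediate image is deleted.
ocr_image() {
    engine="$1"; image="$2"; outbase="$3"
    fmt=$(format_for_engine "$engine") || return 1
    tmp="$outbase-ocr-input.$fmt"
    convert "$image" "$tmp" || return 1
    case "$engine" in
        tesseract) tesseract "$tmp" "$outbase" ;;        # writes OUTBASE.txt
        cuneiform) cuneiform -o "$outbase.txt" "$tmp" ;;
    esac
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `ocr_image cuneiform scan.jpg page1` would convert `scan.jpg` to a temporary BMP, run cuneiform on it, and leave the text in `page1.txt`. Because the format table lives in one function, supporting another engine only requires adding one branch to each `case` statement.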

On Tue, 3 Jan 2012, Janina Sajka wrote:

> Thanks, Samuel! I think that will work nicely in a little script I can
> quickly concoct.
>
> Janina
>
> Samuel Thibault writes:
>> Janina Sajka, on Tue 03 Jan 2012 11:40:45 -0500, wrote:
>>> Has anyone figured out how to get one of the Linux OCR engines (like
>>> tesseract) to accept a graphical file (other than .tiff) as input?
>>
>> You can use ImageMagick's convert tool to convert from .pdf to .tiff:
>>
>> convert test.pdf test.tiff
>>
>> Samuel
>> _______________________________________________
>> Speakup mailing list
>> Speakup at braille.uwo.ca
>> http://speech.braille.uwo.ca/mailman/listinfo/speakup
>
> -- 
>
> Janina Sajka,	Phone:	+1.443.300.2200
> 		sip:janina at asterisk.rednote.net
>
> Chair, Open Accessibility	janina at a11y.org
> Linux Foundation		http://a11y.org
>
> Chair, Protocols & Formats
> Web Accessibility Initiative	http://www.w3.org/wai/pf
> World Wide Web Consortium (W3C)
>
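Samuel's `convert` suggestion can be turned into the "little script" Janina mentions. The sketch below assumes that ImageMagick (which relies on Ghostscript to read PDFs) and tesseract are installed. The function names `pdf_to_text` and `pdf_base` are hypothetical. The sketch also adds `-density 300` before the input file, because ImageMagick otherwise rasterises PDF pages at 72 dpi, which is usually too coarse for reliable OCR.

```shell
#!/bin/sh
# Sketch: OCR a PDF by rasterising it with ImageMagick, then running tesseract.
# ASSUMPTION: ImageMagick (with Ghostscript) and tesseract are on the PATH.

# Strip the extension: scans/page1.pdf -> scans/page1
pdf_base() { printf '%s\n' "${1%.*}"; }

# pdf_to_text FILE.pdf: writes FILE.tiff and FILE.txt next to the PDF.
pdf_to_text() {
    pdf="$1"
    case "$pdf" in
        *.pdf|*.PDF) ;;
        *) echo "usage: pdf_to_text FILE.pdf" >&2; return 1 ;;
    esac
    base=$(pdf_base "$pdf")
    # -density must precede the input file: it sets the rasterisation DPI.
    convert -density 300 "$pdf" -depth 8 "$base.tiff" || return 1
    # tesseract IMAGE OUTBASE writes OUTBASE.txt
    tesseract "$base.tiff" "$base"
}
```

Running `pdf_to_text report.pdf` would leave `report.tiff` and `report.txt` next to the PDF. A multi-page PDF becomes a multi-page TIFF. Recent tesseract versions read every page of such a file, but older ones may not. In that case it is safer to split the pages, for example with `convert -density 300 report.pdf page-%03d.tiff`. Some distributions also ship an ImageMagick security policy that blocks PDF input, so `convert` may refuse the file until that policy is relaxed.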