Willem van der Walt wrote:
> The different ocr engines require different image formats.
> Some of them are really dumb.

They probably derive from old code written without a
format-independent graphics library.

> I find that the best of the open-source engines is cuneiform.

Aha, interesting. I've always used tesseract. cuneiform is in
debian wheezy (testing) but not yet in debian stable...

Depending on how the PDF was produced, it's possible that
  ps2txt filename.pdf
(a.k.a. ps2ascii) might help; I think it comes with ghostscript.

Regards, Peter Billam

http://www.pjb.com.au    pj at pjb.com.au    (03) 6278 9410
"Was der Meister nicht kann, vermöcht es der Knabe, hätt er
 ihm immer gehorcht?"  Siegfried to Mime, from Act 1 Scene 2