On Sat, Dec 20, 2008 at 2:11 PM, Daniel Dalton <d.dalton@xxxxxxxxxxxx> wrote:
> That's a good idea, I didn't think of that, I guess I should invest
> some time into writing something like this.

I did some checking, and it sounds like Ocropus already does many of the things I discussed and is under active development. So you might study Ocropus further before deciding whether to develop a script. Your time might be more productively spent contributing to that project.

Here are some links that may assist:

Ocropus project home page: <http://code.google.com/p/ocropus/>
Ocropus wiki: <http://sites.google.com/site/ocropus/>
Ocropus mailing list/forum: <http://groups.google.com/group/ocropus>
Ocropus documentation: <http://sites.google.com/site/ocropus/documentation>
Ocropus development road map: <http://code.google.com/p/ocropus/wiki/Roadmap>
Updated road map: <http://sites.google.com/site/ocropus/roadmap> (extends the road map beyond the milestones identified in the first one)

I checked because the major OCR apps on Windows have for many years provided tools for this kind of stuff. Therefore, I thought it likely that someone was already developing an open source solution.

Along the way, I learned that Ocropus includes disabled code for handwriting recognition that may be repaired later. Google is generously funding both Ocropus and Tesseract development, with Tesseract being developed right now mainly for book conversions, in aid of Google's Books initiative.

I hope this helps. OCR is one of those areas where free and open source developers are still catching up with proprietary software. The bright side of that situation is that there should be a lot of progress made fairly quickly because the technology is well understood.

Best regards,

Paul

--
Universal Interoperability Council
<http:www.universal-interop-council.org>
_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list