On the topic of OCR, you could also check out specialized Optical Music
Recognition systems (http://en.wikipedia.org/wiki/Optical_music_recognition).
In the university department where I wrote my diploma thesis, Gamera
(http://gamera.informatik.hsnr.de/) was used for this task.

Cheers,
Daniel

2010/4/3 Luke Peterson <luke.peterson@xxxxxxxxx>:
> PDFsam -- PDF Split-And-Merge is a handy open-source tool.
> (http://www.pdfsam.org/)
>
> But its title is its featureset, for the most part. It allows you to reorder
> PDFs, pull pages out, add pages in, rotate pages 90, 180, 270 degrees, etc.
> Command-line driven but there's also a GUI console.
>
> It's got a Windows installer, but should run anywhere Java is available.
>
> Sounds like on top of the scanning and organizing solution, you need to
> figure out some OCR application to extract metadata from each of the PDFs in
> a large-scale way.
>
> If you're planning to put these out for public consumption, you can use
> Google to assist you in your scanning and indexing:
>
> http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/
>
> Alternatively, the open-source OCR world is getting better fast. Check out
> OCRopus (http://en.wikipedia.org/wiki/OCRopus) -- it's a Linux-based
> command-line OCR tool. You should be able to incorporate this into a
> workflow; it'll spit out what it thinks your PDF says in htmlish (specified
> here: http://docs.google.com/View?docid=dfxcv4vc_67g844kf).
>
> I could see a workflow on your end that creates four rotations of each page
> scanned, then attempts to OCR them in each degree of rotation with OCRopus,
> compares the results, and persists in your datastore the one with the
> highest combination of recognized characters and recognition score. I
> suppose this is only really helpful if a) your PDFs often get scanned
> upside-down or sideways, and b) all your PDFs have some amount of digital
> typography on them.
>
> Anyway, a couple ideas.
>
> -----
> Luke Peterson
>

_______________________________________________
Linux-audio-user mailing list
Linux-audio-user@xxxxxxxxxxxxxxxxxxxx
http://lists.linuxaudio.org/listinfo/linux-audio-user
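
A rough sketch of the rotate-and-compare workflow Luke describes, in case it
helps make the idea concrete. Everything named here is an assumption for
illustration only: `rotate` and `ocr` are hypothetical callables you would
supply yourself (for example, thin wrappers around an image library and
around whatever OCR engine you pick), `OcrResult` is an invented container,
and "combination of recognized characters and recognition score" is
interpreted, arbitrarily, as character count times confidence. None of this
is OCRopus's actual interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Tuple


@dataclass(frozen=True)
class OcrResult:
    """Hypothetical container for what an OCR engine hands back."""
    text: str
    confidence: float  # assumed to be engine-reported, in the range 0.0 .. 1.0


def combined_score(result: OcrResult) -> float:
    """One possible reading of 'recognized characters and recognition score':
    the number of non-whitespace characters, weighted by confidence."""
    recognized = sum(1 for ch in result.text if not ch.isspace())
    return recognized * result.confidence


def best_rotation(
    page: Any,
    rotate: Callable[[Any, int], Any],
    ocr: Callable[[Any], OcrResult],
    angles: Iterable[int] = (0, 90, 180, 270),
) -> Tuple[int, OcrResult]:
    """OCR the page at each angle and return the (angle, result) pair with
    the highest combined score. Ties go to the first angle tried."""
    candidates = [(angle, ocr(rotate(page, angle))) for angle in angles]
    return max(candidates, key=lambda pair: combined_score(pair[1]))


# --- Stand-ins so the sketch runs without any OCR engine installed ---------

def fake_rotate(page: str, angle: int) -> Tuple[str, int]:
    """Pretend rotation: just remember which angle was requested."""
    return (page, angle)


def fake_ocr(rotated: Tuple[str, int]) -> OcrResult:
    """Pretend OCR: only the 180-degree rotation yields readable text."""
    _, angle = rotated
    if angle == 180:
        return OcrResult("Sonata in C major", 0.9)
    return OcrResult("~~ ?", 0.2)


if __name__ == "__main__":
    angle, result = best_rotation("scan-0001", fake_rotate, fake_ocr)
    print(angle, result.text)
```

The winning angle and text are what you would then persist in your
datastore. Multiplying count by confidence is only one way to combine the
two numbers; a page of noise recognized with low confidence can still
outscore a short, clean title, so the weighting probably needs tuning
against real scans.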