Re: what software used for ocr on linux

Hi,
I have written a front-end for doing OCR in the kies package.
It handles scanning through scanimage and then OCR through any of a number of engines.
It produces an HTML file with all the pages and links to each one.
It also can do OCR from an image file.
The engines are called through wrapper scripts which are defined in a text file.
If anyone wants it, the kies package currently can be found at:
http://hlt.mirror.ac.za/TTS/sanote/kies-latest.tar.bz2
As root:
tar jxvf kies-latest.tar.bz2
cd kies
./install.sh
The actual program for the scanning stuff is called kies_p2t
HTH, Willem


On Thu, 10 Jul 2014, Tony Baechler wrote:

You don't need a graphical environment for sane, but you do for xsane.  I've
confirmed that sane will let you scan from the command line.  It's been a
few years, so I don't remember the exact process, but I think you might need
a sane-utils package.  The problem I had is that it put each page in a
single .tif image which I couldn't get to OCR very well and couldn't easily
find a way to combine into a single file for more convenient and faster
processing.  I played around with "convert" from ImageMagick but still
didn't get very far, so I unfortunately went back to K1000 in Windows.
Again, it's been a few years and I only played with Tesseract, so you might
get better results nowadays.  If you do install Tesseract, be prepared for a
lot of dependencies as it's very big.  Many newer scanners don't have sane
drivers and won't be detected as they're designed to work in Windows, so
don't be surprised if your scanner doesn't appear to work.  I got lucky in
that it found the scanner automatically and mostly just worked for me.
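
For anyone who wants to try the command-line route described above, here is a rough sketch, not a tested recipe. It assumes scanimage (from sane-utils), convert (from ImageMagick) and tesseract are installed. The file names and the OCR variable are made up for the example, and --resolution is a backend option that not every scanner driver offers.

```shell
#!/bin/sh
# Sketch only: scan, optionally combine, and OCR pages from the command line.
# Assumes scanimage (sane-utils), convert (ImageMagick) and tesseract are
# installed. File names and the OCR variable are made up for this example.

# Scan one page to the TIFF file named in $1. --resolution is a backend
# option, so it may be missing or differ for some scanners.
scan_page() {
    scanimage --format=tiff --resolution 300 > "$1"
}

# Combine pages into one multi-page TIFF:
#   combine_pages out.tif in1.tif in2.tif ...
combine_pages() {
    out=$1
    shift
    convert "$@" "$out"
}

# OCR each page in turn and collect the text in one file:
#   ocr_pages book.txt page-001.tif page-002.tif ...
# Set OCR to try a different program that takes "image outputbase".
ocr_pages() {
    out=$1
    shift
    : > "$out"
    for page in "$@"; do
        base=${page%.*}
        # tesseract writes its result to <outputbase>.txt
        "${OCR:-tesseract}" "$page" "$base" && cat "$base.txt" >> "$out"
    done
}
```

Running "scanimage -L" first shows whether sane sees the scanner at all. Doing the OCR page by page and appending the text avoids having to combine the images before recognition.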

On 2014-07-10 04:35 AM, Doug Smith wrote:


First of all, install all these so that you will have a choice:

ocrad, tesseract, gocr and cuneiform

These are the actual ocr engines which do the text recognition in the first place.

After this, if you are in a graphical environment, install ocrfeeder, which is the basic open-book framework.  Also make sure you have sane installed if you
are using a scanner, and make sure sane can recognize your scanner.

Now, you are ready to try it.  I have no scanner so I have never used this but it might just do the trick for you.
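
To compare the engines without a graphical environment, something like the following might work. It is only a sketch and assumes the programs are installed under the names ocrad, tesseract, gocr and cuneiform. Package names differ between distributions (on Debian, for example, tesseract is packaged as tesseract-ocr). The function names are made up for the example.

```shell
#!/bin/sh
# Sketch only: see which OCR engines are installed and run one on an image.
# Assumes the programs are named ocrad, tesseract, gocr and cuneiform;
# package names vary by distribution (on Debian tesseract is tesseract-ocr).

# Print the name of each engine found in PATH, one per line.
available_engines() {
    for engine in ocrad tesseract gocr cuneiform; do
        command -v "$engine" >/dev/null 2>&1 && echo "$engine"
    done
    return 0
}

# run_engine ENGINE IMAGE OUTBASE writes the recognized text to OUTBASE.txt.
# ocrad and gocr expect PNM input (pbm, pgm or ppm), so convert the scan first.
run_engine() {
    case $1 in
        ocrad)     ocrad -o "$3.txt" "$2" ;;
        gocr)      gocr -i "$2" -o "$3.txt" ;;
        tesseract) tesseract "$2" "$3" ;;
        cuneiform) cuneiform -o "$3.txt" "$2" ;;
        *)         echo "unknown engine: $1" >&2; return 1 ;;
    esac
}
```

Calling run_engine once per name printed by available_engines, each time with an image format that engine accepts, gives one text file per engine to compare.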



Hope this helps.





--
Have a good day,
Tony Baechler
tony@xxxxxxxxxxxx

_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list



