Well, I was going to say that too -- that the problem with tesseract
probably wasn't the quality of the OCR itself but the orientation and
cropping. But that's small consolation to someone who is used to having
their software deal with all that on its own. There probably is
something out there in open-source land that does the rotation, at
least. The logic is simple: if 20% of the words in the text aren't in
the dictionary, rotate the image and try again. Something like that
would be easy enough to write. But if there is something like that
available as open source, I am unaware of it.
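
For what it's worth, here is a rough Python sketch of that rotate-and-retry idea. It is not an existing tool, just an illustration. The function names, the 90-degree steps, and the word list are my own assumptions; the only external commands it relies on are ImageMagick's convert with -rotate and the plain "tesseract image outputbase" invocation, which writes outputbase.txt.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def unknown_fraction(text, words):
    """Fraction of alphabetic tokens in text that are not in the word set."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    if not tokens:
        return 1.0  # nothing recognised at all counts as a total miss
    return sum(t not in words for t in tokens) / len(tokens)


def ocr_rotated(image, angle):
    """Rotate a copy of the image with 'convert', then run 'tesseract' on it."""
    with tempfile.TemporaryDirectory() as tmp:
        rotated = Path(tmp) / "rotated.png"
        subprocess.run(
            ["convert", str(image), "-rotate", str(angle), str(rotated)],
            check=True,
        )
        # tesseract appends ".txt" to the output base name it is given
        base = Path(tmp) / "out"
        subprocess.run(["tesseract", str(rotated), str(base)], check=True)
        return (Path(tmp) / "out.txt").read_text(errors="replace")


def ocr_with_retry(image, words, ocr=ocr_rotated, threshold=0.20,
                   angles=(0, 90, 180, 270)):
    """Try each angle in turn and stop at the first acceptable result.

    Returns (angle, text, unknown_fraction). If no angle gets under the
    threshold, the best attempt seen is returned instead.
    """
    best = None
    for angle in angles:
        text = ocr(image, angle)
        frac = unknown_fraction(text, words)
        if best is None or frac < best[2]:
            best = (angle, text, frac)
        if frac <= threshold:
            break
    return best
```

To use it you would load a word list into a set, for example from /usr/share/dict/words if your system has one, lowercase the entries, and call ocr_with_retry("scan.png", words). The ocr argument is only there so the retry logic can be tried out with a stand-in function on a machine without tesseract installed.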
On 06/27/13 17:44, aw585@xxxxxxxx wrote:
As Mr. Hart is well aware, tesseract works well enough that
it probably solved better than 90% of the captchas for
slimrat-nox when this was needed for downloading from Rapidshare.
What is probably needed is to rotate and crop the output
images from SANE's scanimage using ImageMagick's convert
or the pnm tools when scanning anything other than a sheet of paper
oriented in portrait format.
In fact, if you look at the tesseract man page, it lists
the 'convert' program in the 'SEE ALSO' section.
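
As an illustration of the rotate-and-crop step, here is a minimal Python sketch that builds and runs a convert command line. The function names and the example geometry are assumptions made up for this sketch; -rotate, -crop with a WIDTHxHEIGHT+X+Y geometry, and +repage are standard ImageMagick options, and convert can read the PNM files scanimage produces.

```python
import subprocess


def convert_command(src, dst, angle=0, crop=None):
    """Build an ImageMagick 'convert' command that rotates, then crops.

    crop is an ImageMagick geometry string such as "1200x800+50+100"
    (width x height + x offset + y offset), applied after the rotation.
    """
    cmd = ["convert", str(src)]
    if angle:
        # +repage resets the virtual canvas so later offsets are
        # measured from the corner of the rotated image
        cmd += ["-rotate", str(angle), "+repage"]
    if crop:
        cmd += ["-crop", crop, "+repage"]
    cmd.append(str(dst))
    return cmd


def rotate_and_crop(src, dst, angle=0, crop=None):
    """Run the command; raises CalledProcessError if convert fails."""
    subprocess.run(convert_command(src, dst, angle, crop), check=True)
```

For a landscape scan saved as scan.pnm, something like rotate_and_crop("scan.pnm", "page.png", angle=90, crop="1200x800+50+100") would give tesseract an upright, trimmed image. The right angle and geometry depend entirely on how the item was placed on the scanner.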
Regards,
Dallas E. Legan II
legan@xxxxxxx / aw585@xxxxxxxx /
http://www.lafn.org/~aw585/index.html
_______________________________________________
Blinux-list mailing list
Blinux-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/blinux-list
--
John G. Heim, 608-263-4189, jheim@xxxxxxxxxxxxx