Re: PDF to text?

On Sat, Aug 13, 2011 at 2:41 AM, Bob Goodwin <bobgoodwin@xxxxxxxxxxxx> wrote:
> On 12/08/11 12:22, mike cloaked wrote:
[...]
>> However if the pdf is a scanned image then it would need ocr before
>> the text could be extracted -

As someone else noted, some recent scan-to-pdf tools try to pre-ocr
the text. Sometimes it's sort of helpful. Sometimes not so much.

Some pdf output tools actually embed the real text in the pdf alongside
an image of the text. But that's not scanning, and it doesn't seem to
be the case here, either.

>        I believe it is a scanned image now that I realize it has a
>        handwritten signature.
>
>        Xsane does ocr. I tried scanning a printed copy and letting
>        xsane save it as a text message as well as trying gocr to read
>        an xsane .pnm file. Both produced the same output which looks
>        like it would require a lot of work to be usable if it is
>        possible at all?
>
>        I will do without the Google translation.
>
>        Thanks for all the suggestions. This has been interesting, I
>        always wondered about ocr, what it could do. I need to
>        experiment with a document in English so that I have something I
>        understand however it looks like the output quality is poor?

ocr is still hit-and-miss. Some combinations of
languages/fonts/scanners/image format/paper quality/ocr software and
the price of 10base5 cable on Saipan work well. Others don't.

Well, probably not 10base5. :/

But the tuning is sometimes so time-intensive that you'd prefer to
just type it in by hand. On the other hand, if you have a lot of
scanned text that comes from the same source, the tuning can be worth
it.
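
One rough way to tell whether tuning is paying off would be to hand-type
a single page and compare the ocr output against it. A minimal sketch in
Python, using only the standard library; the function name
ocr_similarity and the sample strings are made up for illustration and
have nothing to do with xsane or gocr themselves:

```python
import difflib


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0.0-1.0 similarity ratio between hand-typed text and ocr output.

    Hypothetical helper for illustration; not part of any ocr package.
    """
    # Collapse runs of whitespace so that differences in line wrapping
    # are not counted as recognition errors.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    # autojunk=False keeps difflib from ignoring frequent characters
    # in longer texts.
    return difflib.SequenceMatcher(None, ref, out, autojunk=False).ratio()


# Made-up sample: the "ocr" text confuses a few letters with digits.
reference = "The quick brown fox jumps over the lazy dog."
ocr_output = "The qu1ck brown f0x jumps\nover the lazy d0g."
print(f"similarity: {ocr_similarity(reference, ocr_output):.2f}")
```

A ratio close to 1.0 means the output is close to the typed page. If the
number barely moves after a round of tuning, typing the rest by hand is
probably the cheaper option.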

Don't ask me how to tune the ocr. Some years ago I read up on it and
decided, for that doc, I'd pass. Open source ocr seems to have
progressed since then, which is nice.

Joel Rees