Re: Scanning text of PDF documents

Robert Cummings <robert@xxxxxxxxxxxxx> · Thu, 15 May 2008 21:33:12 -0400

On Thu, 2008-05-15 at 20:17 -0500, Ray Hauge wrote:
>
> One thing you'll have to watch is that if the PDF was created by a 
> scanner, then the "text" on the PDF is actually just an image and cannot 
> be read without OCR.  I got stumped on that one for a while when I was 
> doing something similar :)

I love the tables where you have something like the following:

    .-----------------.----------------------.
    | This is a short | This is a different  |
    | paragraph about | piece of content     |
    | something in a  | about another thing. |
    | table           |                      |
    `-----------------^----------------------'

And of course when you cut and paste you get the following:

    This is a short This is a different
    paragraph about piece of content
    something in a about another thing.
    table

Oh yes, that's what I expected too. It's not even something you can
clean up with a macro. You have to carefully piece them back together,
copy/paste one line at a time, or just type it :)
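Here is a quick Python sketch of why the pasted text comes out interleaved. It assumes a naive extractor that reads each visual line left to right across every column. The language and the `flatten_rows` name are my own choices for illustration and aren't taken from any PDF tool.

```python
from itertools import zip_longest

def flatten_rows(columns):
    """Hypothetical model of naive extraction: join the i-th visual line of
    every column with a single space, then move on to the next line."""
    lines = []
    for row in zip_longest(*columns, fillvalue=""):
        # Skip empty cells so a short column doesn't leave a trailing space.
        lines.append(" ".join(cell for cell in row if cell))
    return lines

# Each column is the list of wrapped lines as it appears on the page.
left = ["This is a short", "paragraph about", "something in a", "table"]
right = ["This is a different", "piece of content", "about another thing."]

for line in flatten_rows([left, right]):
    print(line)
```

Once the cells are joined by single spaces, nothing marks where the left column ends. A line like "This is a short This is a different" could be split at any of its spaces, so a macro has no reliable way to undo it.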

Cheers,
Rob.
-- 
http://www.interjinn.com
Application and Templating Framework for PHP

-- 
PHP General Mailing List (http://www.php.net/)