Re: Scanning text of PDF documents

On Thu, 2008-05-15 at 20:17 -0500, Ray Hauge wrote:
>
> One thing you'll have to watch is that if the PDF was created by a 
> scanner, then the "text" on the PDF is actually just an image and cannot 
> be read without OCR.  I got stumped on that one for a while when I was 
> doing something similar :)

I love the tables where you have something like the following:

    .-----------------.----------------------.
    | This is a short | This is a different  |
    | paragraph about | piece of content     |
    | something in a  | about another thing. |
    | table           |                      |
    `-----------------^----------------------'

And of course when you cut and paste you get the following:

    This is a short This is a different
    paragraph about piece of content
    something in a about another thing.
    table

Oh yes, that's what I expected too. It's not even something you can
clean with a macro. You have to carefully piece them back together, or
copy/paste one line at a time, or just type it :)
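
A minimal sketch of why the paste comes out interleaved, assuming the
viewer copies text one visual line at a time across both cells. The
variable and function names are hypothetical, and the cell contents are
just the example table above:

```python
from itertools import zip_longest

# Hypothetical model of the two-column table: each cell is a list of the
# text lines it contains, in top-to-bottom order.
left = ["This is a short", "paragraph about", "something in a", "table"]
right = ["This is a different", "piece of content", "about another thing."]


def copy_row_by_row(left, right):
    """Mimic the paste result: each visual line is read left to right."""
    lines = []
    for l, r in zip_longest(left, right, fillvalue=""):
        # Join the two cell fragments with a single space; strip() drops
        # the trailing space when the right-hand cell has run out of lines.
        lines.append(f"{l} {r}".strip())
    return lines


def read_column_by_column(left, right):
    """What the reader wants: each cell as one continuous run of text."""
    return [" ".join(left), " ".join(right)]


if __name__ == "__main__":
    print("\n".join(copy_row_by_row(left, right)))
    print()
    print("\n".join(read_column_by_column(left, right)))
```

Once the text has been flattened row by row, the position of the column
boundary is gone, so a script has nothing reliable to split each line
on. Column-wise reading only works while the cell layout is still known.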

Cheers,
Rob.
-- 
http://www.interjinn.com
Application and Templating Framework for PHP


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

