Re: reading PDF's

On Fri, June 24, 2005 12:10 pm, Jon said:
> Is it possible to read text from a PDF file with PHP? How?

At the crudest level, you can fopen/fread a PDF and dump it out, and pick
out the plain-text readable bits with your eyes. :-)
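If your eyes get tired, the same crude idea can be sketched in PHP: read the raw bytes and keep only the runs of printable ASCII. The function names and the minimum run length of 4 are my own illustrative choices. Expect mostly metadata and structural keywords, since the page text in many PDFs sits inside compressed streams that this will not decode.

```php
<?php
// Reads a whole file with fopen/fread; returns '' if it cannot be opened.
function read_file_bytes(string $path): string
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return '';
    }
    $bytes = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break;
        }
        $bytes .= $chunk;
    }
    fclose($handle);
    return $bytes;
}

// Crude "strings"-style scan: returns every run of printable ASCII
// that is at least $minLength bytes long ($minLength is illustrative).
function printable_runs(string $bytes, int $minLength = 4): array
{
    $pattern = '/[\x20-\x7E]{' . $minLength . ',}/';
    preg_match_all($pattern, $bytes, $matches);
    return $matches[0];
}
```

Something like print_r(printable_runs(read_file_bytes('sample.pdf'))) would dump the readable bits, where sample.pdf is a placeholder file name.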

After that, there are definitely some commercial command-line tools to
convert PDF to text (or HTML or whatever) that you can Google for.
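Whichever tool you end up with, calling it from PHP might look roughly like the sketch below. The converter path is a placeholder you pass in, and the "tool input output" argument order is purely an assumption on my part, so check the tool's own manual before relying on it.

```php
<?php
// Builds a shell command for a PDF-to-text converter.
// $converter stands in for whichever tool you pick; the
// "<tool> <input> <output>" argument order is an assumption.
function build_convert_command(string $converter, string $pdfPath, string $txtPath): string
{
    return escapeshellcmd($converter)
        . ' ' . escapeshellarg($pdfPath)
        . ' ' . escapeshellarg($txtPath);
}

// Runs the converter into a temp file and returns the extracted text,
// or null if the tool could not be run or reported a failure.
function pdf_to_text(string $converter, string $pdfPath): ?string
{
    $txtPath = tempnam(sys_get_temp_dir(), 'pdf');
    if ($txtPath === false) {
        return null;
    }
    exec(build_convert_command($converter, $pdfPath, $txtPath), $output, $exitCode);
    $text = $exitCode === 0 ? file_get_contents($txtPath) : false;
    unlink($txtPath);
    return $text === false ? null : $text;
}
```

Escaping both paths matters here because uploaded file names routinely contain spaces and quotes.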

There may be a free one, or even an open-source one, but I've never heard
of it, possibly because they'd have to pay a license fee to Adobe
(Macromedia this week?) to be legal...

Note that a PDF can have its text encrypted, can be password-protected,
or can have its text rendered into an image that was then embedded in
the PDF (ugh!).
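A rough way to spot the encrypted case up front, as a sketch: an encrypted PDF declares an /Encrypt entry in its trailer, so a search of the raw bytes gives a hint. The function name is mine, and a plain search can produce false positives if that string happens to appear elsewhere in the file, so treat the result as a warning flag rather than a verdict.

```php
<?php
// Heuristic check: an encrypted PDF declares an /Encrypt entry in its
// trailer. This only searches the raw bytes, so it can be fooled by the
// same text appearing elsewhere in the file.
function looks_encrypted(string $pdfBytes): bool
{
    return preg_match('/\/Encrypt\b/', $pdfBytes) === 1;
}
```

This says nothing about the image-only case, which needs the OCR route.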

At that point, you can maybe get the image out and use some kind of OCR
software like OmniPage to "read" it.

The PDF format has changed a lot over the years and versions, so be sure
to have a representative sample of PDFs to throw at your testing.

You don't want to get to launch and find out 90% of the real PDFs simply
don't work. :-(
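To see how varied your sample really is, you could tally the version each file declares. A PDF starts with a header line such as %PDF-1.4, and the sketch below reads it from the first 1024 bytes. The function names are made up for illustration, and the header is only the declared version, so it is a rough guide to which features a file actually uses.

```php
<?php
// Returns the version declared in the "%PDF-x.y" header, or null if no
// header is found in the first 1024 bytes.
function pdf_version(string $pdfBytes): ?string
{
    if (preg_match('/%PDF-(\d+\.\d+)/', substr($pdfBytes, 0, 1024), $m) === 1) {
        return $m[1];
    }
    return null;
}

// Counts how many samples declare each version; samples without a
// recognizable header are counted under 'unknown'.
function tally_versions(array $samples): array
{
    $counts = [];
    foreach ($samples as $bytes) {
        $version = pdf_version($bytes) ?? 'unknown';
        $counts[$version] = ($counts[$version] ?? 0) + 1;
    }
    return $counts;
}
```

If the tally shows only one or two versions, the sample probably isn't representative yet.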

-- 
Like Music?
http://l-i-e.com/artists.htm
