Re: Scanning text of PDF documents

A reliable solution depends partly on the PDF document itself. Consider whether your PDF contains rotated text, or text that spans several different blocks or pages. My experience with ps2ascii and other Ghostscript-related tools is that sometimes they work quite well and sometimes the output is rather messy.
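
For what it's worth, calling ps2ascii from PHP could look roughly like the sketch below. It assumes Ghostscript is installed on a Unix-like server and that ps2ascii is on the PATH. The function name pdf_to_text() and the example path are made up for illustration.

```php
<?php
// Sketch only: pdf_to_text() is a hypothetical helper, not part of PHP or Ghostscript.
// Assumes Ghostscript's ps2ascii is installed and on the web server's PATH.
function pdf_to_text($pdfPath)
{
    if (!is_readable($pdfPath)) {
        return false; // no readable file, nothing to extract
    }

    // escapeshellarg() guards against shell injection via the file name.
    $cmd = 'ps2ascii ' . escapeshellarg($pdfPath) . ' 2>/dev/null';

    $output = array();
    $status = 0;
    exec($cmd, $output, $status);

    if ($status !== 0) {
        return false; // ps2ascii is missing or failed on this document
    }

    // With no output file given, ps2ascii writes the text to stdout.
    return implode("\n", $output);
}

// Usage: the path is a placeholder for wherever the upload was stored.
$text = pdf_to_text('/tmp/uploaded.pdf');
if ($text === false) {
    echo "Could not extract text\n";
} else {
    echo strlen($text) . " characters extracted\n";
}
```

Since the output can be messy, it is probably worth looking at the extracted text for a few representative documents before storing anything in the DB.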

The most reliable way of extracting text from a PDF is (I think) a commercial product called PDFlib TET from PDFlib GmbH. Yes, a license costs some money, but it lets you get almost everything out of the PDF.

http://www.pdflib.com/products/tet/

Maybe some magic with OpenOffice could do the trick as well?

//frank

On 15 May 2008 at 10:19, Angelo Zanetti wrote:

Hi All.

This is a quick question.

A client of ours wants a solution where, when a PDF document is uploaded,
we use PHP to scan the document's contents and save them in a DB.

I know you can do this with normal text documents using the file commands
and functions.

Is it possible with PDF documents?

My feeling is NO, but perhaps someone will prove me wrong.

Thanks in advance.

Angelo

Web: http://www.elemental.co.za



--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php





Frank Arensmeier
........................................................................ ........................
Webmaster & IT Development

NIKE Hydraulics AB
Box 1107
631 80 Eskilstuna
Sweden

phone +46 - (0)16 16 82 34
fax +46 - (0)16 13 93 16
frank@xxxxxxxxxxxx
www.nikehydraulics.se
........................................................................ ........................



