A reliable solution depends partly on the PDF document itself.
Consider whether your PDF contains rotated text, or text that spans
several different blocks or pages. My experience with ps2ascii and
other Ghostscript-related tools is that sometimes they work quite
well, and sometimes the output is rather messy.
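For the PHP side of the original question, a minimal sketch might shell out to ps2ascii and store whatever text it returns through PDO. This assumes Ghostscript's ps2ascii is installed and on the PATH, and that a table named pdf_documents with file_name and content columns exists. The function names and the table are hypothetical, and the caveats above about rotated or multi-block text still apply to the output.

```php
<?php
// Sketch only: assumes Ghostscript's ps2ascii is installed and on the PATH,
// and that a (hypothetical) table pdf_documents(file_name, content) exists.

// Build the command line, quoting the path so a file name cannot inject
// additional shell commands.
function buildPs2asciiCommand(string $pdfPath): string
{
    return 'ps2ascii ' . escapeshellarg($pdfPath);
}

// Run ps2ascii and return the extracted text, or null if the file is not
// readable or the command produced no output.
function extractPdfText(string $pdfPath): ?string
{
    if (!is_readable($pdfPath)) {
        return null;
    }
    // With only an input file given, ps2ascii writes the text to standard output.
    $output = shell_exec(buildPs2asciiCommand($pdfPath));
    return (is_string($output) && $output !== '') ? $output : null;
}

// Store the extracted text with a prepared statement, so the PDF's
// contents never need manual escaping.
function savePdfText(PDO $db, string $fileName, string $text): void
{
    $stmt = $db->prepare(
        'INSERT INTO pdf_documents (file_name, content) VALUES (:name, :content)'
    );
    $stmt->execute([':name' => $fileName, ':content' => $text]);
}
```

In an upload handler you might call extractPdfText() on $_FILES['document']['tmp_name'] and, if it returns a string, pass it to savePdfText(); the form field name 'document' is likewise only an example.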
In my opinion, the most reliable way of extracting text from a PDF is
a product called PDFlib TET from PDFlib GmbH. Yes, a license costs
some money, but with it you can get almost everything out of the
PDF.
http://www.pdflib.com/products/tet/
Maybe some magic with OpenOffice could do the trick as well?
//frank
On 15 May 2008, at 10:19, Angelo Zanetti wrote:
Hi All.
This is a quick question.
A client of ours wants a solution where, when a PDF document is
uploaded, we use PHP to scan the document's contents and save them
in a DB. I know you can do this with plain text documents using the
file commands and functions.
Is it possible with PDF documents?
My feeling is NO, but perhaps someone will prove me wrong.
Thanks in advance.
Angelo
Web: http://www.elemental.co.za
--
PHP General Mailing List (http://www.php.net/)
Frank Arensmeier
Webmaster & IT Development
NIKE Hydraulics AB
Box 1107
631 80 Eskilstuna
Sweden
phone +46 - (0)16 16 82 34
fax +46 - (0)16 13 93 16
frank@xxxxxxxxxxxx
www.nikehydraulics.se