Steve,

You must convert the file to PostScript before you can read anything out of it, and even then words are often broken up across several "show" statements.

Use pdf2ps (part of GNU Ghostscript) to convert to PS, and then search for patterns like this:

(text) show

That is the most basic PostScript syntax, but often it is more complex than that. The Adobe PostScript 3 driver for Windows writes everything on single lines, together with positioning and formatting parameters. So basically, you treat anything in parentheses as text.

For indexing you may get by just looking for anything in parentheses, but I would look for a third-party utility if you want it done perfectly.

-Jeff Moss

----- Original Message -----
From: <veditio@xxxxxxxxxxx>
To: <php-db@xxxxxxxxxxxxx>
Sent: Friday, June 25, 2004 1:36 PM
Subject: Read a PDF file via PHP

> I'm working on a file upload system that accepts PDF files, reads the text in those files, and enters it into a database, which makes the text from the PDF indexable and searchable.
>
> I've got it all down except for the ability to read the text from a PDF file.
>
> When the PHP read file function is used, the PDF file is read just fine, but when I return the results to my browser, they're of course nothing but jumble, because the code for the entire file was read.
>
> Is there any way to get PHP to simply read the PDF file for text only--just the surface of it, just the words, as if it were a human reading the PDF itself--and not for the internal code of the file?
>
> Thanks,
> Steve
>
> --
> PHP Database Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
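For what it's worth, the parenthesis search described in the reply above might look roughly like the sketch below. It is written in Python purely for illustration (the same regular expression should carry over to PHP's preg_match_all). The function names pdf_to_ps and extract_show_text are made up for this example, and it assumes that pdf2ps is on the PATH and that the PostScript output uses the plain "(text) show" form; driver-specific operators, hex strings, and nested parentheses are not handled.

```python
import re
import subprocess

# A PostScript string literal "( ... )" followed by the show operator.
# Backslash escapes such as \( and \) are allowed inside the literal.
# Assumption: no nested unescaped parentheses and no <hex> strings.
SHOW_RE = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*show\b")

# Only \( \) and \\ are unescaped; octal escapes are left untouched.
ESCAPE_RE = re.compile(r"\\([()\\])")


def pdf_to_ps(pdf_path, ps_path):
    """Convert a PDF to PostScript by calling Ghostscript's pdf2ps script."""
    subprocess.run(["pdf2ps", pdf_path, ps_path], check=True)


def extract_show_text(ps_source):
    """Return the string operand of every "(text) show" in ps_source, in order."""
    return [ESCAPE_RE.sub(r"\1", m.group(1)) for m in SHOW_RE.finditer(ps_source)]


if __name__ == "__main__":
    sample = "72 720 moveto\n(Hello) show\n(wor) show (ld) show\n"
    # Words may be split across several show statements, so the
    # fragments are printed as-is rather than joined.
    print(extract_show_text(sample))
```

Because a word can be split across several show statements, the fragments come back as a list and any joining is left to the caller. For indexing that is meant to be exact, a dedicated third-party extraction utility is still the safer route.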