Re: storing & searching docs

On 12/13/2012 02:49 PM, Jim Giner wrote:
Thanks for all the posts. After reading and googling all afternoon, I
think the best approach for me is:

Create two macros in Word (done!) to export each of my .doc files to
.txt and .pdf formats.

Create a sql table to hold the .txt contents of my .doc files, along
with a reference to the meeting date and the name of the corresponding
.pdf file.

Upload my two sets of files with an ftp client and then use a script to
load the table with my .txt file data.
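
For illustration only, the table and the load step described above could be sketched roughly like this. It is written in Python with SQLite so it runs on its own; the table name `meeting_docs`, its column names, and the convention that each file is named after its meeting date (e.g. `2012-12-13.txt` next to `2012-12-13.pdf`) are all assumptions, not something stated in the post.

```python
import sqlite3
from pathlib import Path


def create_table(conn):
    """Create the (hypothetical) table holding one row per meeting document."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS meeting_docs (
            id           INTEGER PRIMARY KEY,
            meeting_date TEXT NOT NULL,
            pdf_name     TEXT NOT NULL,
            body         TEXT NOT NULL
        )
        """
    )


def load_txt_files(conn, txt_dir):
    """Insert every .txt file in txt_dir; returns the number of rows added.

    Assumption: the file stem is the meeting date and the matching PDF
    has the same stem with a .pdf extension.
    """
    count = 0
    for txt_path in sorted(Path(txt_dir).glob("*.txt")):
        body = txt_path.read_text(encoding="utf-8", errors="replace")
        conn.execute(
            "INSERT INTO meeting_docs (meeting_date, pdf_name, body) "
            "VALUES (?, ?, ?)",
            (txt_path.stem, txt_path.stem + ".pdf", body),
        )
        count += 1
    conn.commit()
    return count
```

The same shape should carry over to PHP and MySQL; the point is only that each row ties the searchable text to a date and a PDF file name.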

Now I just need a couple of scripts to allow a user to locate a file and
bring up the pdf for when he wants to read about a meeting. And a second
script to accept user input (search words) and perform a query against
the textual data and present some kind of results - probably a listing
containing a reference to the meeting date and a tbd-length string
showing the matching result for each occurrence, i.e., something like n
chars in front of and after the match so the user can see the context of
the match.
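
A minimal sketch of the "n chars in front of and after the match" idea, again in Python purely for illustration. The function names and the `(meeting_date, body)` pair layout are hypothetical; in practice the pairs would come from whatever query reads the table.

```python
import re


def find_matches(text, term, context=40):
    """Return one snippet per case-insensitive occurrence of term,
    padded with up to `context` characters on each side."""
    snippets = []
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        snippets.append(text[start:end])
    return snippets


def search_docs(docs, term, context=40):
    """docs is an iterable of (meeting_date, body) pairs.
    Returns a list of (meeting_date, snippet) for every occurrence."""
    results = []
    for meeting_date, body in docs:
        for snippet in find_matches(body, term, context):
            results.append((meeting_date, snippet))
    return results
```

Scanning every row in application code like this is fine for a small set of meeting notes; for a large collection a database full-text index would probably be the better choice.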

Sizes - a 28kb .doc file grows to 142kb in .pdf format and is only 5kb in
.txt format. (Actually, if I 'print' the .doc as a pdf instead of using
Word's "File, Save As", the resulting pdf is only 70kb. Might need a
new macro!)

Thanks again!


A few years ago I wrote a script that extracts the plain text from a .doc file.

http://www.cmsws.com/examples/applications/word2_/convert.php

If you look in the directory, you will see a few example files.

You can view them like this:

.../convert.php?filename=test_building.doc

Replace test_building.doc with any of the other .doc files from the directory listing to see its contents.

I currently have it set to 64-bit-wide rows, which shows some nice patterns in the MS Word format.

The source of the convert.php script is viewable as well:

http://www.cmsws.com/examples/applications/word2_/convert.phps

I have thought about extending this even further to figure out the layout and text formatting, but it hasn't gotten much attention for quite some time now.
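
For anyone curious what pulling plain text out of a binary .doc can look like, here is a very rough sketch. It is not how convert.php works (see the source link above for that); it is just a naive "printable runs" scan, written in Python, that assumes the text sits in the file as either 8-bit ASCII or UTF-16LE. The function name and the `min_len` parameter are made up for this example.

```python
import re


def extract_text_runs(data, min_len=4):
    """Return runs of printable characters found in raw bytes.

    Looks for two encodings: plain 8-bit ASCII, and UTF-16LE where each
    printable ASCII character is followed by a zero byte. It ignores the
    real file structure, so expect some junk in the output.
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs
```

Usage would be something like `extract_text_runs(open("test_building.doc", "rb").read())`. A proper parser would follow the file's internal tables instead of guessing, which is why this is only good for a quick look.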

Hope it helps.

--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/
