On Dec 15, 2012 7:29 AM, "tamouse mailing lists" <tamouse.lists@xxxxxxxxx> wrote:
>
> On Dec 13, 2012 4:50 PM, "Jim Giner" <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Thanks for all the posts. After reading and googling all afternoon, I think the best approach for me is:
> >
> > Create two macros in Word (done!) to export each of my .doc files to .txt and .pdf formats.
> >
> > Create a SQL table to hold the .txt contents of my .doc files, along with a reference to the meeting date and the name of the corresponding .pdf file.
> >
> > Upload my two sets of files with an FTP client and then use a script to load the table with my .txt file data.
> >
> > Now I just need a couple of scripts: one to allow a user to locate a file and bring up the PDF when he wants to read about a meeting, and a second to accept user input (search words), perform a query against the textual data and present some kind of results - probably a listing containing a reference to the meeting date and a tbd-length string showing the matching result for each occurrence, i.e., something like n chars in front of and after the match so the user can see the context of the match.
> >
> > Sizes - a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB in .txt format. (Actually, if I 'print' the .doc as a PDF instead of using Word's "File, Save as", the resulting PDF is only 70 KB. Might need a new macro!)
>
> PDF might be better looking than this, but how big is an HTML doc exported from Word?

Sorry for the disjointed replies, it's still early...

You could export just the HTML, upload it, and your script could strip the HTML to have both formats available, i.e., plain text for indexing, HTML for presentation... or even, say, run the HTML through pandoc and produce Markdown...

As I say, it's early, these might be bad ideas, but it's how I'd approach it.

> > Thanks again!
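To make the "strip the HTML" idea a bit more concrete, here is a rough sketch. It is in Python only to keep it short and standard-library only; in PHP the built-in strip_tags() would be the obvious starting point. The names TextExtractor and html_to_text are made up for illustration, and the block-tag list is an assumption about what Word's HTML export needs, not something I've checked against real output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should act as word separators.
    BLOCK = {"p", "div", "br", "li", "tr", "td", "th",
             "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

The plain-text result would go into the table for searching, while the uploaded HTML stays as-is for display.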
> >
> > --
> > PHP General Mailing List (http://www.php.net/)
> > To unsubscribe, visit: http://www.php.net/unsub.php
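For the "n chars in front of and after the match" part of the search script, something along these lines might do. Again a Python sketch, not a finished implementation: find_snippets and its context parameter are hypothetical names, and it assumes the stored text of one meeting has already been fetched from the table as a single string.

```python
import re


def find_snippets(text, term, context=40):
    """Return each case-insensitive match of term with surrounding context.

    context is the number of characters kept on either side of a match;
    snippets are clipped at the start and end of the text.
    """
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(text))
        snippets.append(text[start:end])
    return snippets
```

Each snippet could then be listed next to the meeting date, with a link to the matching PDF.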