Re: storing & searching docs


 



On Dec 15, 2012 7:29 AM, "tamouse mailing lists" <tamouse.lists@xxxxxxxxx> wrote:
>
>
> On Dec 13, 2012 4:50 PM, "Jim Giner" <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Thanks for all the posts.  After reading and googling all afternoon,
> > I think the best approach for me is:
> >
> > Create two macros in Word (done!) to export each of my .doc files to
> > .txt and .pdf formats.
> >
> > Create an SQL table to hold the .txt contents of my .doc files, along
> > with a reference to the meeting date and the name of the
> > corresponding .pdf file.
> >
> > Upload my two sets of files with an FTP client and then use a script
> > to load the table with my .txt file data.
> >
> > Now I just need a couple of scripts: one to let a user locate a file
> > and bring up the PDF when he wants to read about a meeting, and a
> > second to accept user input (search words), run a query against the
> > text data, and present some kind of results. Probably a listing with
> > the meeting date and a TBD-length string showing each match, i.e.,
> > something like n chars before and after the match so the user can
> > see its context.
> >
> > Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only
> > 5 KB in .txt format.  (Actually, if I 'print' the .doc as a PDF
> > instead of using Word's "File, Save As", the resulting PDF is only
> > 70 KB.  Might need a new macro!)
> >
>
> PDF might be better looking than this, but how big is an HTML doc
> exported from Word?

Sorry for the disjointed replies, it's still early...

You could export just the HTML, upload it, and have your script strip the
HTML so that both formats are available, i.e. plain text for indexing and
HTML for presentation... or even, say, run the HTML through pandoc and
produce Markdown...
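
A rough sketch of the "strip the HTML" step, in Python only because it is
easy to run anywhere; the same idea carries over to a PHP script. The names
TextExtractor and html_to_text are made up for illustration, and nothing
here has been tried on real Word-exported HTML, which tends to carry a lot
of extra markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed.

    Text chunks are joined with a space, so a word split by inline tags
    (e.g. <b>bo</b>ld) comes out as two tokens. Good enough for a first
    indexing pass, but an assumption worth revisiting.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())
```

The plain-text result would go into the search table, while the uploaded
HTML stays as it is for display.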

As I say, it's early, these might be bad ideas, but it's how I'd approach
it.
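
For the search script Jim describes above (each match shown with n chars
of context on either side), something along these lines might be a starting
point. It is a sketch under assumptions: the table name "minutes" and the
columns meeting_date, pdf_name and body are invented here, and SQLite
stands in for whatever database actually ends up being used.

```python
import re
import sqlite3

# Hypothetical schema: one row per meeting, with the extracted text in body.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS minutes ("
    "meeting_date TEXT, pdf_name TEXT, body TEXT)"
)


def find_snippets(text, term, width=40):
    """Yield each case-insensitive match of term plus up to `width`
    characters of context before and after it."""
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(m.start() - width, 0)
        end = min(m.end() + width, len(text))
        yield text[start:end]


def search_minutes(conn, term, width=40):
    """Return (meeting_date, pdf_name, snippet) for every match found.

    LIKE narrows the rows first; note that % and _ in the search term
    act as wildcards there, so this is only a coarse pre-filter.
    """
    rows = conn.execute(
        "SELECT meeting_date, pdf_name, body FROM minutes "
        "WHERE body LIKE '%' || ? || '%' ORDER BY meeting_date",
        (term,),
    )
    return [
        (meeting_date, pdf_name, snippet)
        for meeting_date, pdf_name, body in rows
        for snippet in find_snippets(body, term, width)
    ]
```

Each result row carries the PDF name, so the listing can link straight to
the file for that meeting.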

>
> > Thanks again!
> >
> >
> > --
> > PHP General Mailing List (http://www.php.net/)
> > To unsubscribe, visit: http://www.php.net/unsub.php
> >
