Re: storing & searching docs

I think I'm good with a text version in the DB for search capability, and the PDF for pure display.

jg
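Below is a minimal sketch of the storage half of the plan quoted below: a table holding each meeting's .txt export, its date, and the name of its .pdf, filled by a load script. It uses Python's built-in sqlite3 as a stand-in for whatever SQL database the site actually runs. Several details are hypothetical:

- the `meetings` table and its column names;
- the `load_meetings` helper;
- the assumption that each .txt and its .pdf share a file stem;
- the caller-supplied `date_for` function.

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per meeting, holding the exported .txt
# contents, the meeting date, and the name of the matching .pdf file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS meetings (
    id           INTEGER PRIMARY KEY,
    meeting_date TEXT NOT NULL,  -- ISO date, e.g. '2012-12-13'
    pdf_name     TEXT NOT NULL,  -- file name of the display copy
    body         TEXT NOT NULL   -- plain-text export used for searching
)
"""


def load_meetings(conn, txt_dir, date_for):
    """Insert every .txt file in txt_dir into the meetings table.

    date_for is a caller-supplied (hypothetical) function mapping a file
    stem to its meeting date, since how dates are encoded in file names
    is an assumption. Returns the number of rows inserted.
    """
    conn.execute(SCHEMA)
    count = 0
    for txt in sorted(Path(txt_dir).glob("*.txt")):
        # Assumes UTF-8; Word's plain-text export may use another encoding.
        body = txt.read_text(encoding="utf-8", errors="replace")
        pdf_name = txt.with_suffix(".pdf").name  # assumes matching file stems
        conn.execute(
            "INSERT INTO meetings (meeting_date, pdf_name, body) VALUES (?, ?, ?)",
            (date_for(txt.stem), pdf_name, body),
        )
        count += 1
    conn.commit()
    return count
```

Storing only the small .txt export keeps the searchable table compact, while the larger PDFs stay on disk for display.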


On Dec 15, 2012, at 5:31 PM, tamouse mailing lists <tamouse.lists@xxxxxxxxx> wrote:

> On Sat, Dec 15, 2012 at 11:22 AM, Jim Giner
> <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
>> On 12/15/2012 8:29 AM, tamouse mailing lists wrote:
>>> 
>>> On Dec 13, 2012 4:50 PM, "Jim Giner" <jim.giner@xxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> Thanks for all the posts. After reading and googling all afternoon, I
>>>> think the best approach for me is:
>>>>
>>>> Create two macros in Word (done!) to export each of my .doc files to .txt
>>>> and .pdf formats.
>>>>
>>>> Create an SQL table to hold the .txt contents of my .doc files, along with
>>>> a reference to the meeting date and the name of the corresponding .pdf
>>>> file.
>>>>
>>>> Upload my two sets of files with an FTP client and then use a script to
>>>> load the table with my .txt file data.
>>>>
>>>> Now I just need a couple of scripts: one to let a user locate a file and
>>>> bring up the PDF when he wants to read about a meeting, and a second to
>>>> accept user input (search words), perform a query against the textual
>>>> data, and present some kind of results, probably a listing containing a
>>>> reference to the meeting date and a TBD-length string showing each match,
>>>> i.e., something like n chars before and after the match so the user can
>>>> see its context.
>>>>
>>>> Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB
>>>> in .txt format. (Actually, if I "print" the .doc as a PDF instead of
>>>> using Word's "File, Save As", the resulting PDF is only 70 KB. Might need
>>>> a new macro!)
>>>>
>>> 
>>> PDF might be better looking than this, but how big is an HTML doc exported
>>> from Word?
>>> 
>>>> Thanks again!
>>>> 
>>>> 
>>>> --
>>>> PHP General Mailing List (http://www.php.net/)
>>>> To unsubscribe, visit: http://www.php.net/unsub.php
>>>> 
>>> 
>> Word generates very, very many words (!) when creating an HTML doc. It is
>> not a good HTML generator at all.
> 
> I think my next email talked about running the HTML through pandoc to
> make a plain-text file, perhaps in Markdown, which could be the thing
> you save, and then running that through a Markdown filter to produce
> (much, much leaner) HTML.
> 
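Below is a sketch of the pandoc round trip suggested above. It assumes pandoc is installed and on PATH.

- It uses only pandoc's standard `-f`, `-t` and `-o` options.
- The helper names and file names are hypothetical.
- pandoc also stands in for the "Markdown filter" in the second step; any Markdown processor would do.

```python
import shutil
import subprocess


def pandoc_args(src, dst, src_fmt, dst_fmt):
    """Build a pandoc command line converting src (src_fmt) into dst (dst_fmt)."""
    return ["pandoc", "-f", src_fmt, "-t", dst_fmt, src, "-o", dst]


def word_html_to_lean_html(html_in, md_out, html_out):
    """Word HTML -> Markdown (the copy you keep) -> lean HTML for display."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    # Step 1: reduce Word's verbose markup to Markdown; this is the file to store.
    subprocess.run(pandoc_args(html_in, md_out, "html", "markdown"), check=True)
    # Step 2: render the stored Markdown back out as much smaller HTML.
    subprocess.run(pandoc_args(md_out, html_out, "markdown", "html"), check=True)
```

Without `-s` (standalone), pandoc writes an HTML fragment. A fragment suits pasting into an existing PHP page template; add `-s` if a complete page is wanted.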

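The quoted plan also calls for a search script that lists meeting dates, with n characters of context around each match. Below is one possible sketch, again using sqlite3 as a stand-in. It assumes a hypothetical `meetings` table with `meeting_date`, `pdf_name` and `body` columns. `width` plays the role of the "n chars".

SQL `LIKE` only narrows the candidate rows. Python then finds every occurrence and cuts the snippets.

```python
import sqlite3


def search_with_context(conn, term, width=40):
    """Return (meeting_date, pdf_name, snippet) for each occurrence of term.

    Case-insensitive substring match; assumes lower() keeps the text's
    length, which holds for ASCII. width is the hypothetical "n chars"
    of context kept on each side of the match.
    """
    if not term:
        return []  # an empty term would otherwise match everywhere forever
    needle = term.lower()
    results = []
    # LIKE is case-insensitive for ASCII in SQLite; '%' or '_' in term only
    # widen this prefilter, since the Python search below does the real match.
    rows = conn.execute(
        "SELECT meeting_date, pdf_name, body FROM meetings "
        "WHERE body LIKE ? ORDER BY meeting_date",
        ("%" + term + "%",),
    )
    for date, pdf, body in rows:
        lower = body.lower()
        start = lower.find(needle)
        while start != -1:
            lo = max(0, start - width)
            hi = min(len(body), start + len(needle) + width)
            # Mark truncation with "..." on whichever sides were cut.
            snippet = ("..." if lo > 0 else "") + body[lo:hi] + ("..." if hi < len(body) else "")
            results.append((date, pdf, snippet))
            start = lower.find(needle, start + len(needle))
    return results
```

Each result carries the .pdf name, so the listing can link straight to the display copy of that meeting.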