
Re: Indexing MS/Open Office and PDF documents

On 15/03/12 21:12, Jeff Davis wrote:
> On Fri, 2012-03-16 at 01:57 +0530, Alexander.Bagerman@xxxxxxxxxxxxx wrote:
>> We have a hard time identifying MS/Open Office and PDF parsers to index
>> stored documents and make them available for text searching.
>
> The first step is to find a library that can parse such documents, or
> convert them to a format that can be parsed.

I've used docx2txt and pdf2txt and friends to produce text files that I then index during the import process. An external script runs the whole process. All I cared about was extracting raw text, though; this does nothing to identify headings etc.
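
For illustration, the convert-then-index approach could look roughly like the Python sketch below. It is not the script I actually used. The converter commands (pdftotext from poppler-utils and a docx2txt wrapper, both assumed to write to stdout when given "-"), the documents table, its column names, and the 'english' text search configuration are all assumptions to adapt to your own setup. The cursor is expected to be a DB-API cursor using %s placeholders, such as one from psycopg2.

```python
import subprocess
from pathlib import Path

# Assumed converter commands, keyed by file extension. Each one is expected
# to write plain text to stdout when the output argument is "-".
CONVERTERS = {
    ".pdf": ["pdftotext", "{src}", "-"],
    ".docx": ["docx2txt", "{src}", "-"],
}

# Hypothetical table:
#   documents(path text primary key, body text, body_tsv tsvector)
INSERT_SQL = (
    "INSERT INTO documents (path, body, body_tsv) "
    "VALUES (%s, %s, to_tsvector('english', %s))"
)


def converter_command(path):
    """Return the argv list that converts `path` to plain text."""
    suffix = Path(path).suffix.lower()
    try:
        template = CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"no converter configured for {suffix!r}") from None
    return [arg.format(src=path) for arg in template]


def extract_text(path):
    """Run the external converter and return the raw text it prints."""
    result = subprocess.run(
        converter_command(path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the converter fails
    )
    return result.stdout


def index_document(cursor, path, text):
    """Store the text and its tsvector using a DB-API cursor."""
    cursor.execute(INSERT_SQL, (str(path), text, text))
```

An import loop would call extract_text() on each file and pass the result to index_document(), committing once at the end. As with my own setup, this only captures raw text; headings and other structure are lost.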

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

