Re: OT: .doc,.xls,.pdf,.ppt (etc.) string parser/indexers


Greetings,

On Fri, Aug 28, 2009 at 10:50 PM, Les Mikesell <lesmikesell@xxxxxxxxx> wrote:
> Does anyone have experience with linux tools to parse the text from
> common non-text file formats for searching?  I'm trying to use the
> kinosearch add-on for twiki which is fine as far as the search goes, but
> it takes forever to generate the index.

I am not sure this fully answers your question, but I have seen the
Lucene.NET SDK (with extensions to scour .doc, .odt, .pdf, etc.) used to
very good effect, with pretty decent performance.

HTH

Thanks and Regards

Rajagopal
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

