--- Paul <subsolar@xxxxxxxxxxxx> wrote:

> On Tue, 2006-04-11 at 06:55 -0700, Mike Stankovic wrote:
> > I've got about 10,000 docs I'd like to devise a search/index for. I found
> > a perl script called Perlfect that can do that on an old P3 but at the
> > astronomical time of 7 hours. Another script(cgi/perl) at hotscripts can
> > do the same but allows the "rm -rf /" exploit. DoH!?
> >
> > Is there anything perl/flatfile that can search/index faster? This is a
> > nice job for an aging P3 in the corner so php/MySQL is not an option.
> > Don't suggest beagle/windows solutions as this is a CentOS 4.3 system.
>
> Well at work we have an archive of ~ 12K PDFs that engineering uses for
> process documentations and I use Swish-e (http://swish-e.org/) to index
> it so that they can search it. The server it sits on is a PIII 733 with
> 512MB RAM and it takes about 90 minutes to re-index them every night.
>
> It works well for us as it allows AND & OR operators, searches for
> phrases and other fairly advanced features.
>
> The main limitation is that you need a filter to convert whatever the
> document is to one of the following: text, html or xml so it can be
> indexed.
>
> Regards,
> Paul Berger
>
> > __________________________________________________
> > Improve the mailing list by performing a simple search
> > before posting and reading the faq/etiquette.
> > Thank you!!
> >
> > _______________________________________________
> > CentOS mailing list
> > CentOS@xxxxxxxxxx
> > http://lists.centos.org/mailman/listinfo/centos

Yes, Swish-e is in dag's repo and appears to be very well supported upstream.
I was right about htsearch: it is one of the components of htdig (also available in RPM format). Does it have issues with charsets other than Latin-1 (ISO-8859-1) or plain 7-bit ASCII?