Examples include: Wiki (wiki search is not that great), docs.fp.o, pkgdb, etc.
Relevant information from the last time this was discussed:
https://fedorahosted.org/fedora-infrastructure/ticket/1055
http://fedoraproject.org/wiki/Infrastructure/Search
I've been playing with various options on one of our junkXX boxes, and seeing what works well.
- I tried Sphinx, but it seems to be really just a database full-text search, not a full search engine and crawler solution.
- I tried Xapian, but getting it crawling required a lot of hacking and conversion from an external crawler (e.g. htdig), and htdig kept throwing traces and dying on https sites.
- I tried mnoGoSearch, but its CGI would not work at all. It would simply time out when I tried to go to it.
- Lastly, I tried Datapark Search, which seems like our best bet:
  - I ran into an issue where the crawler would randomly throw traces about libcrypto. I reported the issue upstream, and they released a snapshot release two days later that seems to have fixed it. So upstream is active.
  - I played with some styling ideas, and tried to incorporate search results into the standard Fedorahosted/people/wiki template. It needs some work to finish, but it's getting there.
  - The default CGI template had horrid HTML, but I worked with that and got it reasonable (I'm going to finish it up today or tomorrow and try to get it passing as valid HTML5).
Out of the options I tried, this seems like the best one available. It is a fork of mnoGoSearch, and it has a lot of options for customizing it and shaping it into what we want it to do.
That said, I am more than open to trying other options before we decide to move forward with Datapark. If nobody screams over the next few days, I will work on moving forward. We need to package it, and it looks like we'll have to package the snapshot version.
Anyway, I am just throwing this out to update everyone on my findings and to see if anyone has ideas for other options.
-re
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure