Search Engine project

For some time, we've wanted to explore search engine options, so that you could go to one site and search all of the Fedora sites that we run.
Examples include the wiki (whose built-in search is not that great), docs.fp.o, pkgdb, etc.

Relevant information from the last time this was discussed:
https://fedorahosted.org/fedora-infrastructure/ticket/1055
http://fedoraproject.org/wiki/Infrastructure/Search


I've been trying out various options on one of our junkXX boxes to see what works well.

- I tried Sphinx, but it seems to be really just a database full-text search, not a full search engine and crawler solution.
- I tried Xapian, but getting it to crawl required a lot of hacking and converting output from an external crawler (e.g. htdig), and htdig kept throwing tracebacks and dying on https sites.
- I tried mnoGoSearch, but its CGI would not work at all. It would simply time out whenever I tried to load it.
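To make the distinction in the first bullet concrete: a crawler-based search engine fetches pages, follows their links to discover more pages, and builds its own index, whereas a database full-text search only indexes content already sitting in a database. The following is a minimal, stdlib-only Python sketch of the crawler-plus-index idea. It is not how DataparkSearch or any other tool discussed here works internally; the `PAGES` dict and its `.example` URLs are hypothetical stand-ins for live HTTP fetches.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser

# Hypothetical pages standing in for real HTTP fetches; URLs are illustrative only.
PAGES = {
    "https://wiki.example/": '<p>Fedora wiki home</p><a href="https://docs.example/">docs</a>',
    "https://docs.example/": '<p>Installation guide for Fedora</p><a href="https://pkgs.example/">packages</a>',
    "https://pkgs.example/": '<p>Package database</p><a href="https://wiki.example/">wiki</a>',
}


class PageParser(HTMLParser):
    """Collects text content and <a href> targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(start_urls):
    """Breadth-first crawl over PAGES, building an inverted index: word -> set of URLs."""
    index = defaultdict(set)
    seen = set(start_urls)
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:  # unknown page; a real crawler would log it and move on
            continue
        parser = PageParser()
        parser.feed(html)
        for word in tokenize(" ".join(parser.text)):
            index[word].add(url)
        for link in parser.links:  # follow links to discover pages across sites
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = crawl(["https://wiki.example/"])
print(search(index, "fedora"))  # ['https://docs.example/', 'https://wiki.example/']
```

Starting from a single seed URL, the crawler reaches all three "sites" by following links, which is the property that makes one search box over many Fedora sites possible. A database full-text search would instead need each site's content loaded into the database first.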

- Lastly, I tried Datapark Search, which seems like our best bet:
    - I ran into an issue where the crawler would randomly crash with tracebacks in libcrypto. I reported the issue upstream, and two days later they published a snapshot release that seems to have fixed it. So upstream is active.
    - I played with some styling ideas and tried to fit the search results into the standard Fedorahosted/people/wiki template. It still needs some work, but it's getting there.
    - The default CGI template had horrid HTML, but I've cleaned it up into something reasonable (I'm going to finish it today or tomorrow and try to get it to validate as HTML5).

Of the options I tried, this seems like the best one available. It is a fork of mnoGoSearch, and it has a lot of configuration options, so we can shape it into what we want it to do.

That said, I am more than open to trying other options before we decide to move forward with Datapark. If nobody screams over the next few days, I will work on moving forward. We need to package it, and it looks like we'll have to package the snapshot version.

Anyway, I'm just throwing this out to update everyone on my findings and to see if anyone has ideas for other options.

-re
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/infrastructure
