I'm mentioning it just because nobody has so far. Elasticsearch[1], which is
also Lucene-based, was designed from the very beginning to be distributed (in
contrast to Solr). The product hasn't reached the symbolic 1.0 yet, but it is
production-ready (for instance, GitHub[2] uses it).

Dridi

[1] http://www.elasticsearch.org/
[2] https://github.com/blog/1381-a-whole-new-code-search

On Sat, Nov 2, 2013 at 7:02 AM, Frankie Onuonga <frankie.onuonga@xxxxxxxxx> wrote:
> Hi folks,
>
> I trust all is well.
>
> I believe this email will spark something, so I will cc Kevin on it for
> multiple reasons.
> I would like some things to be clear from the word go.
>
> I am not too sure where to start with this email.
> I have a mix of emotions: extremely mad and extremely excited at the same
> time.
> First, I would like to thank all those who have been kind enough to offer
> their assistance.
> I would also like to say thank you to those who have started brainstorming.
> I am sure that in some time we will be able to see the fruits of our labor.
>
> Now, regarding the reason why I am not amused today: I am going to be
> straight to the point and clear about this. I do not appreciate a user who is
> here to criticize and offer no solution. I generally follow open source
> ethics, but if your job is to come in and criticize with a lot of rubbish
> opinions (yes, I am referring directly to whoever posted that this is not
> something to look into, and even worse, insisted on it), then please don't
> waste your time. Keep off this thread.
>
> It does not amuse me in the slightest when criticism is given with no
> solution. I understand when someone makes a mistake. I also understand when
> someone has a valid point.
> I do not understand when you give an opinion whose only "solution" is that
> you will never use the service.
> You cannot rate something before using it.
> I would also advise you to have a look at the mailing list guidelines so that
> you are up to speed.
>
>
> The best minds are probably here with us; people do not mention who they
> work for, but trust me, they are here. Fine, we admit Google is miles ahead.
> I personally know they took time to get there.
>
> I have also read their papers, and there are open source solutions that have
> been mentioned earlier (Apache Lucene/Solr) that try to mimic this. Seeing
> as it is for our use, which in my opinion is small, I think it is a great
> start.
>
> Third, free and open source is all that is used here. Simple.
>
> I would therefore proceed to mention: if you are not contributing in a
> positive way, be kind to the world. We do not have super cow abilities.
>
>
> Kind Regards,
>
> Onuonga Frankie
>
>
> On Sat, Nov 2, 2013 at 4:57 AM, Alek Paunov <alex@xxxxxxxxxxx> wrote:
>>
>> On 02.11.2013 02:32, Michael Cronenworth wrote:
>>>
>>> This will be my last mailing on this topic, as I will not contribute to or
>>> use this feature in Fedora, but this reply warranted clarification.
>>>
>>> On 11/01/2013 06:14 PM, Alek Paunov wrote:
>>>>
>>>> Another simple answer: CSE is a low-quality search - no facets, no
>>>> (real) content age restriction. The same is also valid for every other
>>>> service/application that is based solely on generic web page crawling.
>>>
>>>
>>> CSE is as full-blown as a Google Appliance. More advanced than anything
>>> you can write in Perl/Python/Ruby in a month. Site restrictions, keyword
>>> restrictions, (real) age restrictions, autocomplete help, synonyms,
>>> image search, all of which are provided through an XML API.[1]
>>>
>>
>> Indeed. Don't get me wrong - I like the CSE service for what it is good
>> for. It seems that I had not been clear enough with my English - sorry!
>>
>> Nobody is able to write a good, modern index in a month - lucene/solr,
>> xapian, etc. have all evolved over long, long years. Our task is a proper
>> deployment of one or a combination of them, not inventing a new one.
>>
>> Why, e.g., solr instead of CSE or dpsearch (which is open source, and also
>> mentioned in the old tickets)?
>>
>> Granularity: With CSE/dpsearch the indexed content unit is a crawled and
>> automatically processed Web document (I say Web document instead of HTML
>> page, because CSE handles many types). Not a single BZ comment. Not a
>> change comment in a spec file. Not a Git commit. Or in the reverse
>> direction: an email, not a thread (because we do not yet have an archive
>> page displaying the whole thread). I.e. there is no concept of documents
>> and subdocuments (which is where most of our content belongs).
>>
>> Attributes: You cannot attach custom scalar/category attributes (the basis
>> of faceted search) to the FTS-indexed units.
>>
>> Please correct me if I am wrong about CSE on any of the above.
>>
>> Fedora has data sources (bugs, wikis, mails, packages, docs, etc.), not
>> just sitemaps/pages, and they all talk about the same things (common topic
>> hierarchies, common tag hierarchies, common authors). They form a highly
>> interlinked virtual knowledge base.
>>
>> We should start indexing the sources in their native structure now, to be
>> able to upgrade some happy day to full-blown semantic search (when
>> available), which is actually what we badly need.
>>
>>
>>>> In our case, we are the owners of the content, we know how it is
>>>> structured, we know where the feeds with the pure content changes are,
>>>> we can explicitly feed the indexes with all named attributes of the
>>>> content nodes and later use them.
>>>
>>>
>>> But you don't know how other people on the web find and link to Fedora
>>> pages to provide accurate page ranking.
>>>
>>
>> Personas: 1. Active Fedora contributor, 2. Fedora contributor, 3. Power
>> Fedora user/sysadmin, 4. Fedora user, 5. Potential Fedora user, 6. IT
>> journalist.
>>
>> IMHO, at least for 1-3, ordering results by recursive link-rank valuation
>> (Google page ranking) is more of an issue than an advantage.
>>
>> For 4 (also important) the relevant sets are probably: the docs, part of
>> the wiki, ask.fp.o and maybe users@. I don't know - Stack Overflow's top
>> 'relevance' results for a set of keywords are not always the same as
>> Google's with site:stackoverflow.com in the query ...
>>
>> For 5-6, Google page ranking is probably the best, but they will use Google
>> instead of search.fp.o anyway (at least initially; later, their more
>> concrete queries would look more like those of 3-4).
>>
>> Kind Regards,
>> Alek
>>
>>
>> --
>> devel mailing list
>> devel@xxxxxxxxxxxxxxxxxxxxxxx
>> https://admin.fedoraproject.org/mailman/listinfo/devel
>> Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
>
>
> --
> Skype: Frankie.Onuonga
> twitter: Frankieonuonga
> irc #freenode: Frankie.onuonga
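The granularity and attributes points in Alek's message can be made concrete with a minimal Python sketch. It assumes a toy in-memory model: each indexed unit is either a whole document or a sub-document (a single bug comment, a Git commit) that points at its parent, and each unit carries custom category attributes that drive facet counts. The class `Unit`, the helpers `search`, `facet_counts` and `roll_up`, the attribute names and the sample records are all hypothetical; a real deployment would leave matching and faceting to Solr or Elasticsearch rather than the substring test used here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Unit:
    """One indexed unit (hypothetical model): a whole document, or a
    sub-document such as a single bug comment that points at its parent."""
    uid: str
    parent: Optional[str]
    text: str
    attrs: Dict[str, str] = field(default_factory=dict)


def search(units: List[Unit], term: str, **filters: str) -> List[Unit]:
    """Case-insensitive substring match on `term` (a stand-in for real
    full-text matching), narrowed by exact-match attribute filters."""
    term = term.lower()
    return [
        u for u in units
        if term in u.text.lower()
        and all(u.attrs.get(k) == v for k, v in filters.items())
    ]


def facet_counts(hits: List[Unit], attr: str) -> Counter:
    """Number of hits per value of one attribute, i.e. the counts a
    faceted UI would show next to each filter option."""
    return Counter(u.attrs[attr] for u in hits if attr in u.attrs)


def roll_up(hits: List[Unit]) -> Set[str]:
    """Map sub-document hits back to their parent documents."""
    return {u.parent or u.uid for u in hits}


# Hypothetical sample data, not real Fedora records.
units = [
    Unit("bug:1", None, "grub fails to boot after update",
         {"source": "bugzilla", "component": "grub2"}),
    Unit("bug:1#c3", "bug:1", "grub regression confirmed",
         {"source": "bugzilla", "component": "grub2", "author": "alice"}),
    Unit("git:abc", None, "Fix grub config path",
         {"source": "git", "component": "grub2", "author": "bob"}),
    Unit("mail:42", None, "Search for fedoraproject.org",
         {"source": "devel", "author": "carol"}),
]

hits = search(units, "grub")
print(facet_counts(hits, "source"))          # Counter({'bugzilla': 2, 'git': 1})
print(roll_up(search(units, "regression")))  # {'bug:1'}
```

Filtering works the same way: `search(units, "grub", source="git")` narrows the hits to the commit, while a hit on a single comment rolls up to its parent bug. That pairing of per-unit attributes and parent/child structure is the kind of information Alek argues a page-level crawler does not capture.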