Fair question. I'm not so worried about speed. I'm looking, essentially, at precision by rank (i.e., average precision and its variants). I have not explored the differences between the default English-language configuration in Postgres and the one in Solr, but I have no reason to believe that there's anything odd going on there. My problem is that I can't provide specific performance numbers, or the corpus in question, but my overall impression is that the top N (10, 20) results from Postgres, no matter how I configure the ranking, aren't as relevant to the query, as a group, as the ones from Solr.
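To make "precision by rank" concrete, here is a minimal sketch of precision@k and one common variant of average precision at k (AP@k, normalized by min(|relevant|, k)). It assumes you have a set of human relevance judgments per query; the document IDs below are hypothetical, and the same functions would be run over the Postgres and Solr result lists for each judged query.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Sum of precision@i at each rank i <= k holding a relevant document,
    divided by min(len(relevant), k) (one common AP@k variant)."""
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


# Hypothetical judged query: d1 and d3 are relevant.
relevant = {"d1", "d3"}
postgres_results = ["d2", "d1", "d4", "d3"]  # hypothetical ranking
solr_results = ["d1", "d3", "d2", "d4"]      # hypothetical ranking
print(average_precision_at_k(postgres_results, relevant, 4))
print(average_precision_at_k(solr_results, relevant, 4))
```

Averaging AP@k over a sample of judged queries (mean average precision) would turn the overall impression into a number that can be compared across ranking configurations without sharing the corpus itself.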
Example anecdote: the documents I'm searching come with metadata (e.g., title), which I'm not indexing specially (not a separate field, just part of the raw text of the document). Even when I search for single terms and look at the titles of the results, the titles in the Solr results contain that term more frequently than the titles in the Postgres results. I also FEEL like I've noticed that the problem is more apparent in "OR" queries; if I search for a disjunction of terms, the documents that contain all the terms are more likely to be high in the Solr rankings than in the Postgres rankings.
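Both observations in this anecdote can be measured without relevance judgments. A minimal sketch, assuming you can export the top-N result titles and texts from each engine: one function computes the share of top-N titles containing the query term, the other the share of top-N results containing every term of an OR query. It uses plain case-insensitive substring matching, so it ignores stemming and stop words; that is a simplifying assumption, not how either engine matches.

```python
from typing import Iterable, Sequence


def title_hit_rate(titles: Sequence[str], term: str, n: int) -> float:
    """Fraction of the top-n result titles that contain `term`
    (case-insensitive substring match; no stemming)."""
    top = titles[:n]
    if not top:
        return 0.0
    needle = term.lower()
    return sum(needle in title.lower() for title in top) / len(top)


def all_terms_rate(texts: Sequence[str], terms: Iterable[str], n: int) -> float:
    """Fraction of the top-n results whose text contains every term of an
    OR query (case-insensitive substring match; no stemming)."""
    top = texts[:n]
    if not top:
        return 0.0
    needles = [t.lower() for t in terms]
    return sum(all(w in text.lower() for w in needles) for text in top) / len(top)


# Hypothetical exported titles for the query "postgres".
titles = ["Postgres ranking", "Solr tuning", "Ranking in Postgres"]
print(title_hit_rate(titles, "postgres", 3))
```

Running both functions over the same query set for Postgres and Solr at N = 10 and N = 20 would show whether the gap is consistent or limited to a few queries.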
I apologize for not being able to be more specific.
Thanks in advance, again.
On 3/4/22 10:30 AM, Atri Sharma wrote:
Can you define what "high quality" is?
Are you referring to precision? Or recall? Or speed? Or query dialect?
On Fri, Mar 4, 2022 at 8:59 PM Bayer, Samuel <sam@xxxxxxxxx> wrote:
Thanks for replying. My problem is that I can't provide enough guidance on what isn't working, because (a) I don't have good enough intuitions about how the normalization options are expected to affect the results, and (b) I can't identify a specific missing function - I'm just observing that I can't make the results as high-quality as Solr.
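For intuition about the normalization options, here is a simplified model of how the `ts_rank` normalization bits combine, written from the descriptions in the PostgreSQL documentation (1: divide by 1 + log of document length; 2: divide by document length; 8: divide by number of unique words; 16: divide by 1 + log of number of unique words; 32: divide rank by itself + 1; bits applied in that order). It is an assumption-laden sketch, not the server's code: the documentation does not state the logarithm base (natural log is used here), and bit 4 (mean harmonic distance between extents, `ts_rank_cd` only) is not modeled.

```python
import math


def apply_normalization(rank: float, flags: int, doc_length: int, unique_words: int) -> float:
    """Simplified model of the documented ts_rank normalization bits.
    Natural log is an assumption; bit 4 (extent distance) is ignored."""
    r = rank
    if flags & 1 and doc_length > 0:
        r /= 1 + math.log(doc_length)   # mild length penalty
    if flags & 2 and doc_length > 0:
        r /= doc_length                 # strong (linear) length penalty
    if flags & 8 and unique_words > 0:
        r /= unique_words
    if flags & 16 and unique_words > 0:
        r /= 1 + math.log(unique_words)
    if flags & 32:
        r /= r + 1                      # squashes rank into [0, 1)
    return r


# Same raw rank for a short (50-word) and a long (5000-word) document.
for flags in (0, 1, 2):
    short = apply_normalization(0.1, flags, 50, 40)
    long_ = apply_normalization(0.1, flags, 5000, 1200)
    print(flags, short / long_)
```

Under this model, flag 1 favors the short document by roughly 2x, while flag 2 favors it by 100x, which is why the choice of flag can reorder the top N so dramatically when document lengths vary widely.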
My apologies.
Sam
On 3/4/22 10:25 AM, Bruce Momjian wrote:
On Fri, Mar 4, 2022 at 08:10:48AM -0500, Bayer, Samuel wrote:
Hi all -
When I have a need for both sophisticated database querying and
full-text search, I'd rather not stand up a technology stack with
multiple tools (e.g., Postgres and Apache Solr, or Postgres and
ElasticSearch with a zomboDB bridge). So I've been looking at the
Postgres full-text search capability, and comparing it to Apache
Solr. My experience so far - which has not been entirely anecdotal,
but hasn't amounted to a formal TREC-style evaluation - is that
Postgres full-text search, in any ranking/normalization configuration
I can create, is reliably worse than Solr. Now, I understand that the
whole point of Solr is search, and this is a sideline for Postgres,
but I'd like to figure out how close Postgres can get, and while I'm
knowledgeable about search technologies, I'm not an expert. And I've
looked for information on the Web about comparing Postgres search
to other search capabilities, and everything I've found so far is
extremely basic.
Does anybody have any pointers to resources (people, sites, journal
articles, blogs, etc.) which are deeply knowledgeable about this
comparison?
Uh, most of our full text search work is done by Russian developers, who
are obviously very good at it. It would be helpful if you could list
exactly what is missing, and then we can have a discussion on the hackers
list to see what is possible. I think it would be helpful if we just
documented what we _don't_ have.