
Re: tsearch2: plainto_tsquery() with OR?

Thanks for your response! Let me try to elaborate on what I meant in my original post.

If R is the set of words in the tsvector for a given table row and S is the set of keywords to search for (entered by e.g. a website user), I would like to receive all rows for which the intersection of R and S is nonempty. That is: the row should be returned if there is at least SOME match. S does not need to be a subset of R.

Furthermore, I would like a measure of how "nonempty" the intersection is (we could call this measure "the rank").
Example:
For R = "three big houses" and S = "three small houses" the rank should be higher than for R = "three big houses" and S = "four small houses", as the first pair has two words in common while the second has only one.
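To make the set model concrete, here is a minimal sketch that counts the words R and S have in common, using plain string splitting rather than tsearch2. It only illustrates the idea of the measure; it is not how tsearch2 computes a rank.

```sql
-- Count the distinct words shared by R and S.
-- INTERSECT keeps only the words present in both sets.
SELECT count(*) AS common_words
FROM (
    SELECT unnest(string_to_array('three big houses', ' ')) AS word    -- R
    INTERSECT
    SELECT unnest(string_to_array('three small houses', ' ')) AS word  -- S
) AS shared;
-- → 2 ("three" and "houses")
```

Replacing S with 'four small houses' leaves only "houses" in common, so the count drops to 1.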

A version of plainto_tsquery() that joins the words with OR instead of AND would solve this problem fairly elegantly:

1) I could use the conventional "tsvector @@ tsquery" syntax in my WHERE clause, as the "@@" operator would return true on any match and thus include the row in the result. Example:
  select to_tsvector('simple', 'three small houses')
         @@ 'four|big|houses'::tsquery;
would return "true".

2) The rank() computed for the tsvector against such an OR query is automatically higher when more of the words match.
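To illustrate point 2, the sketch below ranks the same row against the two OR queries from the earlier example. It assumes the text search functions built into PostgreSQL 8.3 and later, where the ranking function is named ts_rank(); the tsearch2 contrib module spells it rank().

```sql
-- Rank one row against two hand-written OR queries.
-- The first query shares two words with the row, the second only one.
SELECT ts_rank(to_tsvector('simple', 'three big houses'),
               'three|small|houses'::tsquery) AS two_words_match,
       ts_rank(to_tsvector('simple', 'three big houses'),
               'four|small|houses'::tsquery)  AS one_word_matches;
```

With the default weights and normalization, one would expect two_words_match to come out higher than one_word_matches. The exact values depend on the ranking settings.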


An example where this OR version of plainto_tsquery() could be useful is a website that uses tags. Each website entry is associated with some tags, and each user has defined some "tags of interest". The search should then return all website entries where there is a match (not necessarily a complete one) with the user's tags of interest. Of course, the best-matching entries should be displayed first.
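The tag scenario could look roughly like the sketch below. The entries data, the column names and the interests query are all made up for illustration, and the OR query is written by hand because the function asked for here does not exist. As above, ts_rank() assumes the built-in text search of PostgreSQL 8.3 or later.

```sql
-- Hypothetical data: three entries with space-separated tags.
WITH entries(id, tags) AS (
    VALUES (1, 'postgresql search indexing'::text),
           (2, 'php search'::text),
           (3, 'linux kernel'::text)
), q AS (
    -- The user's tags of interest, joined with OR by hand.
    SELECT 'postgresql|search'::tsquery AS interests
)
SELECT e.id,
       ts_rank(to_tsvector('simple', e.tags), q.interests) AS score
FROM entries e, q
WHERE to_tsvector('simple', e.tags) @@ q.interests  -- any single tag is enough
ORDER BY score DESC;                                -- best matches first
```

Entry 3 shares no tag with the user and is filtered out by the WHERE clause. Entry 1 matches both tags of interest, so it would be expected to sort ahead of entry 2.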


I find it important that this function be a part of tsearch2 itself, because:
1) The user can input arbitrary data, including potentially harmful data if it is not escaped correctly.
2) Special characters should be stripped in exactly the same way as to_tsvector() strips them, e.g. stripping the dot in "Hi . there" but keeping it in "web 2.0". Only tsearch2 can do that in a clean, consistent way - it would be fairly messy if some third-party or, worse, some homecooked website-developer stripping functionality were used for this.
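Until such a function exists, one possible workaround is to let plainto_tsquery() do all the parsing and escaping and then swap the operators in its text form. The function name plainto_or_tsquery below is hypothetical, and the sketch assumes the built-in text search of PostgreSQL 8.3 or later. It also assumes that no lexeme itself contains a '&' character, which the default parser normally guarantees because it treats '&' as a separator.

```sql
-- Hypothetical helper: like plainto_tsquery(), but joins the words with OR.
-- plainto_tsquery() normalizes the raw input; only the operators are changed.
CREATE FUNCTION plainto_or_tsquery(config regconfig, input text)
RETURNS tsquery AS $$
    SELECT replace(plainto_tsquery($1, $2)::text, '&', '|')::tsquery;
$$ LANGUAGE sql STABLE;

-- Usage: the user input is passed as plain text, never as query syntax.
SELECT to_tsvector('simple', 'three small houses')
       @@ plainto_or_tsquery('simple', 'four big houses');
```

Because the user input only ever passes through plainto_tsquery(), the words are tokenized by the same parser that to_tsvector() uses, which addresses both points above. A native implementation would still be cleaner than editing the text form of the query.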
