* oleg@xxxxxxxxxx (Oleg Bartunov) wrote:
|
| On Tue, 30 May 2006, Lars Haugseth wrote:
|
| > I've setup a database using tsearch2, configured with support for compound
| > words according to the excellent guide found here:
| >
| >   http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
| >
| > This works fine. There is however one drawback that I'd like to know
| > whether can be remedied. Let's say I want to search for records containing
| > the word 'fritekst', which is a compound Norwegian word meaning
| > 'free text'.
| >
| > testdb=# select to_tsquery('default_norwegian', 'fritekst');
| >           to_tsquery
| > ------------------------------
| >  'fritekst' | 'fri' & 'tekst'
| > (1 row)
| >
| > Now, this will indeed match those records, but it will also match any
| > records containing both of the words 'fri' and 'tekst', without regard
| > to whether they are next to each other or in completely different parts
| > of the text being indexed. In many situations, this will lead to a lot
| > of 'false' matches, seen from a user perspective.
| >
| > Ideas on how to handle this problem will be much appreciated.
|
| this is where order by relevance should helps.

Thank you for pointing me to this, I hadn't thought about that.

However, my first try with the rank_cd() function does not quite produce
the results I had expected:

  SELECT set_curcfg('default_norwegian');

  SELECT id, rank_cd(n, mytscol, to_tsquery('fritekst')) AS rank
    FROM mytable
   WHERE mytscol @@ to_tsquery('fritekst')
   ORDER BY rank DESC;

No matter what value I use for n here, a record where the compound word
'fritekst' appears gets a rank of 0, whereas records where the words
'fri' and 'tekst' appear separately all get a rank > 0; the closer
together they are, the higher the rank.

If I set the value of n to 0, I still get a rank of 0 for a record
containing 'fritekst', and 1 for all records containing 'fri' and 'tekst'.

When using the rank() function instead of rank_cd() in the query above,
records with the word 'fritekst' seem to score better, but I still get
higher ranks for some records containing the separate words and not the
compound word.

-- 
Lars Haugseth

"If anyone disagrees with anything I say, I am quite prepared not only to
retract it, but also to deny under oath that I ever said it." -Tom Lehrer