I've finally made some fruitful steps in writing C functions that manipulate tsvectors, and I'd like to build a simple system based on ts_rank to find similarities between documents.

My documents have 4 parts, and I build a tsvector for each one the "usual way":

  setweight(to_tsvector(field1), 'A') || setweight(to_tsvector(field2), 'B') || ...

Then I'd like to run a query similar to:

  tsvector @@ to_tsquery('field1_lexeme1:A | field1_lexeme2:A | ...
                          | field2_lexeme1:B | field2_lexeme2:B | ...')

The problem is that so many ORs return a lot of rows, and filtering on rank comes "too late" to help performance.

One way to shrink the result set would be to build a query that requires at least 2 of the lexemes to be present:

    field1_lexeme1:A & (field1_lexeme2:A | ... | field2_lexeme1:B | field2_lexeme2:B | ...)
  | field1_lexeme2:A & (field1_lexeme1:A | ...)
  | ...

(A concrete sketch of both constructions is in the P.S.)

My documents aren't very long, so this looks feasible, but I'd like to hear other suggestions for shrinking the result set further before filtering on ts_rank, especially suggestions that exploit the index.

Any suggestion that reduces the result set before the rank filter is welcome; I'll try to put it into practice in C functions that, given a tsvector, build up the tsquery used to find similar documents.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
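
P.S. To make the two query shapes concrete, here's a rough sketch. The table "docs", the columns id and f1..f4, the 'english' configuration and the lexemes w1/w2/w3 are only placeholders; in practice the lexeme list would come out of the document's own tsvector via the C function.

  -- weighted tsvector per document, plus a GIN index so the @@ filter
  -- can use the index rather than a sequential scan; coalesce() keeps a
  -- NULL field from turning the whole concatenation into NULL
  ALTER TABLE docs ADD COLUMN fts tsvector;

  UPDATE docs SET fts =
        setweight(to_tsvector('english', coalesce(f1, '')), 'A')
     || setweight(to_tsvector('english', coalesce(f2, '')), 'B')
     || setweight(to_tsvector('english', coalesce(f3, '')), 'C')
     || setweight(to_tsvector('english', coalesce(f4, '')), 'D');

  CREATE INDEX docs_fts_idx ON docs USING gin (fts);

  -- "at least 2 of {w1, w2, w3}" written as an OR of pairwise ANDs;
  -- the whole expression can still be answered from the GIN index, and
  -- ts_rank only runs on the rows that survive the @@ filter
  SELECT id, ts_rank(fts, q) AS rank
  FROM docs,
       to_tsquery('english', '(w1:A & w2:A) | (w1:A & w3:B) | (w2:A & w3:B)') AS q
  WHERE fts @@ q
  ORDER BY rank DESC;

Since the lexemes pulled out of a tsvector are already normalized, the C function could also skip to_tsquery and glue single-lexeme tsqueries together with the && and || operators (or the 'w1:A'::tsquery cast, which doesn't push its input through a dictionary).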