Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > The answer to that clearly is to not index common terms My understanding is that we don't currently get statistics on how common the terms in a tsvector column are until we ANALYZE the *index* created from it. Seems like sort of a Catch 22. Also, if we exclude words which are in the tsvector from the index on the tsvector, we need to know what words were excluded so we know not to search on them as well as forcing the recheck of the full tsquery (unless this always happens already?). > It may well be that Jesper's identified a place where the GIN code > could be improved My naive assumption has been that it would be possible to get an improvement without touching the index logic, by changing this part of the query plan: Index Cond: (ftsbody_body_fts @@ to_tsquery ('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text)) to something like this: Index Cond: (ftsbody_body_fts @@ to_tsquery ('TERM1'::text)) and count on this doing the rest: Recheck Cond: (ftsbody_body_fts @@ to_tsquery ('TERM1 & TERM2 & TERM3 & TERM4 & TERM5'::text)) I'm wondering if anyone has ever confirmed that probing for the more frequent term through the index is *ever* a win, versus using the index for the most common of the top level AND conditions and doing the rest on recheck. That seems like a dangerous assumption from which to start. > But the particular example shown here doesn't make a very good case > for that, because it's hard to tell how much of a penalty would be > taken in more realistic examples. Fair enough. We're in the early stages of moving to tsearch2 and I haven't run across this yet in practice. If I do, I'll follow up. -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance