Tom Lane wrote: > It may well be that Jesper's identified a place where the GIN code could > be improved --- it seems like having the top-level search logic be more > aware of the AND/OR structure of queries would be useful. But the > particular example shown here doesn't make a very good case for that, > because it's hard to tell how much of a penalty would be taken in more > realistic examples. With a term sitting in: 80% of the docs the penalty is: x23 60% of the docs the penalty is: x17 40% of the docs the penalty is: x13 of doing vectorcol @@ ts_query('term & commonterm') compared to vectorcol @@ ts_query('term) and vectorcol @@ ts_query('commonterm'); where term is non-existing (or rare). (in query execution performance on a fully memory recident dataset, doing test with "drop_caches" and restart pg to simulate a dead disk the numbers are a bit higher). http://article.gmane.org/gmane.comp.db.postgresql.performance/22496/match= Would you ever quantify a term sitting in 60-80% as a stop-word candidate? I dont know if x13 in execution performance is worth hunting or there are lower hanging fruits sitting in the fts-search-system. This is essentially the penalty the user will get for adding a terms to their search that rarely restricts the results. In term of the usual "set theory" that databases work in, a search for a stop-word translated into the full set. This is just not the case in where it throws a warning and returns the empty set. This warning can be caught by application code to produce the "correct" result to the users, but just slightly more complex queries dont do this: ftstest=# select id from ftstest where body_fts @@ to_tsquery('random | the') limit 10; id ---- (0 rows) Here I would have expected the same error.. I basically have to hook in the complete stop-word dictionary in a FTS-preparser to give the user the expected results or have I missed a feature somwhere? My reason for not pushing "commonterms" into the stopword list is that they actually perform excellent in PG. Same body as usual, but commonterm99 is sitting in 99% of the documents. ftstest=# set enable_seqscan=off; SET ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on ftstest (cost=1051476.74..1107666.07 rows=197887 width=4) (actual time=51.036..121.348 rows=197951 loops=1) Recheck Cond: (body_fts @@ to_tsquery('commonterm99'::text)) -> Bitmap Index Scan on ftstest_gin_idx (cost=0.00..1051427.26 rows=197887 width=0) (actual time=49.602..49.602 rows=197951 loops=1) Index Cond: (body_fts @@ to_tsquery('commonterm99'::text)) Total runtime: 147.350 ms (5 rows) ftstest=# set enable_seqscan=on; SET ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Seq Scan on ftstest (cost=0.00..56744.00 rows=197887 width=4) (actual time=0.086..7134.384 rows=197951 loops=1) Filter: (body_fts @@ to_tsquery('commonterm99'::text)) Total runtime: 7194.182 ms (3 rows) So in order to get the result with a speedup of more than x50 I simply cannot add these terms to the stop-words because then the first query would resolve to an error and getting results would then be up to the second query. My bet is that doing a seq_scan will "never" be beneficial for this type of query. As far as I can see the only consequence of simply not remove stop-words at all is a (fairly small) increase in index-size. It seems to me that stop-words were invented when it was hard to get more than 2GB of memory into a computer to get the index-size reduced to a size that better could fit into memory. But nowadays it seems like the downsides are hard to see? Jesper -- Jesper -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance