Hi At 2013/2/8 I wrote: > I have problems with the performance of FTS in a query like this: > > SELECT * FROM FullTextSearch WHERE content_tsv_gin @@ > plainto_tsquery('english', 'good'); > > It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped). > The planner obviously always chooses table scan Now, I've identified (but only partially resolved) the issue: Here are my comments: Thats the query in question (see commented log below): select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content); After having created the GIN index, the FTS query unexpectedly is fast because planner chooses "Bitmap Index Scan". After the index statistics have been updated, the same query becomes slow. Only when using the "trick" with the function in the WHERE clause. I think GIST does'nt change anything. select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content); => This hint should mentioned in the docs! Then, setting enable_seqscan to off makes original query fast again. But that's a setting I want to avoid in a multi-user database. Finally, setting random_page_cost to 1 helps also - but I don't like this setting neither. => To me the planner should be updated to recognize immutable plainto_tsquery() function in the WHERE clause and choose "Bitmap Index Scan" at the first place. What do you think? Yours, Stefan ---- Lets look at table fulltextsearch: movies=# \d fulltextsearch Table "public.fulltextsearch" Column | Type | Modifiers ---------+---------+------------------------------------------------------------- id | integer | not null default nextval('fulltextsearch_id_seq'::regclass) docid | integer | default 0 title | text | content | text | not null movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch USING gin(to_tsvector('pg_catalog.english',content)); movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%'; oid | name | kind | tuples | pages | allvisible | toastrelid | hasindex --------+---------------------------+------+-------------+-------+------------+------------+---------- 476289 | fulltextsearch | r | 27886 | 555 | 0 | 476293 | t 503080 | fulltextsearch_gincontent | i | 8.97135e+06 | 11133 | 0 | 0 | f 476296 | fulltextsearch_id_seq | S | 1 | 1 | 0 | 0 | f 503075 | fulltextsearch_pkey | i | 27886 | 79 | 0 | 0 | f (4 rows) => fulltextsearch_gincontent has an arbitrary large number of tuples (statistics is wrong and not yet updated) movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch where plainto_tsquery('pg_catalog.english','good') @@ to_tsvector('pg_catalog.english',content); => Unexpectedly, the query is fast! See query plan http://explain.depesz.com/s/ewn Let's update the statistics: movies=# VACUUM ANALYZE VERBOSE fulltextsearch ; SELECT * FROM pg_class c WHERE relname LIKE 'fullt%'; oid | name | kind | tuples | pages | allvisible | toastrelid | hasindex --------+---------------------------+------+--------+-------+------------+------------+---------- 476289 | fulltextsearch | r | 27886 | 555 | 555 | 476293 | t 503080 | fulltextsearch_gincontent | i | 27886 | 11133 | 0 | 0 | f 476296 | fulltextsearch_id_seq | S | 1 | 1 | 0 | 0 | f 503075 | fulltextsearch_pkey | i | 27886 | 79 | 0 | 0 | f (4 rows) => Now after having update statistics (see especially tuples of fulltextsearch_gincontent ) the original query is slow! See query plan http://explain.depesz.com/s/MQ60 Now, let's reformulate the original query and move the function call to plainto_tsquery to the FROM clause: movies=# explain (analyze,costs,timing,buffers) select id,title,left(content,100) from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query where query @@ to_tsvector('pg_catalog.english',content); => This special query is fast again! See query plan http://explain.depesz.com/s/FVT Setting enable_seqscan to off makes query fast again: See query plan http://explain.depesz.com/s/eOr Finally, setting random_page_cost to 1 helps also (default is 4): movies=# set enable_seqscan to default; movies=# set random_page_cost to 1.0; => Query is fast. See query plan http://explain.depesz.com/s/M5Ke ---- -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance