Hi. I'm currently testing out PostgreSQL's Full Text Search capabillities. We're currenly using Xapian, it has some nice features and some drawbacks (sorting), so it is especially this area I'm investigating. I've loaded the database with 50K documents, and the table definition is: ftstest=# \d uniprot Table "public.uniprot" Column | Type | Modifiers ------------------+----------+------------------------------------------------------ id | integer | not null default nextval('textbody_id_seq'::regclass) body | text | not null default ''::text textbody_body_fts | tsvector | accession_number | text | not null default ''::text Indexes: "accno_unique_idx" UNIQUE, btree (accession_number) "textbody_tfs_idx" gin (textbody_body_fts) Triggers: tsvectorupdate BEFORE INSERT OR UPDATE ON textbody FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('textbody_body_fts', 'pg_catalog.english', 'body') "commonterm" matches 37K of the 50K documents (majority), but the query plan is "odd" in my eyes. * Why does it mis-guess the cost of a Seq Scan on textbody so much? * Why doesn't it use the index in "id" to fetch the 10 records? ftstest=# ANALYZE textbody; ANALYZE ftstest=# explain analyze select body from textbody where textbody_body_fts @@ to_tsquery('commonterm') order by id limit 10 offset 0 QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Limit (cost=2841.08..2841.11 rows=10 width=5) (actual time=48031.563..48031.568 rows=10 loops=1) -> Sort (cost=2841.08..2933.01 rows=36771 width=5) (actual time=48031.561..48031.564 rows=10 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 31kB -> Seq Scan on textbody (cost=0.00..2046.47 rows=36771 width=5) (actual time=100.107..47966.590 rows=37133 loops=1) Filter: (textbody_body_fts @@ to_tsquery('commonterm'::text)) Total runtime: 48031.612 ms (7 rows) This query-plan doesn't answer the questions above, but it does indeed speed it up significantly (by heading into a Bitmap Index Scan instead of a Seq Scan) ftstest=# set enable_seqscan=off; SET ftstest=# explain analyze select body from textbody where textbody_body_fts @@ to_tsquery('commonterm') order by id limit 10 offset 0 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=269942.41..269942.43 rows=10 width=5) (actual time=47.567..47.572 rows=10 loops=1) -> Sort (cost=269942.41..270034.34 rows=36771 width=5) (actual time=47.565..47.567 rows=10 loops=1) Sort Key: id Sort Method: top-N heapsort Memory: 31kB -> Bitmap Heap Scan on textbody (cost=267377.23..269147.80 rows=36771 width=5) (actual time=15.763..30.576 rows=37133 loops=1) Recheck Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) -> Bitmap Index Scan on textbody_tfs_idx (cost=0.00..267368.04 rows=36771 width=0) (actual time=15.419..15.419 rows=37134 loops=1) Index Cond: (textbody_body_fts @@ to_tsquery('commonterm'::text)) Total runtime: 47.634 ms (9 rows) To me it seems like the query planner could do a better job? On "rare" terms everything seems to work excellent. N.B.: looks a lot like this: http://archives.postgresql.org/pgsql-performance/2009-07/msg00190.php -- Jesper -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance