Jesper Krogh <jesper@xxxxxxxx> writes:
> I'm currently trying to figure out why the tsearch performance seems to
> vary a lot between different queryplans. I have created a sample dataset
> that sort of resembles the data I have to work on.

> The script that builds the dataset is at:
> http://krogh.cc/~jesper/build-test.pl
> and http://krogh.cc/~jesper/words.txt is needed for it to run.

I got around to looking at this example finally, and I can reproduce your
results pretty closely.  I think there are two things going on:

1. The cost estimates for to_tsquery and ts_match_vq don't reflect the
actually-rather-high costs of those functions.  Since the seqscan plan
executes these functions many more times than the indexscan plan, that
results in a relative cost error.  There's already been some discussion
of changing the default costs for the tsearch functions, but nothing's
been done yet.  However, that seems to be a relatively small problem
compared to...

2. The planner is estimating that most of the GIN index has to be
examined --- specifically, it estimates (pretty accurately) that 40188
out of 50000 table rows will match, and the default assumption is that
that means 40188/50000 of the index blocks will have to be read.  On my
machine the index amounts to 39076 blocks, so we estimate 31407 index
blocks have to be read, and that's why the cost estimate for the
indexscan is so huge.  The *actual* number of index blocks read for the
query, according to the stats collector, is 66.

So it appears that genericcostestimate() is just completely
inappropriate for GIN indexes, at least when considering common terms.
I guess that's not so astonishing when you remember that GIN isn't built
around per-heap-tuple entries the way the other index types are.

Oleg, Teodor, can you suggest any better cost metric to use for GIN?
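For anyone following along, here's a back-of-the-envelope sketch (in
Python, not the actual C code) of the generic assumption described in
point 2 --- that the fraction of index pages fetched equals the row
selectivity --- plugged with the figures from this thread:

```python
# Sketch of the genericcostestimate()-style page estimate, using the
# numbers quoted above.  This is illustrative arithmetic only, not the
# real planner code.

matching_rows = 40188   # planner's (pretty accurate) matching-row estimate
total_rows    = 50000   # rows in the table
index_pages   = 39076   # blocks in the GIN index on this machine

# Generic assumption: the same fraction of index blocks must be read
# as the fraction of table rows that match.
selectivity = matching_rows / total_rows
estimated_pages = int(selectivity * index_pages)
print(estimated_pages)          # 31407 -- the planner's huge figure

actual_pages = 66               # per the stats collector
print(round(estimated_pages / actual_pages))  # off by a factor of ~476
```

That factor-of-several-hundred error in page fetches is what blows up
the indexscan cost estimate, independent of the function-cost issue in
point 1.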
			regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance