Tom Lane wrote:
> Jesper Krogh <jesper@xxxxxxxx> writes:
>> Tom Lane wrote:
>>> ... There's something strange about your tsvector index.  Maybe
>>> it's really huge because the documents are huge?
>
>> huge is a relative term, but length(ts_vector(body)) is about 200 for
>> each document. Is that huge?
>
> It's bigger than the toy example I was trying, but not *that* much
> bigger.  I think maybe your index is bloated.  Try dropping and
> recreating it and see if the estimates change any.

I'm a bit reluctant to drop and re-create it; it would take a couple of
days to regenerate, so hopefully this is not a common situation for the
system.

I have set the statistics target to 1000 for the tsvector column; the
documentation didn't mention any serious downsides of doing that, and
since then I haven't seen row estimates that are orders of magnitude off.

The table was built from scratch using inserts all the way up to around
10M rows. Should that result in index bloat? Can I inspect the amount of
bloat without rebuilding (or a similarly locking operation)?

The query still has a "wrong" tipping point between the two query plans:

ftstest=# explain analyze select body from ftstest where ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..7357.77 rows=100 width=738) (actual time=3978.974..8595.086 rows=100 loops=1)
   ->  Index Scan using ftstest_id_pri_idx on ftstest  (cost=0.00..1436458.05 rows=19523 width=738) (actual time=3978.971..8594.932 rows=100 loops=1)
         Filter: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 8595.222 ms
(4 rows)

ftstest=# set enable_indexscan=off;
SET
ftstest=# explain analyze select body from ftstest where ftstest_body_fts @@ to_tsquery('testterm') order by id limit 100;
                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=59959.61..59959.86 rows=100 width=738) (actual time=338.832..339.055 rows=100 loops=1)
   ->  Sort  (cost=59959.61..60008.41 rows=19523 width=738) (actual time=338.828..338.908 rows=100 loops=1)
         Sort Key: id
         Sort Method:  top-N heapsort  Memory: 32kB
         ->  Bitmap Heap Scan on ftstest  (cost=22891.18..59213.45 rows=19523 width=738) (actual time=5.097..316.780 rows=19444 loops=1)
               Recheck Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
               ->  Bitmap Index Scan on ftstest_tfs_idx  (cost=0.00..22886.30 rows=19523 width=0) (actual time=4.259..4.259 rows=20004 loops=1)
                     Index Cond: (ftstest_body_fts @@ to_tsquery('testterm'::text))
 Total runtime: 339.201 ms
(9 rows)

So, to fetch 100 rows where the term exists in 19,444 of the 10,000,000
documents, it chooses the index scan, which (given a random distribution
of the matching documents) should have to scan about
100*(10000000/19444) = 51429 documents. So it either believes the cost
of the bitmap index scan is higher than it actually is, or that the cost
of the index scan is lower than it actually is.

Is it possible to manually set the cost for the @@ operator? It seems
natural that matching a tsvector against a tsquery, which is a much
heavier operation than "=" on an integer in plain storage (the tsvector
is even kept in EXTENDED storage), should carry a much higher cost. I
tried searching the docs for operator cost, but I only found the overall
base values in the configuration file.
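
As a possible answer to the operator-cost question above: as far as I can
tell the cost knob hangs off the function that implements the operator,
not the operator itself. A rough sketch, assuming @@ on
(tsvector, tsquery) is backed by ts_match_vq (worth confirming against
pg_operator first):

  -- Look up which function implements @@ for (tsvector, tsquery):
  SELECT oprcode
    FROM pg_operator
   WHERE oprname = '@@'
     AND oprleft = 'tsvector'::regtype
     AND oprright = 'tsquery'::regtype;

  -- Raise its estimated per-call cost (the number is a multiplier of
  -- cpu_operator_cost; the default for built-in functions is 1):
  ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;

A higher function cost should make the planner charge more for evaluating
the @@ filter on every row it walks past in the id-ordered index scan,
which is roughly the effect being asked for here.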
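
On the bloat question, a minimal way to at least compare on-disk sizes
without any rebuild or heavy lock (index and table names taken from the
plans above):

  -- Size of the GIN index versus the table it indexes:
  SELECT pg_size_pretty(pg_relation_size('ftstest_tfs_idx')) AS fts_index_size,
         pg_size_pretty(pg_relation_size('ftstest'))         AS table_size,
         pg_size_pretty(pg_total_relation_size('ftstest'))   AS table_incl_indexes_and_toast;

This only gives raw sizes, not a bloat percentage, but an index that is a
large multiple of what a freshly built copy comes out to on a test box
would point towards bloat.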
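
And for completeness, the statistics-target change mentioned above was
done along these lines (column name as it appears in the plans):

  ALTER TABLE ftstest ALTER COLUMN ftstest_body_fts SET STATISTICS 1000;
  ANALYZE ftstest;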
Jesper

--
Jesper