Hi.
I'm trying to get the planner to do sort of the correct thing
when choosing between index-scans on btree indices and
bitmap heap scans.
There has been several things going on in parallel. One
is that the statistics data is off:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/141420/focus=141735
The other one that the costestimates (number of pages
to read) is inaccurate on gin indices:
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php
which there also is coming a solution to that I'm testing out.
I was trying to nullify the problem with the wrongly estimated number
of pages to read and see if "the rest" seems to work as expected.
The theory was that if I set "seq_page_cost" and "random_page_cost" to
something "really low" (0 is not permitted) and ran tests
on a fully cached query (where both costs indeed is "really low").
then the "cheapest" query should indeed also be the fastest one.
Let me know if the logic is flawed.
The test dataset is 1365462 documents, running pg9.0b1, both queries run
twice to
see that the data actually is fully cached as expected.
testdb=# set seq_page_cost = 0.00001;
SET
testdb=# set random_page_cost = 0.00001;
SET
testdb=# set enable_indexscan = on;
SET
testdb=# explain analyze select id from testdb.reference where
document_tsvector @@ to_tsquery('literature') order by accession_number
limit 200;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..432.82 rows=200 width=11) (actual
time=831.456..2167.302 rows=200 loops=1)
-> Index Scan using ref_acc_idx on reference (cost=0.00..61408.12
rows=28376 width=11) (actual time=831.451..2166.434 rows=200 loops=1)
Filter: (document_tsvector @@ to_tsquery('literature'::text))
Total runtime: 2167.982 ms
(4 rows)
testdb=# explain analyze select id from testdb.reference where
document_tsvector @@ to_tsquery('literature') order by accession_number
limit 200;
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..432.82 rows=200 width=11) (actual
time=842.990..2187.393 rows=200 loops=1)
-> Index Scan using ref_acc_idx on reference (cost=0.00..61408.12
rows=28376 width=11) (actual time=842.984..2186.540 rows=200 loops=1)
Filter: (document_tsvector @@ to_tsquery('literature'::text))
Total runtime: 2188.083 ms
(4 rows)
testdb=# set enable_indexscan = off;
SET
testdb=# explain analyze select id from testdb.reference where
document_tsvector @@ to_tsquery('literature') order by accession_number
limit 200;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2510.68..2511.18 rows=200 width=11) (actual
time=270.016..270.918 rows=200 loops=1)
-> Sort (cost=2510.68..2581.62 rows=28376 width=11) (actual
time=270.011..270.321 rows=200 loops=1)
Sort Key: accession_number
Sort Method: top-N heapsort Memory: 34kB
-> Bitmap Heap Scan on reference (cost=219.94..1284.29
rows=28376 width=11) (actual time=13.897..216.700 rows=21613 loops=1)
Recheck Cond: (document_tsvector @@
to_tsquery('literature'::text))
-> Bitmap Index Scan on reference_fts_idx
(cost=0.00..212.85 rows=28376 width=0) (actual time=10.053..10.053
rows=21613 loops=1)
Index Cond: (document_tsvector @@
to_tsquery('literature'::text))
Total runtime: 271.323 ms
(9 rows)
testdb=# explain analyze select id from testdb.reference where
document_tsvector @@ to_tsquery('literature') order by accession_number
limit 200;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2510.68..2511.18 rows=200 width=11) (actual
time=269.881..270.782 rows=200 loops=1)
-> Sort (cost=2510.68..2581.62 rows=28376 width=11) (actual
time=269.876..270.182 rows=200 loops=1)
Sort Key: accession_number
Sort Method: top-N heapsort Memory: 34kB
-> Bitmap Heap Scan on reference (cost=219.94..1284.29
rows=28376 width=11) (actual time=14.113..216.173 rows=21613 loops=1)
Recheck Cond: (document_tsvector @@
to_tsquery('literature'::text))
-> Bitmap Index Scan on reference_fts_idx
(cost=0.00..212.85 rows=28376 width=0) (actual time=10.360..10.360
rows=21613 loops=1)
Index Cond: (document_tsvector @@
to_tsquery('literature'::text))
Total runtime: 271.533 ms
(9 rows)
So in the situation where i have tried to "nullify" the actual
disc-cost, hopefully leaving only the
cpu and other cost back and running the query in fully cached mode (two
runs). the bitmap-heap-scan
is still hugely favorable in actual runtime. (which isn't that much a
suprise) but it seems strange that the
index-scan is still favored in the cost calculations?
I have tried to alter the cost of ts_match_vq but even setting it to
1000 does not change the overall picture.
Is the approach simply too naive?
--
Jesper
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance