planner costs in "warm cache" tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi.

I'm trying to get the planner to do sort of the correct thing
when choosing between index-scans on btree indices and
bitmap heap scans.

There has been several things going on in parallel. One
is that the statistics data is off:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.general/141420/focus=141735

The other one that the costestimates (number of pages
to read) is inaccurate on gin indices:
http://archives.postgresql.org/pgsql-performance/2009-10/msg00393.php
which there also is coming a solution to that I'm testing out.

I was trying to nullify the problem with the wrongly estimated number
of pages to read and see if "the rest" seems to work as expected.

The theory was that if I set "seq_page_cost" and "random_page_cost" to
something "really low" (0 is not permitted) and ran tests
on a fully cached query (where both costs indeed is "really low").
then the "cheapest" query should indeed also be the fastest one.
Let me know if the logic is flawed.

The test dataset is 1365462 documents, running pg9.0b1, both queries run twice to
see that the data actually is fully cached as expected.

testdb=# set seq_page_cost = 0.00001;
SET
testdb=# set random_page_cost = 0.00001;
SET
testdb=# set enable_indexscan = on;
SET
testdb=# explain analyze select id from testdb.reference where document_tsvector @@ to_tsquery('literature') order by accession_number limit 200; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..432.82 rows=200 width=11) (actual time=831.456..2167.302 rows=200 loops=1) -> Index Scan using ref_acc_idx on reference (cost=0.00..61408.12 rows=28376 width=11) (actual time=831.451..2166.434 rows=200 loops=1)
         Filter: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 2167.982 ms
(4 rows)

testdb=# explain analyze select id from testdb.reference where document_tsvector @@ to_tsquery('literature') order by accession_number limit 200; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..432.82 rows=200 width=11) (actual time=842.990..2187.393 rows=200 loops=1) -> Index Scan using ref_acc_idx on reference (cost=0.00..61408.12 rows=28376 width=11) (actual time=842.984..2186.540 rows=200 loops=1)
         Filter: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 2188.083 ms
(4 rows)

testdb=# set enable_indexscan = off;
SET
testdb=# explain analyze select id from testdb.reference where document_tsvector @@ to_tsquery('literature') order by accession_number limit 200; QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2510.68..2511.18 rows=200 width=11) (actual time=270.016..270.918 rows=200 loops=1) -> Sort (cost=2510.68..2581.62 rows=28376 width=11) (actual time=270.011..270.321 rows=200 loops=1)
         Sort Key: accession_number
         Sort Method:  top-N heapsort  Memory: 34kB
-> Bitmap Heap Scan on reference (cost=219.94..1284.29 rows=28376 width=11) (actual time=13.897..216.700 rows=21613 loops=1) Recheck Cond: (document_tsvector @@ to_tsquery('literature'::text)) -> Bitmap Index Scan on reference_fts_idx (cost=0.00..212.85 rows=28376 width=0) (actual time=10.053..10.053 rows=21613 loops=1) Index Cond: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 271.323 ms
(9 rows)

testdb=# explain analyze select id from testdb.reference where document_tsvector @@ to_tsquery('literature') order by accession_number limit 200; QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2510.68..2511.18 rows=200 width=11) (actual time=269.881..270.782 rows=200 loops=1) -> Sort (cost=2510.68..2581.62 rows=28376 width=11) (actual time=269.876..270.182 rows=200 loops=1)
         Sort Key: accession_number
         Sort Method:  top-N heapsort  Memory: 34kB
-> Bitmap Heap Scan on reference (cost=219.94..1284.29 rows=28376 width=11) (actual time=14.113..216.173 rows=21613 loops=1) Recheck Cond: (document_tsvector @@ to_tsquery('literature'::text)) -> Bitmap Index Scan on reference_fts_idx (cost=0.00..212.85 rows=28376 width=0) (actual time=10.360..10.360 rows=21613 loops=1) Index Cond: (document_tsvector @@ to_tsquery('literature'::text))
 Total runtime: 271.533 ms
(9 rows)


So in the situation where i have tried to "nullify" the actual disc-cost, hopefully leaving only the cpu and other cost back and running the query in fully cached mode (two runs). the bitmap-heap-scan is still hugely favorable in actual runtime. (which isn't that much a suprise) but it seems strange that the
index-scan is still favored in the cost calculations?

I have tried to alter the cost of ts_match_vq but even setting it to 1000 does not change the overall picture.

Is the approach simply too naive?

--
Jesper


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux