Re: Question about (probably wrong) index scan cost for conditional indexes

Maxim Boguk <maxim.boguk@xxxxxxxxx> · Tue, 24 Jan 2012 10:14:58 +1100

Hi.

Seems previous test case not clear demonstrate the problem which i have stuck with.

Now much better and close to reality test case:

Preparation:
set random_page_cost to 4;
set seq_page_cost to 1;

create table test (id integer primary key, sections integer[], value float);
insert into test select id, ('{'||((random()*10)::integer)||'}')::integer[] as value, random() as value from generate_series(1,1000000) as g(id);

--generic gist index for array
CREATE INDEX test_sections_gist on test using gist(sections);
--specialized index on value for sections && '{2}'
CREATE INDEX test_value_in2section_key on test(value) where sections && '{2}';

analyze test;

Now actual tests:

Good query but cost definitely wrong:
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 100;
                                                                  QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..539.29 rows=100 width=37) (actual time=0.043..0.499 rows=100 loops=1)

   ->  Index Scan using test_value_in2section_key on test  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.040..0.434 rows=100 loops=1)
 Total runtime: 0.570 ms

Compare with almost equivalent query:

postgres=#  EXPLAIN ANALYZE SELECT * from test order by id limit 100;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=0.00..3.43 rows=100 width=37) (actual time=0.057..0.192 rows=100 loops=1)
   ->  Index Scan using test_pkey on test  (cost=0.00..34317.36 rows=1000000 width=37) (actual time=0.054..0.115 rows=100 loops=1)

 Total runtime: 0.258 ms

Actual speed almost same but cost differs 100 times.

Now if I increase the limit I start getting slow plans because it switch to GIST index and bitmap scan (because cost of common index scan too high):

postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.301..175.766 rows=1000 loops=1)
   ->  Sort  (cost=2941.68..2944.18 rows=1000 width=37) (actual time=175.298..175.541 rows=1000 loops=1)
         Sort Key: value

         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on test  (cost=56.48..2891.85 rows=1000 width=37) (actual time=80.230..132.479 rows=99641 loops=1)
               Recheck Cond: (sections && '{2}'::integer[])

               ->  Bitmap Index Scan on test_sections_gist  (cost=0.00..56.23 rows=1000 width=0) (actual time=78.112..78.112 rows=99641 loops=1)
                     Index Cond: (sections && '{2}'::integer[])

 Total runtime: 175.960 ms
(9 rows)

Even if I drop GIST index i'm still getting wrong plan:
postgres=# drop index test_sections_gist;
DROP INDEX
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;

                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.637..117.088 rows=1000 loops=1)
   ->  Sort  (cost=4489.88..4492.38 rows=1000 width=37) (actual time=116.635..116.857 rows=1000 loops=1)
         Sort Key: value

         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on test  (cost=1604.68..4440.05 rows=1000 width=37) (actual time=22.175..74.556 rows=99641 loops=1)
               Recheck Cond: (sections && '{2}'::integer[])

               ->  Bitmap Index Scan on test_value_in2section_key  (cost=0.00..1604.43 rows=1000 width=0) (actual time=20.248..20.248 rows=99641 loops=1)
 Total runtime: 117.261 ms

And only if I completely disable bitmap scan I get good fast plan (but with exceptional high cost):

postgres=# set enable_bitmapscan to 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * from test where sections && '{2}' order by value limit 1000;
                                                                   QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.047..4.123 rows=1000 loops=1)

   ->  Index Scan using test_value_in2section_key on test  (cost=0.00..5392.87 rows=1000 width=37) (actual time=0.044..3.552 rows=1000 loops=1)
 Total runtime: 4.460 ms

I hope that test case will make my issue more clear.

Regards,
Maksym

On Mon, Jan 23, 2012 at 11:46 AM, Maxim Boguk <maxim.boguk@xxxxxxxxx> wrote:

On Mon, Jan 23, 2012 at 11:28 AM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

Maxim Boguk <maxim.boguk@xxxxxxxxx> writes:

> But it seems that index scan cost for very narrow/selective conditional

> indexes is greatly overestimated at least in some cases.

I realized in connection with

http://archives.postgresql.org/pgsql-general/2012-01/msg00459.php

that btcostestimate is not correctly estimating numIndexTuples for

partial indexes.  But it's impossible to tell from this amount of

information whether you're seeing an effect of that, or something else.

Can you provide a self-contained test case?

                        regards, tom lane

Prorably simpliest test case:

set random_page_cost to 4;

set seq_page_cost to 1;

drop table  if exists test;
CREATE TABLE test (id integer primary key, value1 float, value2 float, value3 float, value4 float);
INSERT into test select id,random() as value1,random() as value2, random() as value3,random() as value4 from generate_series(1,1000000) as g(id);

CREATE INDEX test_special_key on test(value1) where value2*2<0.01 and value3*2<0.01 and value4*2<0.01;
ANALYZE test;

postgres=# EXPLAIN ANALYZE select * from test order by id limit 100;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3.43 rows=100 width=36) (actual time=0.042..0.170 rows=100 loops=1)

   ->  Index Scan using test_pkey on test  (cost=0.00..34317.36 rows=1000000 width=36) (actual time=0.040..0.108 rows=100 loops=1)
 Total runtime: 0.243 ms
(3 rows)

vs

postgres=# EXPLAIN ANALYZE select * from test where value2*2<0.01 and value3*2<0.01 and value4*2<0.01 order by value1 limit 100;

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..92.52 rows=100 width=36) (actual time=0.072..0.072 rows=0 loops=1)

   ->  Index Scan using test_special_key on test  (cost=0.00..34264.97 rows=37037 width=36) (actual time=0.070..0.070 rows=0 loops=1)
 Total runtime: 0.113 ms
(3 rows)

cost difference:
 (cost=0.00..3.43 rows=100 width=36) 

vs
(cost=0.00..92.52 rows=100 width=36)

An actual speed (and theoretical performance) almost same.

More selective conditions added to conditional index - worse situation with wrong costing.

Kind Regards,
Maksym

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people. 
If people cannot solve the problem, try technology. 
People will then wish they'd listened at the first stage."

-- 
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.boguk@xxxxxxxxx

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people. 
If people cannot solve the problem, try technology. 
People will then wish they'd listened at the first stage."