Re: Pg makes nonoptimal choice between two multicolumn indexes with the same columns but in different order.

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 28 Dec 2013 16:56:42 -0500

Kevin Grittner <kgrittn@xxxxxxxxx> writes:
> Michael Kolomeitsev <mkolomeitsev@xxxxxxxxx> wrote:
>> it is clear for me why t1_b_a_idx is better. The question is: Is
>> postgresql able to see that?

> For a number of reasons I never consider a bulk load complete until
> I run VACUUM FREEZE ANALYZE on the table(s) involved.  When I try
> your test case without that, I get the bad index choice.  When I
> then run VACUUM FREEZE ANALYZE on the database I get the good index
> choice.

I think that's just chance, because AFAICS the cost estimates are exactly
the same for both indexes, once you've done the vacuum to make all the
heap pages all-visible.  What's more, I'm not sure that that's wrong,
because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
index pages are touched for either index.  So I think Michael's claim that
the one index is better is at best unproven.

regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.097..0.098 rows=1 loops=1)
   Buffers: shared hit=30
   ->  Index Only Scan using t1_b_a_idx on t1  (cost=0.57..32.10 rows=7 width=0) (actual time=0.044..0.085 rows=7 loops=1)
         Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
         Heap Fetches: 0
         Buffers: shared hit=30
 Total runtime: 0.174 ms
(7 rows)

regression=# begin; drop index t1_b_a_idx;
BEGIN
DROP INDEX
regression=# explain (analyze, buffers) select count(*) from t1 where a in (1, 9, 17, 26, 35, 41, 50) and b = 333333;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32.12..32.13 rows=1 width=0) (actual time=0.110..0.110 rows=1 loops=1)
   Buffers: shared hit=30
   ->  Index Only Scan using t1_a_b_idx on t1  (cost=0.57..32.10 rows=7 width=0) (actual time=0.039..0.101 rows=7 loops=1)
         Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
         Heap Fetches: 0
         Buffers: shared hit=30
 Total runtime: 0.199 ms
(7 rows)

regression=# abort;
ROLLBACK

I grant the theory that the repeated index probes in t1_b_a_idx should be
more localized than those in t1_a_b_idx, but PG's optimizer doesn't
attempt to estimate such effects, and this example isn't doing much to
convince me that it'd be worth the trouble.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance