2013/12/29 Tom Lane <tgl@xxxxxxxxxxxxx>
> I think that's just chance, because AFAICS the cost estimates are exactly
> the same for both indexes, once you've done the vacuum to make all the
> heap pages all-visible. What's more, I'm not sure that that's wrong,
> because according to EXPLAIN (ANALYZE, BUFFERS) the exact same number of
> index pages are touched for either index. So I think Michael's claim that
> the one index is better is at best unproven.
Let me try to prove it :)
1. I did the benchmarking after dropping the PostgreSQL and OS disk caches/buffers,
in the way I posted in my first message:
sh# systemctl stop postgresql.service ; echo 3 >/proc/sys/vm/drop_caches ; systemctl start postgresql.service
And the timing results are quite stable: 200-210 ms using t1_a_b_idx and 90-100 ms using t1_b_a_idx.
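For context, the setup implied by the plans below looks roughly like this (a sketch; the table definition and population come from my first message, so treat the details here as assumptions reconstructed from the plan output):

```sql
create table t1 (a integer, b integer);
-- ... populated with enough rows that both indexes get 3 btree levels ...
create index t1_a_b_idx on t1 (a, b);
create index t1_b_a_idx on t1 (b, a);
vacuum analyze t1;  -- makes heap pages all-visible, enabling index-only scans

-- the query being timed (count(*) is inferred from the width=0 Aggregate):
select count(*)
from t1
where a in (1, 9, 17, 26, 35, 41, 50)
  and b = 333333;
```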
Trying 'explain (analyze, buffers) ... ' I got this:
* using t1_a_b_idx:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=46.62..46.63 rows=1 width=0) (actual time=228.853..228.854 rows=1 loops=1)
Buffers: shared hit=12 read=23
-> Index Only Scan using t1_a_b_idx on t1 (cost=0.57..46.60 rows=7 width=0) (actual time=52.171..228.816 rows=7 loops=1)
Index Cond: ((a = ANY ('{1,9,17,26,35,41,50}'::integer[])) AND (b = 333333))
Heap Fetches: 7
Buffers: shared hit=12 read=23
Total runtime: 229.012 ms
* using t1_b_a_idx:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=60.12..60.13 rows=1 width=0) (actual time=115.617..115.617 rows=1 loops=1)
Buffers: shared hit=24 read=11
-> Index Only Scan using t1_b_a_idx on t1 (cost=0.57..60.10 rows=7 width=0) (actual time=80.460..115.590 rows=7 loops=1)
Index Cond: ((b = 333333) AND (a = ANY ('{1,9,17,26,35,41,50}'::integer[])))
Heap Fetches: 7
Buffers: shared hit=24 read=11
Total runtime: 116.480 ms
There is a difference in the read operations, and also in the cost estimation.
23 - 11 = 12 excess read operations. If they are all random reads, they may take ~100 ms on a typical home/workstation SATA hard drive (12 reads * ~8 ms average seek/rotational latency ≈ 100 ms). That's the difference between the
timings I showed above.
Yes, I understand that Pg doesn't know (while planning the query) how many pages will be hit in shared buffers.
But I can't understand why the total buffer count (35) is the same in both plans...
And I can't understand why the cost estimates are different...
> I grant the theory that the repeated index probes in t1_b_a_idx should be
> more localized than those in t1_a_b_idx, but PG's optimizer doesn't
> attempt to estimate such effects, and this example isn't doing much to
> convince me that it'd be worth the trouble.
Yes, I see that both t1_a_b_idx and t1_b_a_idx have 3 levels in their btrees. For t1_a_b_idx Pg has to read 1 (meta page) + 1 (root) + 1 (common level-1 page) + 7 * 2 = 17 pages in it,
while for t1_b_a_idx only 1 + 1 + 3 = 5 pages (because all 7 (a, b) pairs are located in one btree leaf page). 17 - 5 = 12 — the same difference we see in
'explain (analyze, buffers)'.
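By the way, the 3-level claim can be checked directly with the pageinspect extension (a sketch; it requires superuser, and bt_metap() reports the level of the root page, so a root at level 2 means 3 btree levels):

```sql
create extension if not exists pageinspect;

-- bt_metap() reads the btree meta page; 'level' is the height of the root.
-- level = 2 means root -> internal -> leaf, i.e. a 3-level btree.
select level from bt_metap('t1_a_b_idx');
select level from bt_metap('t1_b_a_idx');
```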
In a real-life situation I have two kinds of queries against that table:
* select ... from t1 where a in (...) and b = ?
* select ... from t1 where a = ? and b in (...)
I select fields from t1 that are not in the indexes, so there is no 'Index Only Scan'; there are more random reads anyway, and the performance impact of choosing t1_a_b_idx is a bit smaller in both queries.
And I got the answer ("PG's optimizer doesn't attempt to estimate such effects") for my situation.
Thanks a lot.