Hi,
While working on a slow query of one of our customers, I found this behaviour which seems quite weird to me. Here it is:
(no changes on any parameter)
EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Unique (cost=12005.97..12049.20 rows=1 width=1430) (actual time=20055.294..20323.348 rows=1 loops=1)
-> Sort (cost=12005.97..12006.30 rows=132 width=1430) (actual time=20055.290..20105.738 rows=60000 loops=1)
Sort Key: (... 130 columns ...)
Sort Method: external sort Disk: 84464kB
-> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) (actual time=0.109..114.142 rows=60000 loops=1)
Planning time: 10.012 ms
Execution time: 20337.854 ms
(7 rows)
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Unique (cost=12005.97..12049.20 rows=1 width=1430) (actual time=20055.294..20323.348 rows=1 loops=1)
-> Sort (cost=12005.97..12006.30 rows=132 width=1430) (actual time=20055.290..20105.738 rows=60000 loops=1)
Sort Key: (... 130 columns ...)
Sort Method: external sort Disk: 84464kB
-> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) (actual time=0.109..114.142 rows=60000 loops=1)
Planning time: 10.012 ms
Execution time: 20337.854 ms
(7 rows)
That looks quite good. The total cost is 12049, so I expect this plan to have the smaller cost as it's the choosen plan. Now, I'm disabling Sort, and here is what I get:
SET enable_sort TO off;
EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12044.22..12044.23 rows=1 width=1430) (actual time=508.342..508.343 rows=1 loops=1)
Group Key: (... 130 columns ...)
-> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) (actual time=0.036..57.088 rows=60000 loops=1)
Planning time: 1.335 ms
Execution time: 508.529 ms
(5 rows)
EXPLAIN (ANALYZE) SELECT DISTINCT * FROM gleu2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=12044.22..12044.23 rows=1 width=1430) (actual time=508.342..508.343 rows=1 loops=1)
Group Key: (... 130 columns ...)
-> Seq Scan on gleu2 (cost=0.00..12001.32 rows=132 width=1430) (actual time=0.036..57.088 rows=60000 loops=1)
Planning time: 1.335 ms
Execution time: 508.529 ms
(5 rows)
I now have a Hash Aggregate. Sounds good, but I do have a smaller total cost than the former plan. How is it possible that the planner chooses a plan with a higher cost? (and a much much higher duration, but it can't obviously know that).
Still working on my customer's issue, I found that it worked great before commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f (Make the upper part of the planner work by generating and comparing Paths.) and not after. That kind of patch is way out of my league to be honest :) But before that commit, the Hash Aggregate is choosen right out of the box. And after that commit, the Sort is choosen even with its higher cost.
Oh, BTW, yeah, I know. The row estimate is quite off... I'm not asking how to make my query go faster, I'm just wondering why PostgreSQL chooses a higher-cost plan :)
Thanks.
--
Guillaume.