Re: Extremely slow HashAggregate in simple UNION query

Jeff Janes <jeff.janes@xxxxxxxxx> · Wed, 21 Aug 2019 14:26:33 -0400

On Tue, Aug 20, 2019 at 11:12 AM Felix Geisendörfer <felix@xxxxxxxxxx> wrote:
 ...

[1] My actual query had bad estimates for other reasons (GIN Index), but that's another story. The query above was of course deliberately designed to have bad estimates.

As noted elsewhere, v12 thwarts your attempts to deliberately design the bad estimates.  You can still get them, you just have to work a bit harder at it:

CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select generate_series($1,$2) $$ rows 1000 language sql;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b                         
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=80021.00..100021.00 rows=2000000 width=16) (actual time=11.332..13.241 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..70021.00 rows=2000000 width=16) (actual time=0.118..0.163 rows=2 loops=1)
         ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16) (actual time=0.117..0.118 rows=1 loops=1)
               ->  Function Scan on j a  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.087..0.088 rows=1 loops=1)
               ->  Function Scan on j b  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.027..0.027 rows=1 loops=1)
         ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16) (actual time=0.044..0.044 rows=1 loops=1)
               ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.022..0.022 rows=1 loops=1)
               ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.020..0.021 rows=1 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 69.277 ms
(11 rows)

But the same advance in v12 which makes it harder to fool with your test case also opens the possibility of fixing your real case.

I've made an extension which has a function which always returns true, but lies about how often it is expected to return true. See the attachment.  With that, you can fine-tune the planner.

CREATE EXTENSION pg_selectivities ;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001)
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001);
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=45021.40..45021.60 rows=20 width=16) (actual time=0.226..0.227 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..45021.30 rows=20 width=16) (actual time=0.105..0.220 rows=2 loops=1)
         ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual time=0.104..0.105 rows=1 loops=1)
               Join Filter: pg_always('1e-05'::double precision)
               ->  Function Scan on j a  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.066..0.066 rows=1 loops=1)
               ->  Function Scan on j b  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.035..0.035 rows=1 loops=1)
         ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual time=0.112..0.113 rows=1 loops=1)
               Join Filter: pg_always('1e-05'::double precision)
               ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.077..0.077 rows=1 loops=1)
               ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.034..0.034 rows=1 loops=1)
 Planning Time: 0.139 ms
 Execution Time: 0.281 ms

Cheers,

Jeff

Attachment:
pg_selectivities.patch

Description: Binary data