Underestimated number of output rows with an aggregate function

Philippe BEAUDOIN <phb.emaj@xxxxxxx> · Sun, 15 Oct 2023 10:38:22 +0200

Hi all,

Working on the emaj extension (for the curious ones, 
https://emaj.readthedocs.io/en/latest/ and 
https://github.com/dalibo/emaj), I recently faced a performance problem 
when querying and aggregating data changes. A query with 3 CTE has a O^2 
behavior (https://explain.dalibo.com/plan/1ded242d4ebf3gch#plan). I have 
found a workaround by setting enable_nestloop to FALSE. But this has 
drawbacks. So I want to better understand the issue.

During my analysis, I realized that the output rows estimate of the 
second CTE is really bad, leading to a bad plan for the next CTE.

I reproduced the issue in a very small test case with a simplified 
query. Attached is a shell script and its output.

A simple table is created, filled and analyzed.

The simplified statement is:
 WITH keys AS (
   SELECT c1, min(seq) AS seq FROM perf GROUP BY c1
   )
 SELECT tbl.*
   FROM perf tbl JOIN keys ON (keys.c1 = tbl.c1 AND keys.seq = tbl.seq);

Its plan is:
 Hash Join (cost=958.00..1569.00 rows=1 width=262) (actual 
time=18.516..30.702 rows=10000 loops=1)
   Output: tbl.c1, tbl.seq, tbl.c2
   Inner Unique: true
   Hash Cond: ((tbl.c1 = perf.c1) AND (tbl.seq = (min(perf.seq))))
   Buffers: shared hit=856
   ->  Seq Scan on public.perf tbl  (cost=0.00..548.00 rows=12000 
width=262) (actual time=0.007..2.323 rows=12000 loops=1)
         Output: tbl.c1, tbl.seq, tbl.c2
         Buffers: shared hit=428
   ->  Hash  (cost=808.00..808.00 rows=10000 width=8) (actual 
time=18.480..18.484 rows=10000 loops=1)
         Output: perf.c1, (min(perf.seq))
         Buckets: 16384  Batches: 1  Memory Usage: 519kB
         Buffers: shared hit=428
         ->  HashAggregate  (cost=608.00..708.00 rows=10000 width=8) 
(actual time=10.688..14.321 rows=10000 loops=1)
               Output: perf.c1, min(perf.seq)
               Group Key: perf.c1
               Batches: 1  Memory Usage: 1425kB
               Buffers: shared hit=428
               ->  Seq Scan on public.perf (cost=0.00..548.00 
rows=12000 width=8) (actual time=0.002..2.330 rows=12000 loops=1)
                     Output: perf.c1, perf.seq, perf.c2
                     Buffers: shared hit=428

It globally looks good to me, with 2 sequential scans and a hash join.
But the number of returned rows estimate is always 1, while it actually 
depends on the data content (here 10000).

For the hash join node, the plan shows a "Inner Unique: true" property. 
I wonder if this is normal. It look likes the optimizer doesn't take 
into account the presence of the GROUP BY clause in its estimate.

I reproduce the case with all supported postgres versions.

Thanks by advance for any explanation.
Philippe.
Attachment:
test_case.sh

Description: application/shellscript
select version();
                                                version                                                
-------------------------------------------------------------------------------------------------------
 PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)

\timing on
Timing is on.
\set ON_ERROR_STOP
SET client_min_messages TO WARNING;
SET
Time: 0,197 ms
--> create structures
create table perf (
  c1    integer    not null,
  seq   serial     not null,
  c2    text
);
CREATE TABLE
Time: 10,592 ms
--> populate the table
insert into perf (c1, c2) select i, rpad('2',300) from generate_series (1, 1000*:p_scaleFactor) i;
INSERT 0 10000
Time: 49,318 ms
--> perform changes
insert into perf (c1,c2) select c1, 'updated' from perf where c1 % 5 = 0;
INSERT 0 2000
Time: 34,606 ms
--> vacuum and analyze
select count(*) from perf;
 count 
-------
 12000
(1 row)

Time: 4,693 ms
vacuum analyze perf;
VACUUM
Time: 28,937 ms
--> bad estimate of the number of output rows (always 1)
explain (analyze, buffers, verbose)
 WITH keys AS (
   SELECT c1, min(seq) AS seq
     FROM perf
     GROUP BY c1
   )
   SELECT tbl.*
     FROM perf tbl
          JOIN keys ON (keys.c1 = tbl.c1 AND keys.seq = tbl.seq)
;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=958.00..1569.00 rows=1 width=262) (actual time=14.902..24.311 rows=10000 loops=1)
   Output: tbl.c1, tbl.seq, tbl.c2
   Inner Unique: true
   Hash Cond: ((tbl.c1 = perf.c1) AND (tbl.seq = (min(perf.seq))))
   Buffers: shared hit=856
   ->  Seq Scan on public.perf tbl  (cost=0.00..548.00 rows=12000 width=262) (actual time=0.004..1.813 rows=12000 loops=1)
         Output: tbl.c1, tbl.seq, tbl.c2
         Buffers: shared hit=428
   ->  Hash  (cost=808.00..808.00 rows=10000 width=8) (actual time=14.880..14.881 rows=10000 loops=1)
         Output: perf.c1, (min(perf.seq))
         Buckets: 16384  Batches: 1  Memory Usage: 519kB
         Buffers: shared hit=428
         ->  HashAggregate  (cost=608.00..708.00 rows=10000 width=8) (actual time=8.468..11.305 rows=10000 loops=1)
               Output: perf.c1, min(perf.seq)
               Group Key: perf.c1
               Batches: 1  Memory Usage: 1425kB
               Buffers: shared hit=428
               ->  Seq Scan on public.perf  (cost=0.00..548.00 rows=12000 width=8) (actual time=0.001..1.833 rows=12000 loops=1)
                     Output: perf.c1, perf.seq, perf.c2
                     Buffers: shared hit=428
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.197 ms
 Execution Time: 25.242 ms
(24 rows)

Time: 26,096 ms