The problem was described here earlier, but unfortunately there were no answers: http://www.postgresql.org/message-id/1387780366.910542228@xxxxxxxxxxxxxx
It looks like a defect.
CREATE TABLE t1 (c1 INTEGER, id INTEGER PRIMARY KEY);
INSERT INTO t1 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);
REINDEX TABLE t1;
ANALYZE t1;
CREATE TABLE t2 (c1 INTEGER, id INTEGER PRIMARY KEY);
INSERT INTO t2 (c1, id) SELECT b, b FROM generate_series(1, 1000000) a (b);
REINDEX TABLE t2;
ANALYZE t2;
CREATE TEMP TABLE t3 (ref_id INTEGER);
INSERT INTO t3 (ref_id) VALUES (333333), (666666);
ANALYZE t3;
test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..34.84 rows=20000 width=12) (actual time=0.046..0.104 rows=4 loops=1)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4) (actual time=0.008..0.009 rows=2 loops=1)
-> Append (cost=0.42..16.89 rows=2 width=8) (actual time=0.023..0.042 rows=2 loops=2)
-> Index Scan using t1_pkey on t1 (cost=0.42..8.45 rows=1 width=8) (actual time=0.020..0.022 rows=1 loops=2)
Index Cond: (id = t3.ref_id)
-> Index Scan using t2_pkey on t2 (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=2)
Index Cond: (id = t3.ref_id)
Total runtime: 0.184 ms
(8 rows)
This plan is perfect, but the row estimate is not: 20000 vs. 4.
As far as I can see, Pg is able to produce correct row estimates for the inputs: the inner Append has rows=2 and the outer seq scan has rows=2, so the nested loop should know it can produce at most 2 * 2 = 4 rows.
Moreover, the cost estimate is _correct_! It corresponds to 'rows=4'.
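As a quick sanity check (my addition, not part of the original report), the actual row count of that join can be confirmed directly; given the setup above, each of the two ref_id values matches exactly one row in t1 and one in t2:

```sql
-- Each ref_id (333333, 666666) matches once in t1 and once in t2,
-- so the join produces 2 * 2 = 4 rows.
SELECT count(*)
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN t3 ON t3.ref_id = t.id;
-- count = 4
```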
Why is a correct row estimate important? Because it matters in a more complex query.
Let's join another big table into that query:
CREATE TABLE links (c1 INTEGER PRIMARY KEY, descr TEXT);
INSERT INTO links (c1, descr) SELECT b, '2' FROM generate_series(1, 100000) a (b);
REINDEX TABLE links;
ANALYZE links;
test=> EXPLAIN (ANALYZE) SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id INNER JOIN links l ON (t.c1 = l.c1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2693.43..3127.84 rows=20000 width=18) (actual time=33.619..33.619 rows=0 loops=1)
Hash Cond: (t1.c1 = l.c1)
-> Nested Loop (cost=0.42..34.84 rows=20000 width=12) (actual time=0.038..0.078 rows=4 loops=1)
-> Seq Scan on t3 (cost=0.00..1.02 rows=2 width=4) (actual time=0.006..0.007 rows=2 loops=1)
-> Append (cost=0.42..16.89 rows=2 width=8) (actual time=0.017..0.029 rows=2 loops=2)
-> Index Scan using t1_pkey on t1 (cost=0.42..8.45 rows=1 width=8) (actual time=0.015..0.017 rows=1 loops=2)
Index Cond: (id = t3.ref_id)
-> Index Scan using t2_pkey on t2 (cost=0.42..8.45 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=2)
Index Cond: (id = t3.ref_id)
-> Hash (cost=1443.00..1443.00 rows=100000 width=6) (actual time=33.479..33.479 rows=100000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 3711kB
-> Seq Scan on links l (cost=0.00..1443.00 rows=100000 width=6) (actual time=0.017..14.853 rows=100000 loops=1)
Total runtime: 33.716 ms
(13 rows)
The planner thinks there will be 20000 rows from the join between "t" and "t3", and that is why it decides to use a hash join with the "links" table.
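As a diagnostic (my addition, not from the original report), one can temporarily disable hash joins for the session to confirm that the cheap nested-loop-plus-index-scan plan on "links" is available to the planner; `enable_hashjoin` is a standard planner GUC, and this is a debugging aid rather than a production fix:

```sql
-- Assumption: run in the same session/schema as the setup above.
SET enable_hashjoin = off;       -- steer the planner away from the hash join
EXPLAIN (ANALYZE)
SELECT *
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN t3 ON t3.ref_id = t.id
INNER JOIN links l ON (t.c1 = l.c1);
RESET enable_hashjoin;           -- restore the default for the session
```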
Let's prove it:
CREATE OR REPLACE FUNCTION public.f1()
RETURNS SETOF integer
LANGUAGE plpgsql
ROWS 20000
AS $function$
BEGIN
RETURN QUERY EXECUTE 'SELECT t.c1 FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t INNER JOIN t3 ON t3.ref_id = t.id';
END;
$function$;
test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=2693.25..3293.25 rows=20000 width=10)
Hash Cond: (t.c1 = l.c1)
-> Function Scan on f1 t (cost=0.25..200.25 rows=20000 width=4)
-> Hash (cost=1443.00..1443.00 rows=100000 width=6)
-> Seq Scan on links l (cost=0.00..1443.00 rows=100000 width=6)
(5 rows)
The same "defective" plan.
test=> ALTER FUNCTION f1() ROWS 4;
ALTER FUNCTION
test=> explain select * from f1() t(c1) INNER JOIN links l ON (t.c1 = l.c1);
QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop (cost=0.54..33.58 rows=4 width=10)
-> Function Scan on f1 t (cost=0.25..0.29 rows=4 width=4)
-> Index Scan using links_pkey on links l (cost=0.29..8.31 rows=1 width=6)
Index Cond: (c1 = t.c1)
(4 rows)
The correct/perfect plan.
In real life I have a bigger "links" table, and the wrong plan slows execution significantly.
I found several workarounds, so it is no longer a problem for me.
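For reference, here is a sketch of one possible workaround of this kind (my assumption; the original post does not say which workarounds were actually used): materialize the small intermediate result, so the planner has accurate row counts before joining the big "links" table.

```sql
-- Materialize the 4-row intermediate result and analyze it, so the
-- subsequent join against "links" is planned with correct statistics.
CREATE TEMP TABLE t_small AS
SELECT t.c1, t.id
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN t3 ON t3.ref_id = t.id;

ANALYZE t_small;  -- now the planner knows t_small holds only a few rows

SELECT * FROM t_small INNER JOIN links l ON (t_small.c1 = l.c1);
```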
I just want to report this "strange thing".
I tried to look into the source code and found some interesting places there, but I think that is useless: the Pg developers know the situation much better than I do.