Re: Unclamped row estimates whith OR-ed subplans

Laurenz Albe <laurenz.albe@xxxxxxxxxxx> · Fri, 19 Jun 2020 17:55:27 +0200

On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote:
> please consider the following SQL query:
> 
> SELECT * FROM "transactions" WHERE 
>         "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR
>         "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%')
> 
> This yields the following plan on Postgres 11:
> 
> Seq Scan on transactions  (cost=67.21..171458.03 rows=1301316 width=1206)
>   Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
>   SubPlan 1
>     ->  Bitmap Heap Scan on accounts  (cost=33.36..61.16 rows=46 width=4)
>           Recheck Cond: ((name)::text ~~* '%test%'::text)
>           ->  Bitmap Index Scan on s_accounts  (cost=0.00..33.35 rows=46 width=0)
>                 Index Cond: ((name)::text ~~* '%test%'::text)
>   SubPlan 2
>     ->  Seq Scan on contracts  (cost=0.00..5.93 rows=5 width=4)
>           Filter: ((name)::text ~~* '%test%'::text)
> 
> So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is estimated to yield 5 rows.
> I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan = max(rows_subplan1, rows_subplan2). As you can see, the optimizer estimates a whopping
> 1301316 rows instead.
> 
> I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all tuples of the table. But the problem is that upper nodes (which are excluded from
> this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably only see a hand full, which the planner could have (easily?) deduced by taking the
> greater of the two subplan row estimates.
> 
> What am I missing, or is this perhaps a shortfall of the planner?

The subplans are executed *fpr each row* found in "transactions",
and the estimate on the subplans is *per execution".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com