On Fri, 2020-06-19 at 17:12 +0200, Benjamin Coutu wrote: > please consider the following SQL query: > > SELECT * FROM "transactions" WHERE > "account" IN (SELECT "ID" FROM "accounts" WHERE "name" ~~* '%test%') OR > "contract" IN (SELECT "ID" FROM "contracts" WHERE "name" ~~* '%test%') > > This yields the following plan on Postgres 11: > > Seq Scan on transactions (cost=67.21..171458.03 rows=1301316 width=1206) > Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2)) > SubPlan 1 > -> Bitmap Heap Scan on accounts (cost=33.36..61.16 rows=46 width=4) > Recheck Cond: ((name)::text ~~* '%test%'::text) > -> Bitmap Index Scan on s_accounts (cost=0.00..33.35 rows=46 width=0) > Index Cond: ((name)::text ~~* '%test%'::text) > SubPlan 2 > -> Seq Scan on contracts (cost=0.00..5.93 rows=5 width=4) > Filter: ((name)::text ~~* '%test%'::text) > > So the where clause of this query has just two subplans OR-ed together, one is estimated to yield 46 rows and one is estimated to yield 5 rows. > I'd expect the total rows for the seqscan to be estimated at 46 then, following the logic that rows_seqscan = max(rows_subplan1, rows_subplan2). As you can see, the optimizer estimates a whopping > 1301316 rows instead. > > I am absolutely aware that those are hashed sub plans below a seqscan and that Postgres therefore has to scan all tuples of the table. But the problem is that upper nodes (which are excluded from > this example for simplicity) think they will receive 1301316 rows from the seqscan, when in fact they will probably only see a hand full, which the planner could have (easily?) deduced by taking the > greater of the two subplan row estimates. > > What am I missing, or is this perhaps a shortfall of the planner? The subplans are executed *fpr each row* found in "transactions", and the estimate on the subplans is *per execution". Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com