Re: Subquery in a JOIN not getting restricted?

Jay Levitt <jay.levitt@xxxxxxxxx> · Thu, 10 Nov 2011 08:52:23 -0500

Kevin Grittner wrote:
Merlin Moncure<mmoncure@xxxxxxxxx>  wrote:

Well, this may not fit the OP's 'real' query

Right, if I recall correctly, the OP said it was simplified down as
far as it could be and still have the issue show.

but the inner subquery is probably better written as a semi-join
(WHERE EXISTS).

Kevin's right.  The real query involves several SQL and PL/pgsql functions 
(all now inlineable), custom aggregates, a union or two and a small coyote. 
 I could post it, but that feels like "Please write my code for me". 
Still, if you really want to...

Meanwhile, it's good for me to learn how the planner sees my queries and how 
I can best state them.  I assume this is me not understanding something 
about restrictions across group-by nodes.

If the query was more like

select questions.id
from questions
join (
  select sum(u.id)
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

would you no longer be surprised that it scanned all user rows?  I.E. is the 
"group by" a red herring, which usually wouldn't be present without an 
aggregate, and the real problem is that the planner can't restrict aggregates?

This comment in planagg.c may be relevant; I'm not doing min/max, but is it 
still true that GROUP BY always looks at all the rows, period?

void
preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
...
/* We don't handle GROUP BY or windowing, because our current
* implementations of grouping require looking at all the rows anyway,
*/

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance