Re: Controlling complexity in queries

Alban Hertroys <haramrae@xxxxxxxxx> · Wed, 14 Dec 2011 23:24:18 +0100

>>> [1] Since this is my current favorite problem, the pathological case is:
>>> 
>>> select questions.id
>>> from questions
>>> join (
>>>  select u.id
>>>  from users as u
>>>  group by u.id
>>> ) as s
>>> on s.id = questions.user_id
>>> where questions.id = 1;
>>> 
>>> With users.id as a primary key, it's obvious that this can return only one
>>> row, but it has to scan the users table to get there.  See the "Subjquery in
>>> a JOIN not getting restricted?" thread on pgsql-performance for Tom's
>>> explanation of why that's a hard problem to solve.
>> 
>> Yeah -- here and there you run into difficult to optimize queries.
>> (For my part, I'd just have converted that to WHERE EXISTS for the
>> semi-join).
> 
> I think I'm about to learn a very important relational-algebra equivalence... could you elaborate?

You could write that as:

select questions.id
from questions as q
where exists (select 1 from users as u where u.id = q.user_id)
and questions.id = 1;

That's basically what you are doing, checking that a user with a given id from the questions table exists in the users table. Writing it as WHERE EXISTS is a matter of "phrasing the question" more accurately, which gives the query planner a hint that for your answer a single hit is sufficient - no need to check whether there are other matches after the first one.

That said, wouldn't a foreign key constraint help you even better? If questions.user_id is required to refer to an existing users.id (by an FK constraint), than the check in the query becomes moot.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general