Re: Controlling complexity in queries

Jay Levitt <jay.levitt@xxxxxxxxx> · Tue, 13 Dec 2011 17:27:36 -0500

Merlin Moncure wrote:
Breaking your large queries into functions OTOH can make significant
changes to the plan, often to the worse.

As an end-user, I think this is an area where PostgreSQL could really stand 
out (that and the moon launch).  In Rails-land, you don't have The DBA that 
writes queries. You have a developer and an ORM, and when they step outside 
that ORM to do the cool things SQL can do, all their DRY habits fall apart, 
because it's 1990 again and you can either write clear code or fast code but 
not both.

Postgres has made huge inroads with its optimizer.  A few years ago, for web 
startups picking their relational DB, the performant answer was "MySQL for 
fast, simple, mostly-read queries, PostgreSQL when you grow up."  That's 
become "Probably Postgres", and with 9.2's COUNT(*), it's become "Definitely 
Postgres".

But having to write one big query for performance feels exactly like having 
to write one big C function with unrolled loops.  I'm currently taking a 
well-factored, function-based query and turning it INTO what Robert James is 
trying to get OUT of: a monolithic query.

In the end, the performance of your queries is going to be directly
related to how well you map the problem into relational logic

It's not just that, though; it's quite possible to think relationally and 
still fall down. There are plenty of cases where the human eye can see that 
a modular function can be inlined, but the optimizer can't.  I have a 
pathological case: a query against a database with just a few thousand users 
takes 1.5 seconds on fast hardware, because it ends up scanning a cartesian 
product to get 16 rows, even before you get to the nested loops. In fact, 
most of the time the optimizer does a great job of inlining all my 
set-returning functions, once 9.0.6/9.1.2 rolled out.

I've seen at least three equally ominous pieces that would have to happen to 
allow DRY, composable SQL:

1. Optional optimization of non-recursive WITH
2. Optional pushdown of WHERE clauses into GROUP BY[1]
3. LATERAL

AFAIK, none of these are on anyone's short-term to-do list, and I'm sure 
none are easy.

[1] Since this is my current favorite problem, the pathological case is:

select questions.id
from questions
join (
  select u.id
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

With users.id as a primary key, it's obvious that this can return only one 
row, but it has to scan the users table to get there.  See the "Subjquery in 
a JOIN not getting restricted?" thread on pgsql-performance for Tom's 
explanation of why that's a hard problem to solve.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general