On Mon, Sep 17, 2012 at 4:32 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Mike Christensen <mike@xxxxxxxxxxxxx> writes: >> This definitely makes sense in the context of aggregation, but I'm >> wondering if the same argument applies in the use case originally >> posted: > >> SELECT left(value, 1) as first_letter >> FROM some_table >> WHERE first_letter > 'a'; > >> Obviously, you can write this as: > >> SELECT left(value, 1) as first_letter >> FROM some_table >> WHERE left(value, 1) > 'a'; > >> This would run fine, though you'd be doing a sequential scan on the >> entire table, getting the left most character in each value, then >> filtering those results. This of course assumes you haven't built an >> index on left(value, 1). > >> Thus, in theory the compiler *could* resolve the actual definition of >> first_letter and substitute in that expression on the fly. I'm >> wondering if that concept is actually disallowed by the SQL spec. > > Yes, it is. If you read the spec you'll find that the scope of > visibility of names defined in the SELECT list doesn't include WHERE. > > It's easier to understand why this is if you realize that SQL has a very > clear model of a "pipeline" of query execution. Conceptually, what > happens is: > > 1. Form the cartesian product of the tables listed in FROM (ie, all > combinations of rows). > > 2. Apply the WHERE condition to each row from 1, and drop rows that > don't pass it. > > 3. If there's a GROUP BY, merge the surviving rows into groups. > > 4. If there's aggregate functions, compute those over the rows in > each group. > > 5. If there's a HAVING, filter the grouped rows according to that. > > 6. Evaluate the SELECT expressions for each remaining row. > > 7. If there's an ORDER BY, evaluate those expressions and sort the > remaining rows accordingly. > > (Obviously, implementations try to improve on this - you don't want > to actually form the cartesian product - but that's the conceptual > model.) > > The traditional shortcut of doing "ORDER BY select-column-reference" > is okay according to this world view, because the SELECT expressions > are already available when ORDER BY needs them. However, it's not > sensible to refer to SELECT outputs in WHERE, HAVING, or GROUP BY, > because those steps precede the evaluation of the SELECT expressions. > > This isn't just academic nit-picking either, because the SELECT > expressions might not be valid for rows that don't pass WHERE etc. > Consider > SELECT 1/x AS inverse FROM data WHERE x <> 0; > The implementation *must* apply WHERE before computing the SELECT > expressions, or it'll get zero-divide failures that should not happen. > > Now, having said all that, if you try it you'll find that Postgres > does allow select column references in GROUP BY, using the model > you propose above of copying whatever expression is in SELECT into > GROUP BY. This is, to put it politely, a mistake that we are now > stuck with for backwards-compatibility reasons. It's not spec compliant > and it doesn't fit the language's conceptual model, but it's been that > way for long enough that we're not likely to take it out. We are not, > however, gonna introduce the same mistake elsewhere. > >> Obviously, it would add complexity (and compile overhead) but would be >> somewhat handy to avoid repeating really complicated expressions. >> Perhaps Common Table Expressions are a better way of doing this thing >> anyhow. > > CTEs or sub-selects are a better answer for that. Each sub-select has > its own instance of the conceptual pipeline. Excellent information, Tom! I've been somewhat curious on this behavior for some time now, and it's great to get a detailed answer.. Mike -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general