On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

> I don't think it's a good idea to write off the idea of implementing
> this optimization at some point. I see a lot of queries that join one
> fairly large table against a whole bunch of little tables, and then
> sort the results by a column that is indexed in the big table.

Agreed, it's a common use case.

> The optimizer handles this by sequentially scanning the big table,
> hash joining against all of the little tables, and then sorting the
> output, which is pretty silly (given that all of the tables fit in
> RAM and are in fact actually cached there). If there is a LIMIT
> clause, then it might instead index-scan the big table, do the hash
> joins, and then sort the already-ordered results. This is better
> because at least we're not sorting the entire table unnecessarily,
> but it's still poor.

The Hash node is fully executed before we start pulling rows through
the Hash Join node, so the Hash Join knows at execution time whether
or not it will maintain sorted order: the order of the outer input is
preserved only if the hash table fits in a single batch, because going
multi-batch means outer rows are written out to batch files and reread
out of order. So we can put the Sort node into the plan anyway, and at
execution time the Sort node can simply ask the Hash Join whether it
needs to sort at all or can just pass rows through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from
the normal sort cost based on the estimated probability of the join
going multi-batch, which varies with the work_mem available.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
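
To make the execution-time decision concrete, here is a minimal
standalone C sketch of the idea. This is not actual PostgreSQL executor
code: HashJoin, multi_batch, and conditional_sort are hypothetical names
used purely for illustration. The "Sort" step consults a flag that the
join sets while building its hash table, and only sorts when the join
went multi-batch and thereby destroyed the outer path's ordering.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for a hash join node's output state (illustrative only) */
typedef struct HashJoin
{
    int    *rows;         /* joined output, in outer-path order */
    int     nrows;
    bool    multi_batch;  /* set while the hash table was built */
} HashJoin;

static int
cmp_int(const void *a, const void *b)
{
    int av = *(const int *) a;
    int bv = *(const int *) b;

    return (av > bv) - (av < bv);
}

/*
 * "Conditional sort": sort only if the child join spilled to multiple
 * batches; otherwise the outer ordering survived, so act as a no-op.
 */
static void
conditional_sort(HashJoin *join)
{
    if (join->multi_batch)
        qsort(join->rows, join->nrows, sizeof(int), cmp_int);
}

int
main(void)
{
    int      rows[] = {42, 7, 19, 3};
    HashJoin hj = {rows, 4, true};  /* pretend the join spilled */

    conditional_sort(&hj);
    for (int i = 0; i < hj.nrows; i++)
        printf("%d\n", hj.rows[i]);
    return 0;
}

On the costing side, pro-rating would just mean charging the Sort node
an expected cost rather than zero or the full amount: for example, if
the planner estimates a 0.2 probability of going multi-batch at the
current work_mem and a full sort would cost 1000 units, it charges 200.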