On Tue, 2009-05-19 at 23:54 -0400, Robert Haas wrote:

> I don't think it's a good idea to write off the idea of implementing
> this optimization at some point. I see a lot of queries that join one
> fairly large table against a whole bunch of little tables, and then
> sort the results by a column that is indexed in the big table.

Agreed, it's a common use case.

> The optimizer handles this by sequentially scanning the big table,
> hash joining against all of the little tables, and then sorting the
> output, which is pretty silly (given that all of the tables fit in
> RAM and are in fact actually cached there). If there is a LIMIT
> clause, then it might instead index-scan the big table, do the hash
> joins, and then sort the already-ordered results. This is better
> because at least we're not sorting the entire table unnecessarily,
> but it's still poor.

The Hash node is fully executed before we start pulling rows through
the Hash Join node, so the Hash Join knows at execution time whether
or not it will maintain sorted order: the order of the outer input is
preserved only if the hash table fits in a single batch, because going
multi-batch means outer rows are written out to batch files and reread
out of order. So we can put the Sort node into the plan anyway, and at
execution time the Sort node can simply ask the Hash Join whether it
needs to sort at all or can just pass rows through (act as a no-op).

The cost of the Sort node can either be zero, or pro-rated down from
the normal sort cost based on the estimated probability of the join
going multi-batch, which varies with the work_mem available.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
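
To make the execution-time decision concrete, here is a minimal
standalone C sketch of the idea. This is not actual PostgreSQL executor
code: HashJoin, multi_batch, and conditional_sort are hypothetical names
used purely for illustration. The "Sort" step consults a flag that the
join sets while building its hash table, and only sorts when the join
went multi-batch and thereby destroyed the outer path's ordering.

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for a hash join node's output state (illustrative only) */
typedef struct HashJoin
{
    int    *rows;         /* joined output, in outer-path order */
    int     nrows;
    bool    multi_batch;  /* set while the hash table was built */
} HashJoin;

static int
cmp_int(const void *a, const void *b)
{
    int av = *(const int *) a;
    int bv = *(const int *) b;

    return (av > bv) - (av < bv);
}

/*
 * "Conditional sort": sort only if the child join spilled to multiple
 * batches; otherwise the outer ordering survived, so act as a no-op.
 */
static void
conditional_sort(HashJoin *join)
{
    if (join->multi_batch)
        qsort(join->rows, join->nrows, sizeof(int), cmp_int);
}

int
main(void)
{
    int      rows[] = {42, 7, 19, 3};
    HashJoin hj = {rows, 4, true};  /* pretend the join spilled */

    conditional_sort(&hj);
    for (int i = 0; i < hj.nrows; i++)
        printf("%d\n", hj.rows[i]);
    return 0;
}

On the costing side, pro-rating would just mean charging the Sort node
an expected cost rather than zero or the full amount: for example, if
the planner estimates a 0.2 probability of going multi-batch at the
current work_mem and a full sort would cost 1000 units, it charges 200.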