On Tue, May 19, 2009 at 6:49 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Simon Riggs <simon@xxxxxxxxxxxxxxx> writes: >> On Tue, 2009-05-19 at 08:58 -0400, Tom Lane wrote: >>> Nonsense. The planner might think some other plan is cheaper, but >>> it definitely knows how to do this, and has since at least 8.1. > >> Please look at Dimitri's plan. If it can remove the pointless sort, why >> does it not do so? > > I haven't followed the whole thread, but the plan in the original post > is for a hash join. The planner does not trust a hash join to preserve > the order of its left input, because of possible batching. See the > discussion a couple of months ago where we considered allowing the > planner to disable batching so it *could* assume order preservation, and > decided the risk of hashtable bloat was too great. Hmm, my recollection of that conversation was that we decided that we should have the planner tell the executor whether or not we are relying on it to produce sorted output. We set this flag only when the hash join is expected to fit comfortably within one batch (that is, we allow a safety margin). If this flag is set and a hash join unexpectedly goes multi-batch, then we perform a final merge pass (a la merge sort) before returning any results. I don't think it's a good idea to write off the idea of implementing this optimization at some point. I see a lot of queries that join one fairly large table against a whole bunch of little tables, and then sorting the results by a column that is indexed in the big table. The optimizer handles this by sequentially scanning the big table, hash joining against all of the little tables, and then sorting the output, which is pretty silly (given that all of the tables fit in RAM and are in fact actually cached there). If there is a LIMIT clause, then it might instead index-scan the big table, do the hash joins, and then sort the already-ordered results. This is better because at least we're not sorting the entire table unnecessarily but it's still poor. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance