"Luke Lonergan" <llonergan@xxxxxxxxxxxxx> writes: > Chad, > > On 1/30/07 6:13 AM, "Chad Wagner" <chad.wagner@xxxxxxxxx> wrote: > > > Sounds like an opportunity to implement a "Sort Unique" (sort of like a hash, > > I guess), there is no need to push 3M rows through a sort algorithm to only > > shave it down to 1848 unique records. > > > > I am assuming this optimization just isn't implemented in PostgreSQL? > > Not that it helps Igor, but we've implemented single pass sort/unique, > grouping and limit optimizations and it speeds things up to a single seqscan > over the data, from 2-5 times faster than a typical external sort. Fwiw I also implemented this last September when I worked on the LIMIT/SORT case. It's blocked on the same issue that that patch is: how do we communicate the info that only unique records are needed down the plan to the sort node? Note that it's not just a boolean flag that we can include like the random access flag: The Unique node could in theory have a different key than the sort node. Coming at this fresh now a few months later I'm thinking instead of trying to put this intelligence into the individual nodes, an optimizer pass could run through the tree looking for Sort nodes that can have this bit tweaked. That doesn't quite help the LIMIT/SORT case because the limit isn't calculated until run-time but perhaps it's possible to finesse that issue by providing either the Limit or Sort node with pointers to other nodes. (Incidentally I'm not sure where 2-5x comes from. It's entirely dependant on your data distribution. It's not hard to come up with distributions where it's 1000x as fast and others where there's no speed difference.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com