On Tue, Jul 5, 2011 at 12:32 AM, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote: > I think its a pretty common requirement and we should be looking to > optimize it if it isn't handled well. I agree; although I wanted to be sure that it is not in fact handled well by some mechanism I haven't seen yet. > The only problem is that there is a few ways of writing that in SQL > and we would probably need to recognise how to transform between query > types to achieve a common form. I think that'd be good for completeness, although I also think that having a 'design pattern' of sorts for dealing with this workload would be totally acceptable for quite some time, even if canonicalization from other forms is technically possible > For example, the post you cite uses a correlated subquery whereas its > possible to write it using an IN subselect. I liked the brevity of DISTINCT ON form, even if semantically it feels a little weird, yet, I think the biggest bang for the buck is looking at the "row_number() = N" form of the problem: I've seen people do this a number of times for different reasons. In my case N = 1, but if I wanted to take an average of a metric of three records or so then one could easily imagine wanting row_number() >= 3, followed by aggregations. I've seen some other real-world use cases fly by where people wanted some N where N != 1, too. When such a predicate is around, one can be sure that no more than N rows are generated per partition, and if there's an appropriate index then some kind of costing can take place to see if it's worth using. Interestingly, this may also be a way to (in a degenerate case) acquire a path to enable skip scan, but last I checked skip scan was problematic somehow for other reasons, which makes me wonder how feasible it'd be to embark on this optimizer enhancement, should it prove necessary. -- fdr -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general