I wrote: > Anywy, your point about the sort being redundant is a good one, and > offhand I'd have expected PG to catch that; I'll have to look into > why it didn't. But that's not going to explain a 10x speed > difference, because the sort isn't 90% of the runtime. I dug into this using some made-up test data, and was able to reproduce the plan you got after changing the order of the pkey index columns to (regn_no, transfer_date, flock_no) ... are you sure you quoted that accurately before? I found a couple of minor planner problems, which I've repaired in CVS HEAD. You might consider using TEXT columns instead of VARCHAR(n), because the only bug that actually seemed to change the chosen plan involved the planner getting confused by the difference between varchar_var and varchar_var::text (which is what gets generated for sorting purposes because varchar doesn't have a separate sort operator). There's a more interesting issue, which I'm afraid we do not have time to fix for PG 8.2. The crux of the matter is that given SELECT ... FROM SHEEP_FLOCK f1 JOIN (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date FROM SHEEP_FLOCK f GROUP BY f.regn_no) f2 ON f1.regn_no = f2.regn_no AND f1.transfer_date = f2.last_xfer_date if there is an index on (regn_no, transfer_date) then the planner could in principle do a double-column merge join between an indexscan on this index and the output of a GroupAggregate implementation of the subquery. The GroupAggregate plan would in fact only be sorting on regn_no, so it's not immediately obvious why this is OK. The reason is that there is only one MAX() value for any particular regn_no, and so the sort condition that the last_xfer_date values be in order for any one value of regn_no is vacuous. We could consider the subquery's output to be sorted by *any* list of the form "regn_no, other-stuff". The planner's notion of matching pathkey lists to determine sortedness is not at all capable of dealing with this. After a little bit of thought I'm tempted to propose that we add a concept that a particular pathkey list is "unique", meaning that it is known to include a unique key for the data. Such a key would match, for sortedness purposes, any requested sort ordering consisting of its columns followed by others. In the above example, we would know that a GROUP BY implemented by GroupAggregate yields an output for which the grouping columns are a unique sort key. I imagine this concept is already known in the database research literature; anyone recognize it and know a standard name for it? What'd be really interesting is to know if MSSQL and Paradox are using this concept to optimize their plans for Peter's query. Can someone with a copy of MSSQL try this test case and see what it reports as the plan? BTW, I used this to generate some COPY data I could load into Peter's example table: perl -e 'for ($s = 1; $s < 32000; $s++) { $f=int($s/100); print "$s\t$f\t1\t0\n"; print "$s\t$f\t2\t0\n"; }' >sheep.data I changed the date columns to integers rather than bother to make up valid dates. I think only the regn_no and flock_no statistics matter to the planner for this particular query --- as you can see, I arranged for 2 entries per sheep and 100 sheep per flock, which is in the general ballpark of what Peter mentioned as his stats. regards, tom lane