1. The city temps table is a toy example, not meant to be realistic :-) 2. Yes, my (Java) algorithm is deterministic ... it will return exactly one row per city, and that will be the row (or strictly, *a* row) containing the highest temp. Temp value ties will break in favour of earlier rows in Guinness Book of Records tradition :-) It's equivalent to a HashAggregate implementation. The following two query plans (from my real schema) illustrate the itch I am trying to scratch .... I want the functionality of the 2nd one, but with the execution plan structure of the first: # explain analyse select a, max(b) from perf_raw_2010_02_23 group by a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=117953.09..117961.07 rows=639 width=8) (actual time=10861.845..10863.008 rows=1023 loops=1) -> Seq Scan on perf_raw_2010_02_23 (cost=0.00..91572.39 rows=5276139 width=8) (actual time=0.038..4459.222 rows=5276139 loops=1) Total runtime: 10863.856 ms (3 rows) Time: 10864.817 ms # explain analyse select distinct on (a) * from perf_raw_2010_02_23 order by a, b desc ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=1059395.04..1085775.73 rows=639 width=28) (actual time=46011.204..58428.210 rows=1023 loops=1) -> Sort (cost=1059395.04..1072585.39 rows=5276139 width=28) (actual time=46011.200..53561.112 rows=5276139 loops=1) Sort Key: a, b Sort Method: external merge Disk: 247584kB -- actually OS RAM buffers -> Seq Scan on perf_raw_2010_02_23 (cost=0.00..91572.39 rows=5276139 width=28) (actual time=0.047..6491.036 rows=5276139 loops=1) Total runtime: 58516.185 ms (6 rows) Time: 58517.233 ms The only difference between these two is that the second query returns the whole row. The *ratio* in cost between these two plans increases in proportion to log(n) of the table size ... at 5.5m rows its livable, at 500m it's probably not :-! Cheers Dave On Wed, Feb 24, 2010 at 5:12 PM, Richard Huxton <dev@xxxxxxxxxxxx> wrote: > On 24/02/10 22:47, Dave Crooke wrote: >> >> I'd imagine it would be possible to have a query planner optimization >> that would convert Garrett's DISTINCT ON syntax to do what I was >> trying to, by realizing that DISTINCT ON X ... ORDER BY Y DESC is >> going to return the the one row for each X which has the highest value >> of Y, and so use a MAX-structured accumulation instead of a sort. > > Why is there only one row? For city temperatures, that seems unlikely. > > In the event of more than one row does your algorithm give repeatable > results? > > -- > Richard Huxton > Archonet Ltd > -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance