On Thu, Feb 24, 2011 at 1:55 PM, Dave Crooke <dcrooke@xxxxxxxxx> wrote: > Hi foks > > This is an old chestnut which I've found a number of online threads for, and > never seen a clever answer to. It seems a common enough idiom that there > might be some slicker way to do it, so I thought I might inquire with this > august group if such a clever answer exists .... > > Consider the following table > > create table data > (id_key int, > time_stamp timestamp without time zone, > value double precision); > > create unique index data_idx on data (id_key, time_stamp); > > with around 1m rows, with 3500 or so distinct values of id_key. > > I need to find the most recent value for each distinct value of id_key. > There is no elegant (that I know of) syntax for this, and there are two ways > I've typically seen it done: > > 1. Use a dependent subquery to find the most recent time stamp, i.e. > > select > a.id_key, a.time_stamp, a.value > from > data a > where > a.time_stamp= > (select max(time_stamp) > from data b > where a.id_key=b.id_key) > > 2. Define a temporary table / view with the most recent time stamp for each > key, and join against it: > > select > a.id_key, a.time_stamp, a.value > from > data a, > (select id_key, max(time_stamp) as mts > from data group by id_key) b > where > a.id_key=b.id_key and a.time_stamp=b.mts > > I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as > written" plan in each case, and that method 2. is much quicker (2.6 sec vs. > 2 min on my laptop) .... > > Is there a more elegant way to write this, perhaps using PG-specific > extensions? one pg specific method that a lot of people overlook for this sort of problem is custom aggregates. create or replace function maxfoo(foo, foo) returns foo as $$ select case when $1.t > $2.t then $1 else $2 end; $$ language sql immutable; create aggregate aggfoo(foo) ( sfunc=maxfoo, stype=foo ); create table foo(id int, t timestamptz default now()); insert into foo values (1); insert into foo values (1); select (f).* from (select aggfoo(foo) as f from foo group by id) q; postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q; id | t ----+---------------------------- 1 | 2011-02-24 14:01:20.051-06 (1 row) where this approach can be useful is when you have a very complicated aggregation condition that can be awkward to express in a join. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance