Re: Picking out the most recent row using a time stamp column

Merlin Moncure <mmoncure@xxxxxxxxx> · Thu, 24 Feb 2011 14:11:29 -0600

On Thu, Feb 24, 2011 at 1:55 PM, Dave Crooke <dcrooke@xxxxxxxxx> wrote:
> Hi foks
>
> This is an old chestnut which I've found a number of online threads for, and
> never seen a clever answer to. It seems a common enough idiom that there
> might be some slicker way to do it, so I thought I might inquire with this
> august group if such a clever answer exists ....
>
> Consider the following table
>
> create table data
>    (id_key int,
>     time_stamp timestamp without time zone,
>     value double precision);
>
> create unique index data_idx on data (id_key, time_stamp);
>
> with around 1m rows, with 3500 or so distinct values of id_key.
>
> I need to find the most recent value for each distinct value of id_key.
> There is no elegant (that I know of) syntax for this, and there are two ways
> I've typically seen it done:
>
> 1. Use a dependent subquery to find the most recent time stamp, i.e.
>
> select
>    a.id_key, a.time_stamp, a.value
> from
>    data a
> where
>   a.time_stamp=
>      (select max(time_stamp)
>       from data b
>       where a.id_key=b.id_key)
>
> 2. Define a temporary table / view with the most recent time stamp for each
> key, and join against it:
>
> select
>    a.id_key, a.time_stamp, a.value
> from
>    data a,
>    (select id_key, max(time_stamp) as mts
>     from data group by id_key) b
> where
>    a.id_key=b.id_key and a.time_stamp=b.mts
>
> I've found that for my data set, PG 8.4.2 selects the "obvious" / "do it as
> written" plan in each case, and that method 2. is much quicker (2.6 sec vs.
> 2 min on my laptop) ....
>
> Is there a more elegant way to write this, perhaps using PG-specific
> extensions?

one pg specific method that a lot of people overlook for this sort of
problem is custom aggregates.

create or replace function maxfoo(foo, foo) returns foo as
$$
  select case when $1.t > $2.t then $1 else $2 end;
$$ language sql immutable;

create aggregate aggfoo(foo)
(
  sfunc=maxfoo,
  stype=foo
);

create table foo(id int, t timestamptz default now());
insert into foo values (1);
insert into foo values (1);

select (f).* from (select aggfoo(foo) as f from foo group by id) q;

postgres=# select (f).* from (select aggfoo(foo) as f from foo group by id) q;
 id |             t
----+----------------------------
  1 | 2011-02-24 14:01:20.051-06
(1 row)

where this approach can be useful is when you have a very complicated
aggregation condition that can be awkward to express in a join.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance