Re: Picking out the most recent row using a time stamp column

Dave Johansen <davejohansen@xxxxxxxxx> · Mon, 8 Apr 2013 14:10:53 -0700

On Fri, Apr 5, 2013 at 11:40 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
>
> On Fri, Apr 5, 2013 at 11:54 AM, Dave Johansen <davejohansen@xxxxxxxxx> wrote:
> > On Sat, Feb 26, 2011 at 2:38 PM, Dave Johansen <davejohansen@xxxxxxxxx>
> > wrote:
> >>
> >> Unfortunately, I'm running 8.3.3 and to my knowledge the windowing stuff
> >> wasn't added til 8.4.
> >> Dave
> >>
> >> On Feb 26, 2011 2:06 PM, "Josh Berkus" <josh@xxxxxxxxxxxx> wrote:
> >> > Dave,
> >> >
> >> > Why not test the windowing version I posted?
> >
> > We finally have moved over to 8.4 and so I just wanted to post the
> > time comparison numbers to show the times on 8.4 as well. This is also
> > a newer data set with ~700k rows and ~4k distinct id_key values.
> >
> > 1) Dependent subquery
> > SELECT a.id_key, a.time_stamp, a.value FROM data AS a WHERE
> > a.time_stamp = (SELECT MAX(time_stamp) FROM data AS b WHERE a.id_key =
> > b.id_key);
> > 8.3.3: Killed it after a few minutes
> > 8.4.13: Killed it after a few minutes
> >
> > 2) Join against temporary table
> > SELECT a.id_key, a.time_stamp, a.value FROM data AS a JOIN (SELECT
> > id_key, MAX(time_stamp) AS max_time_stamp FROM data GROUP BY id_key)
> > AS b WHERE a.id_key = b.id_key AND a.time_stamp = b.max_time_stamp;
> > 8.3.3: 1.4 s
> > 8.4.13: 0.5 s
> >
> > 3) DISTINCT ON:
> > SELECT DISTINCT ON (id_key) id_key, time_stamp, value FROM data ORDER
> > BY id_key, time_stamp DESC;
> > Without Index:
> > 8.3.3: 34.1 s
> > 8.4.13: 98.7 s
> > With Index (data(id_key, time_stamp DESC)):
> > 8.3.3: 3.4 s
> > 8.4.13: 1.3 s
> >
> > 4) Auto-populated table
> > SELECT id_key, time_stamp, value FROM data WHERE rid IN (SELECT rid
> > FROM latestdata);
> > 8.3.3: 0.2 s
> > 8.4.13: 0.06 s
> >
> > 5) Windowing
> > SELECT id_key, time_stamp, value FROM (SELECT id_key, time_stamp,
> > value, row_number() OVER (PARTITION BY id_key ORDER BY time_stamp
> > DESC) AS ranking FROM data) AS a WHERE ranking=1;
> > 8.3.3: N/A
> > 8.4.13: 1.6 s
>
> I would also test:
>
> *) EXISTS()
>
> SELECT a.id_key, a.time_stamp, a.value FROM data
> WHERE NOT EXISTS
> (
>   SELECT 1 FROM data b
>   WHERE
>     a.id_key = b.id_key
>     and b.time_stamp > a.time_stamp
> );

I tried this and it was slow:
8.3.3: 674.4 s
8.4.13: 40.4 s

>
> *) custom aggregate (this will not be the fastest option but is a good
> technique to know -- it can be a real life saver when selection
> criteria is complex)
>
> CREATE FUNCTION agg_latest_data(data, data) returns data AS
> $$
>   SELECT CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
> $$ LANGUAGE SQL IMMUTABLE;
>
> CREATE AGGREGATE latest_data (
>   SFUNC=agg_latest_data,
>   STYPE=data
> );
>
> SELECT latest_data(d) FROM data d group by d.id_key;
>
> the above returns the composite, not the fields, but that can be worked around.

My real table actually returns/needs all the values from the row so I
didn't feel like messing with aggregate stuff.

Thanks,
Dave

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance