Re: window function ordering not working as expected

Tom Lane <tgl@xxxxxxxxxxxxx> · Tue, 17 Feb 2015 19:18:01 -0500

Lonni J Friedman <netllama@xxxxxxxxx> writes:
> I'm interested in seeing:
> * the date for the most recent result
> * test name (identifier)
> * most recent result (decimal value)
> * the worst (lowest decimal value) test result from the past 21 days
> * the date which corresponds with the worst test result from the past 21 days
> * the 2nd worst (2nd lowest decimal value) test result
> ...
> The problem that I'm seeing is in the prv_score column. It should show
> a value of 0.6, which corresponds with 2015-02-13, however instead its
> returning 0.7. I thought by ordering by metrics->>'PT TWBR' I'd always
> be sorting by the scores, and as a result, the lead(metrics->>'PT
> TWBR', 1) would give me the next greatest value of the score. Thus my
> confusion as to why ORDER BY metrics->>'PT TWBR' isn't working as
> expected.

lead() and lag() retrieve values from rows that are N away from the
current row in the specified ordering.  That isn't what you want here
AFAICS.

I think the worst test result would be obtained with
	nth_value(metrics->>'PT TWBR', 1)
which is equivalent to what you used,
	first_value(metrics->>'PT TWBR')
while the 2nd worst result would be obtained with
	nth_value(metrics->>'PT TWBR', 2)

However, "worst" and "2nd worst" with this implementation would mean
"worst and 2nd worst within the partition", which isn't the stated
goal either, at least not with the partition definition you're using.

What you really want for the "worst in last 21 days" is something like

	min(metrics->>'PT TWBR') OVER (
	   PARTITION BY ... that same mess you used ...
	   ORDER BY tstamp
	   RANGE BETWEEN '21 days'::interval PRECEDING AND CURRENT ROW)

However Postgres doesn't implement RANGE x PRECEDING yet.  You could
get "worst in last 21 observations" easily:

	min(metrics->>'PT TWBR') OVER (
	   PARTITION BY ... that mess ...
	   ORDER BY tstamp
	   ROWS BETWEEN 20 PRECEDING AND CURRENT ROW)

and maybe that's close enough.

I do not know an easy way to get "second worst" :-(.  You could build a
user-defined aggregate to produce "second smallest value among the inputs"
and then apply it in the same way as I used min() here.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general