Re: Extracting superlatives - SQL design philosophy

"Julien Theulier" <julien@xxxxxxxxxxxxxxxxxx> · Thu, 25 Feb 2010 10:20:45 +0100

Hi all,

Just for your information, and this is not related to PG
directly:

Teradata provides a “qualify” syntax which works as
a filtering condition on a windowed function result. This is the only DB
allowing this direct filtering on windowed functions, from what I know.

So, as an example, the query you ask for becomes very easy on
this database:

select 

city,
temp, date 

from bar 

qualify
row_number() over (partition by city order by temp desc)=1

This is very practical indeed (you can mix it also with
classical where/having/group by syntaxes).

On postgres, you may get the same result using an inner query
(sorry, I can’t test it for now) such as:

select 

city,
temp, date 

from

(select
city, temp, date, row_number() over (partition by city order by temp desc) as
nr 

from
bar ) a1

where nr=1

Julien Theulier

De :
pgsql-performance-owner@xxxxxxxxxxxxxx
[mailto:pgsql-performance-owner@xxxxxxxxxxxxxx] De la part de Mose

Envoyé : mercredi 24 février 2010 22:50

À : Dave Crooke

Cc : pgsql-performance

Objet : Re:  Extracting superlatives - SQL design
philosophy

Can you try using window functions?

Something like this:

select distinct

  city,

  first_value(temp) over w as
max_temp,

  first_value(date) over w as
max_temp_date

 from

  cities

  window w as (partition by city
order by temp desc)

http://www.postgresql.org/docs/current/static/tutorial-window.html

- Mose

On Wed, Feb 24, 2010 at 1:31 PM, Dave
Crooke <dcrooke@xxxxxxxxx> wrote:

This is a generic SQL issue and not PG
specific, but I'd like to get

an opinion from this list.

Consider the following data:

# \d bar

Table "public.bar"

 Column
|           
Type             |
Modifiers

--------+-----------------------------+-----------

 city   | character varying(255)      |

 temp   |
integer                    
|

 date   | timestamp without time zone |

# select * from bar order by city, date;

  city    | temp |        date

-----------+------+---------------------

 Austin    |   75 | 2010-02-21 15:00:00

 Austin    |   35 | 2010-02-23 15:00:00

 Edinburgh |   42 | 2010-02-23 15:00:00

 New York  |   56 | 2010-02-23 15:00:00

 New York  |   78 | 2010-06-23 15:00:00

(5 rows)

If you want the highest recorded temperature for a city, that's easy

to do, since the selection criteria works on the same column that we

are extracing:

# select city, max(temp) from bar group by city order by 1;

  city    | max

-----------+-----

 Austin    |  75

 Edinburgh |  42

 New York  |  78

(3 rows)

However there is (AFAIK) no simple way in plain SQL to write a query

that performs such an aggregation where the aggregation criteria is on

one column and you want to return another, e.g. adding the the *date

of* that highest temperature to the output above, or doing a query to

get the most recent temperature reading for each city.

What I'd like to do is something like the below (and I'm inventing

mock syntax here, the following is not valid SQL):

-- Ugly implicit syntax but no worse than an Oracle outer join ;-)

select city, temp, date from bar where date=max(date) group by city,

temp order by city;

or perhaps

-- More explicit

select aggregate_using(max(date), city, temp, date) from bar group by

city, temp order by city;

Both of the above, if they existed, would be a single data access

followed by and sort-merge.

The only way I know how to do it involves doing two accesses to the data, e.g.

# select city, temp, date from bar a where date=(select max(b.date)

from bar b where a.city=b.city) order by 1;

  city    | temp |        date

-----------+------+---------------------

 Austin    |   35 | 2010-02-23 15:00:00

 Edinburgh |   42 | 2010-02-23 15:00:00

 New York  |   78 | 2010-06-23 15:00:00

(3 rows)

# explain select * from bar a where date=(select max(b.date) from bar

b where a.city=b.city) order by 1;

         QUERY PLAN

--------------------------------------------------------------------------

 Sort  (cost=1658.86..1658.87 rows=1 width=528)

  Sort Key: a.city

  ->  Seq Scan on bar a  (cost=0.00..1658.85 rows=1
width=528)

        Filter: (date = (subplan))

        SubPlan

          ->  Aggregate
 (cost=11.76..11.77 rows=1 width=8)

                ->  Seq Scan on
bar b  (cost=0.00..11.75 rows=1

width=8)     -- would be an index lookup in a real scenario

Filter: (($0)::text = (city)::text)

(8 rows)

--

Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)

To make changes to your subscription:

http://www.postgresql.org/mailpref/pgsql-performance