Efficiently obtaining (any) one record per group.

Allan Kamau <kamauallan@xxxxxxxxx> · Sat, 24 Jul 2010 09:56:26 +0300

Hi all,

I have a large table that contains redundancies as per one field.

I am looking for a way to identify (or extract) a non redundant set of
rows ( _any_ one record per group) from this table and for each record
of this "distinct" set of rows, I would like to capture it's other
fields.
Below is a simplified example. In this example I would like to base
groups on the value of field "reading".

CREATE TABLE foo
(id INTEGER
,reading INTEGER
,entry_date TIMESTAMP
,source TEXT
,primary key(id)
);

INSERT INTO foo(1,55,'2010-04-01 06:31:13','A');
INSERT INTO foo(2,55,'2010-04-01 06:31:20','X');
INSERT INTO foo(3,45,'2010-04-01 06:38:02','P');
INSERT INTO foo(6,55,'2010-04-01 06:21:44','B');
INSERT INTO foo(4,34,'2010-04-01 06:51:24','K');
INSERT INTO foo(8,61,'2010-04-01 06:22:03','A');
INSERT INTO foo(9,34,'2010-04-01 06:48:07','C');

Desired output (any record selected based on "reading" field).

id,reading,entry_date,source
1,55,'2010-04-01 06:31:13','A'
3,45,'2010-04-01 06:38:02','P'
9,34,'2010-04-01 06:48:07','C'
8,61,'2010-04-01 06:22:03','A'

I am worried that using min() and group by to generate a relation
which is then used in a join may be slow.

SELECT
a.*
FROM
foo a
JOIN
(
SELECT
min(a.id)AS id_min
FROM
foo a
GROUP BY
a.reading
)b
ON
a.id=b.id_min
;

How is the performance of rank() (window function) in general?

Allan.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general