Peter Hunsberger <peter.hunsberger@xxxxxxxxx> wrote: > [...] > I have one solution that joins the table against itself and does > (among other things) a subselect looking "not exists col +1" and "not > exists col -1" on the two instances of the table to find the start and > end. This is, as you might guess, is not very efficient (my actual > data is some 6 million+ rows) and I'm guessing there has to be > something more efficient with windowing or possibly grouping on min > and max (though I can't see how to make sure they are part of a > contiguous set). Anyone have any ideas? You can either use a PL/pgSQL function ("SETOF TEXT" just for the convenience of the example): | CREATE FUNCTION SummarizeRanges () RETURNS SETOF TEXT AS $$ | DECLARE | CurrentFirst INT; | CurrentLast INT; | CurrentRecord RECORD; | BEGIN | FOR CurrentRecord IN SELECT col FROM t ORDER BY col LOOP | IF CurrentFirst IS NULL THEN | CurrentFirst := CurrentRecord.col; | CurrentLast := CurrentRecord.col; | ELSIF CurrentRecord.col = CurrentLast + 1 THEN | CurrentLast := CurrentRecord.col; | ELSE | RETURN NEXT CurrentFirst || ', ' || CurrentLast; | CurrentFirst := CurrentRecord.col; | CurrentLast := CurrentRecord.col; | END IF; | END LOOP; | IF CurrentFirst IS NOT NULL THEN | RETURN NEXT CurrentFirst || ', ' || CurrentLast; | END IF; | RETURN; | END; | $$ LANGUAGE plpgsql; or a recursive query (which I always find very hard to com- prehend): | WITH RECURSIVE RecCols (LeftBoundary, Value) AS | (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t) | UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE c.col = p.Value + 1) | SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols | GROUP BY LeftBoundary | ORDER BY LeftBoundary; Could you run both against your data set and find out which one is faster for your six million rows? Tim -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general