Hi Mike, I happened upon your query, which is related to some stuff I've been playing with. Firstly, David's solution below doesn't work. I haven't yet tried to work out why. Secondly, I was hoping to be able to solve your problem nicely with Postgres 8.4's window functions [1,2], which provide functions relating. Given the following setup: CREATE TABLE foo (dt date, bin varchar(4)); INSERT INTO foo VALUES ('2009-01-01', 'red'), ('2009-01-02', 'red'), ('2009-01-03', 'blue'), ('2009-01-04', 'blue'), ('2009-01-05', 'blue'), ('2009-01-06', 'red'), ('2009-01-07', 'blue'), ('2009-01-08', 'blue'), ('2009-01-09', 'red'), ('2009-01-10', 'red'); I had hoped the following would suffice: SELECT first_value(dt) OVER w, last_value(dt) OVER w, bin FROM foo WINDOW w AS (ORDER BY dt PARTITION BY bin); Apparently, this is bad syntax. ORDER BY cannot precede PARTITION BY in a WINDOW declaration, and yet I wanted a grouping of date-consecutive bins, which (PARTITION BY bin ORDER BY dt) would not give me. I was able to produce the required result with: SELECT MIN(dt) AS first, MAX(dt) AS last, MAX(bin) AS bin FROM (SELECT dt, bin, CASE WHEN newbin IS NULL THEN 0 ELSE SUM(newbin) OVER (ORDER BY dt) END AS binno FROM (SELECT *, (bin != lag(bin, 1) OVER (ORDER BY dt))::int AS newbin FROM foo ) AS newbins ) AS binnos GROUP BY binno ORDER BY first; This relies on a middle step in which I create an enumeration of the bins in sequence: SELECT dt, bin, CASE WHEN newbin IS NULL THEN 0 ELSE sum(newbin) OVER (ORDER BY dt) END AS binno FROM (SELECT *, (bin != lag(bin, 1) OVER (ORDER BY dt))::int AS newbin FROM foo) AS newbins; dt | bin | binno ------------+------+------- 2009-01-01 | red | 0 2009-01-02 | red | 0 2009-01-03 | blue | 1 2009-01-04 | blue | 1 2009-01-05 | blue | 1 2009-01-06 | red | 2 2009-01-07 | blue | 3 2009-01-08 | blue | 3 2009-01-09 | red | 4 2009-01-10 | red | 4 I would hope there is a neater way to do this with window functions. The best way to solve your problem may be with PL/SQL, which is also good at dealing with sequences (it's not as complicated as it looks!): CREATE TYPE bindates AS (first date, last date, bin varchar(5)); CREATE OR REPLACE FUNCTION bindates() RETURNS SETOF bindates AS $$ DECLARE row record; res bindates; BEGIN FOR row IN SELECT * FROM foo ORDER BY dt LOOP IF res.bin IS NULL OR res.bin != row.bin THEN IF res.bin IS NOT NULL THEN RETURN NEXT res; END IF; res.bin := row.bin; res.first := row.dt; END IF; res.last := row.dt; END LOOP; IF res.bin IS NOT NULL THEN RETURN NEXT res; END IF; END; $$ LANGUAGE plpgsql; Finally, I'll try to sort out David's solution. Once we correct his typo (t1.order -> t1.date) and add an 'ORDER BY first' to the end, we get: first | last | bin ------------+------------+------ 2009-01-03 | 2009-01-05 | blue 2009-01-06 | 2009-01-06 | red 2009-01-07 | 2009-01-08 | blue 2009-01-09 | | red This includes correct output, but it fails on both edge cases. The non-appearance of the first row is due to the WHERE clause on the main SELECT statement: WHERE (SELECT bin FROM foo t2 WHERE t2.dt < t1.dt ORDER BY dt DESC LIMIT 1) <> t1.bin If we drop this WHERE clause, we get: first | last | bin ------------+------------+------ 2009-01-01 | 2009-01-02 | red 2009-01-02 | 2009-01-02 | red 2009-01-03 | 2009-01-05 | blue 2009-01-04 | 2009-01-05 | blue 2009-01-05 | 2009-01-05 | blue 2009-01-06 | 2009-01-06 | red 2009-01-07 | 2009-01-08 | blue 2009-01-08 | 2009-01-08 | blue 2009-01-09 | | red 2009-01-10 | | red We can therefore get the result including the first row by selecting from this table with 'GROUP BY last, bin'. And we can hack in a value for those final NULLs as a special case. The following statement works: SELECT MIN(first), CASE WHEN last IS NULL THEN (SELECT MAX(dt) FROM foo) ELSE last END, bin FROM ( SELECT dt AS first, (SELECT dt FROM foo t3 WHERE t3.dt < ( SELECT dt FROM foo t5 WHERE t5.dt > t1.dt AND t5.bin <> t1.bin ORDER BY dt ASC LIMIT 1) ORDER BY dt DESC LIMIT 1) AS last, bin FROM foo t1 ) t0 GROUP BY last, bin ORDER BY last; Finally, what's efficient? With 1,000,000 random rows, we get: Enumeration: 13s PL/SQL: 12s Modified David: minutes. [I used the following to create test data: CREATE OR REPLACE FUNCTION make_random(n int) RETURNS SETOF foo AS $$ import random for i in xrange(n): m = random.randint(1,12) d = random.randint(1,28) b = random.choice(('red', 'blue')) yield '2009-%d-%d' % (m, d), b $$ LANGUAGE plpythonu; DELETE FROM foo; INSERT INTO foo (SELECT * FROM make_random(1000000));] I hope that helps you in considering various approaches to the problem. - Joel [1] http://developer.postgresql.org/pgdocs/postgres/tutorial-window.html [2] http://developer.postgresql.org/pgdocs/postgres/functions-window.html On Tue, 16 Jun 2009 06:06:16 +1000, David Wilson <david.t.wilson@xxxxxxxxx> wrote: > On Mon, Jun 15, 2009 at 2:23 PM, Mike Toews<mwtoews@xxxxxx> wrote: >> Hi, >> >> I'm having difficulty constructing a query that will find breaks where >> data >> change in a time-series. I've done some searching for this too, but I >> haven't found anything. >> >> Here is my example situation, consider my source table: >> date bin >> 2009-01-01 red >> 2009-01-02 red >> 2009-01-03 blue >> 2009-01-04 blue >> 2009-01-05 blue >> 2009-01-06 red >> 2009-01-07 blue >> 2009-01-08 blue >> 2009-01-09 red >> 2009-01-10 red >> >> >> I would like to get the first and last of each consecutive series based >> on >> column "bin". My result for the table would look like: >> first last bin >> 2009-01-01 2009-01-02 red >> 2009-01-03 2009-01-05 blue >> 2009-01-06 2009-01-06 red >> 2009-01-07 2009-01-08 blue >> 2009-01-09 2009-01-10 red >> >> >> This is easy to compute using a spreadsheet or in R, but how would I do >> this >> with SQL? I'm using 8.3. Advice is appreciated. > > (Written in email and untested- also, someone will probably provide a > better way, I hope, but this should at least work) > > select date as first, > (select date from table t3 where t3.date<(select date from table t5 > where t5.date>t1.date and t5.bin<>t1.bin order by date asc limit 1) > order by date desc limit 1) as last, > bin > from table t1 where (select bin from table t2 where t2.date<t1.order > order by date desc limit 1)<>t1.bin; > > Ugly, and I'm pretty sure there's a much better way, but my brain is > failing me right now- hopefully this'll at least get you started, > though. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general