On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt <viktor@xxxxxxxxxxx> wrote:
BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);
INSERT INTO sample_data VALUES
Hi.
This answer is perhaps useful if I understand your problem correctly. But I might have interpreted it wrongly. :-)
I would probably start with merging intervals so that overlapping and adjacent intervals become single continuous intervals, then select from those merged intervals.
We have an application with a lot of interval handling in PostgreSQL, and we use many variants of algorithms based on one by Itzik Ben Gan which he calls “packing intervals”. The post we started with was an old reader’s challenge from SQL Magazine. It has since been updated when MS-SQL started supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).
Basically, it is a few CTE:s which convert the intervals into “start” (+1) and “stop” (-1) events, then keeps a running sum of these, and finally creates new intervals by noting that the merged (or “packed”) intervals starts with events that had sum=0 before them and stops with events which have sum=0 after them.
It involves both CTE:s and window functions and is quite a beautiful example of SQL, IMHO.
I think it’s best to google it, he does a good job of explaining how it works.
Hope that helps a bit at least.
/Viktor
Hi. Thanks for the many helpful responses! Although I didn't end up with exactly what I was looking for, I think I got to something that works good enough for now. I did it with CTEs, in a way I think similar to what you were suggesting (my printer is out of ink, so I didn't actually get to look at that book yet!) I ended up having to do 4 passes:
1) Identify the starts and ends of continuous ranges
2) Eliminate the middle-point records (non stops/starts)
3) Merge the stop date in with the starts
4) Eliminate the stops
I couldn't see how to do it in less steps, but if there's a way...
I posted the query along with some sample data below.
Thanks again!
Ken
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);
INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',0),
(1,'1991-01-01','1991-12-31',0),
(1,'1995-01-01','1995-06-30',1),
(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),
(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),
(2,'2014-02-01','2015-01-31',0),
(2,'2015-02-01','2015-12-31',0),
(2,'2017-09-30','2018-01-31',1),
(2,'2018-02-01','2018-02-14',1)
;
WITH peek3 AS (
WITH peek2 AS (
WITH peek AS (
SELECT
client_id,
episode,
daterange(start_date,end_date,'[]') AS range,
COALESCE(daterange(start_date,end_date+1,'[]') &&
lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS continues,
COALESCE(daterange(start_date,end_date,'[]') &&
lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id ORDER BY start_date),false) AS is_continued
FROM
sample_data
)
SELECT
*
FROM peek
WHERE NOT (is_continued AND continues)
)
SELECT client_id,episode,range,
daterange(lower(range),
CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range) THEN
lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range)
ELSE upper(range)
END) AS full_range
,continues,is_continued
FROM peek2
)
SELECT * FROM peek3
WHERE NOT is_continued
;
AGENCY Software
A Free Software data system
By and for non-profits
(253) 245-3801
learn more about AGENCY or
follow the discussion.