On Mar 17, 2015, at 9:05 AM, David G. Johnston <david.g.johnston@xxxxxxxxx> wrote:
On Tuesday, March 17, 2015, Israel Brewster < israel@xxxxxxxxxxxxxx> wrote:
> On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@xxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>>> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
>>>
>>> h | count
>>> ----+-------
>>> 8 | 2
>>> 9 | 3
>>> 10 | 2
>>> 11 | 2
>
> Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...
>
> Paul
Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:
Which is why you do not (only?) want to convert your data to hour-of-day but want to create timestamp end points. Then you simply do timestamp subtraction to get durations which you can then sum together.
Well, it's not QUITE that simple. For example, row id 3 which starts at 08:00:00 and ends at 11:45:00 in the example. If I have a timestamp endpoint of 10:00:00 for the 9 hour, and I just do simple timestamp subtraction, I'll get an interval of 2 (10:00:00 - 08:00:00), which is not correct since there can't be more than an hour in any given hour. Similarly for the 11 hour and either of the two matching rows - since they end during the hour in question (row 2 only contributes 5 minutes), I'd actually need to subtract the end_time from the start point in that case to get the time.
That said, the concept is sound, and I am fairly sure I can make it work using a case when statement to handle the various permutations of starting before and or ending after the hour in question. I'll work on that, but if there is a more elegant solution, I'm all ears :-)
|