Re: Group by range in hour of day

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Mon, 16 Mar 2015 16:46:33 -0700

On 03/16/2015 04:16 PM, Israel Brewster wrote:
On Mar 16, 2015, at 2:22 PM, David G. Johnston
<david.g.johnston@xxxxxxxxx <mailto:david.g.johnston@xxxxxxxxx>> wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>>wrote:

    On 03/16/2015 02:57 PM, Israel Brewster wrote:

        I have a table with two timestamp columns for the start time
        and end
        time of each record (call them start and end).I'm trying to
        figure out
        if there is a way to group these records by "hour of day",
        that is the
        record should be included in the group if the hour of the day
        for the
        group falls anywhere in the range [start,end]. Obviously each
        record may
        well fall into multiple groups under this scenario.

        The goal here is to figure out, for each hour of the day, a)
        what is the
        total number of "active" records for that hour, and b) what is
        the total
        "active" time for those records during the hour, with an
        ultimate goal
        of figuring out the average active time per record per hour.

        So, for simplified example, if the table contained three records:

                  start              |               end
        ------------------------------__-----------------------
        2015-03-15 08:15  |  2015-03-15 10:45
        2015-03-15 09:30  |  2015-03-15 10:15
        2015-03-15 10:30  |  2015-03-15 11:30

        Then the results should break out something like this:

        hour  |  count  |  sum
        -----------------------------
        8       |    1       |   0.75
        9       |    2       |   1.5
        10     |    3       |   1.5
        11     |    1       |   0.5

        I can then easily manipulate these values to get my ultimate
        goal of the
        average, which would of course always be less than or equal to
        1. Is
        this doable in postgress? Or would it be a better idea to
        simply pull
        the raw data and post-process in code? Thanks.

    Do not have an answer for you, but a question:

    What version of Postgres are you on?

    This will help determine what tools are available to work with.

The following will give you endpoints for your bounds.  Version is
important since "range types" could be very useful in this situation -
but you'd still need to generate the bounds info regardless.

SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz,
'2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM
generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
'1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN
[...] OR end BETWEEN [...]) - range logic will be better and you may
want to adjust the upper bound by negative 1 (nano-second?) to allow
for easier "<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you
wanted here? using that, I get a 625 row result set where each row from
the first SELECT is paired up with EVERY row from the second select. I
would think I would want the first row of the first SELECT paired up
with only the first row of the second, second row of the first paired
with the second row of the second, etc - i.e. 24 start and end bounds.
Or am I missing something?

Given this:

test=> select * from start_end ;
 id |       start_time       |        end_time
----+------------------------+------------------------
  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

using Pauls hints I got:

test=> select *  from start_end, generate_series(0, 23) as s(h) where h 
between extract(hour from start_time) and extract(hour from end_time) ;

 id |       start_time       |        end_time        | h
----+------------------------+------------------------+----
  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  8
  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  8
  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  9
  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 |  9
  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  9
  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11

test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) 
where h between extract(hour from start_time) and extract(hour from 
end_time) group by h order by h;

 h  | count
----+-------
  8 |     2
  9 |     3
 10 |     2
 11 |     2

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

David J.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general