Re: Group by range in hour of day


> On Mar 16, 2015, at 3:46 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
> 
> On 03/16/2015 04:16 PM, Israel Brewster wrote:
>> On Mar 16, 2015, at 2:22 PM, David G. Johnston
>> <david.g.johnston@xxxxxxxxx> wrote:
>>> 
>>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
>>> <adrian.klaver@xxxxxxxxxxx> wrote:
>>> 
>>>    On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>> 
>>>        I have a table with two timestamp columns for the start time
>>>        and end
>>>        time of each record (call them start and end). I'm trying to
>>>        figure out
>>>        if there is a way to group these records by "hour of day",
>>>        that is the
>>>        record should be included in the group if the hour of the day
>>>        for the
>>>        group falls anywhere in the range [start,end]. Obviously each
>>>        record may
>>>        well fall into multiple groups under this scenario.
>>> 
>>>        The goal here is to figure out, for each hour of the day, a)
>>>        what is the
>>>        total number of "active" records for that hour, and b) what is
>>>        the total
>>>        "active" time for those records during the hour, with an
>>>        ultimate goal
>>>        of figuring out the average active time per record per hour.
>>> 
>>>        So, for simplified example, if the table contained three records:
>>> 
>>>                  start              |               end
>>>        -----------------------------------------------------
>>>        2015-03-15 08:15  |  2015-03-15 10:45
>>>        2015-03-15 09:30  |  2015-03-15 10:15
>>>        2015-03-15 10:30  |  2015-03-15 11:30
>>> 
>>> 
>>>        Then the results should break out something like this:
>>> 
>>>        hour  |  count  |  sum
>>>        -----------------------------
>>>        8       |    1       |   0.75
>>>        9       |    2       |   1.5
>>>        10     |    3       |   1.5
>>>        11     |    1       |   0.5
>>> 
>>>        I can then easily manipulate these values to get my ultimate
>>>        goal of the
>>>        average, which would of course always be less than or equal to
>>>        1. Is
>>>        this doable in Postgres? Or would it be a better idea to
>>>        simply pull
>>>        the raw data and post-process in code? Thanks.
>>> 
>>> 
>>>    Do not have an answer for you, but a question:
>>> 
>>>    What version of Postgres are you on?
>>> 
>>>    This will help determine what tools are available to work with.
>>> 
>>> 
>>> The following will give you endpoints for your bounds.  Version is
>>> important since "range types" could be very useful in this situation -
>>> but you'd still need to generate the bounds info regardless.​
>>> 
>>>
>>> SELECT *
>>> FROM
>>> (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>>> CROSS JOIN
>>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>>> '1 hour'::interval) e (end_ts)) AS e
>>> 
>>> You would join this using an ON condition with an OR (start BETWEEN
>>> [...] OR end BETWEEN [...]) - range logic will be better and you may
>>> want to adjust the upper bound by negative 1 (nano-second?) to allow
>>> for easier "<=" logic if using BETWEEN.
>>> 
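(A rough, untested sketch of the overlap join described above, written with a range type as David suggests rather than the CROSS JOIN; the start_end table and column names are borrowed from Adrian's example further down:)

-- Each generated hour becomes a [start_ts, start_ts + 1 hour) bucket; a record
-- is counted for every bucket that its [start_time, end_time] span overlaps.
SELECT extract(hour FROM s.start_ts) AS h, count(*)
FROM generate_series('2015-03-16 00:00'::timestamptz,
                     '2015-03-16 23:00'::timestamptz,
                     '1 hour'::interval) AS s(start_ts)
JOIN start_end se
  ON tstzrange(se.start_time, se.end_time, '[]')
     && tstzrange(s.start_ts, s.start_ts + interval '1 hour', '[)')
GROUP BY h
ORDER BY h;
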
>> 
>> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
>> wanted here? Using that, I get a 625-row result set where each row from
>> the first SELECT is paired up with EVERY row from the second select. I
>> would think I would want the first row of the first SELECT paired up
>> with only the first row of the second, second row of the first paired
>> with the second row of the second, etc - i.e. 24 start and end bounds.
>> Or am I missing something?
> 
> Given this:
> 
> test=> select * from start_end ;
> id |       start_time       |        end_time
> ----+------------------------+------------------------
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07
> 
> Using Paul's hints I got:
> 
> test=> select *  from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) ;
> 
> id |       start_time       |        end_time        | h
> ----+------------------------+------------------------+----
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  8
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  8
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  9
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 |  9
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  9
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11
> 
> 
> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
> 
> h  | count
> ----+-------
>  8 |     2
>  9 |     3
> 10 |     2
> 11 |     2
> 

Awesome! That looks perfect. I still need to incorporate it into the bigger picture, of course, but I'm sure I can handle that. Thanks!
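For reference, a rough (untested) sketch of how I might bolt the per-hour active time onto that, building hour buckets with generate_series and clamping each record to its bucket; the start_end table and column names follow Adrian's example above:

-- Count of active records and total active time (in hours) per hour bucket.
SELECT extract(hour FROM b.bucket) AS h,
       count(*) AS cnt,
       sum(extract(epoch FROM least(se.end_time, b.bucket + interval '1 hour')
                          - greatest(se.start_time, b.bucket)) / 3600.0) AS active_hours
FROM start_end se
JOIN generate_series('2015-03-16 00:00'::timestamptz,
                     '2015-03-16 23:00'::timestamptz,
                     '1 hour'::interval) AS b(bucket)
  ON se.start_time < b.bucket + interval '1 hour'
 AND se.end_time   > b.bucket
GROUP BY h
ORDER BY h;

Dividing active_hours by cnt should then give the average active time per record per hour that I was after.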
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

>> 
>> 
>> 
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
>> 
>>> David J.
>>> 
>> 
> 
> 
> -- 
> Adrian Klaver
> adrian.klaver@xxxxxxxxxxx

