BEGIN:VCARD VERSION:3.0 N:Brewster;Israel;;; FN:Israel Brewster ORG:Frontier Flying Service;MIS TITLE:PC Support Tech II EMAIL;type=INTERNET;type=WORK;type=pref:israel@xxxxxxxxxxxxxxxxxx TEL;type=WORK;type=pref:907-450-7293 item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701; item1.X-ABADR:us CATEGORIES:General X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson END:VCARD
> On Mar 16, 2015, at 3:46 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote: > > On 03/16/2015 04:16 PM, Israel Brewster wrote: >> On Mar 16, 2015, at 2:22 PM, David G. Johnston >> <david.g.johnston@xxxxxxxxx <mailto:david.g.johnston@xxxxxxxxx>> wrote: >>> >>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver >>> <adrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@xxxxxxxxxxx>>wrote: >>> >>> On 03/16/2015 02:57 PM, Israel Brewster wrote: >>> >>> I have a table with two timestamp columns for the start time >>> and end >>> time of each record (call them start and end).I'm trying to >>> figure out >>> if there is a way to group these records by "hour of day", >>> that is the >>> record should be included in the group if the hour of the day >>> for the >>> group falls anywhere in the range [start,end]. Obviously each >>> record may >>> well fall into multiple groups under this scenario. >>> >>> The goal here is to figure out, for each hour of the day, a) >>> what is the >>> total number of "active" records for that hour, and b) what is >>> the total >>> "active" time for those records during the hour, with an >>> ultimate goal >>> of figuring out the average active time per record per hour. >>> >>> So, for simplified example, if the table contained three records: >>> >>> start | end >>> ------------------------------__----------------------- >>> 2015-03-15 08:15 | 2015-03-15 10:45 >>> 2015-03-15 09:30 | 2015-03-15 10:15 >>> 2015-03-15 10:30 | 2015-03-15 11:30 >>> >>> >>> Then the results should break out something like this: >>> >>> hour | count | sum >>> ----------------------------- >>> 8 | 1 | 0.75 >>> 9 | 2 | 1.5 >>> 10 | 3 | 1.5 >>> 11 | 1 | 0.5 >>> >>> I can then easily manipulate these values to get my ultimate >>> goal of the >>> average, which would of course always be less than or equal to >>> 1. Is >>> this doable in postgress? Or would it be a better idea to >>> simply pull >>> the raw data and post-process in code? Thanks. >>> >>> >>> Do not have an answer for you, but a question: >>> >>> What version of Postgres are you on? >>> >>> This will help determine what tools are available to work with. >>> >>> >>> The following will give you endpoints for your bounds. Version is >>> important since "range types" could be very useful in this situation - >>> but you'd still need to generate the bounds info regardless. >>> >>> >>> SELECT * >>> FROM >>> (SELECT * FROM generate_series('2015-03-15'::timestamptz, >>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s >>> CROSS JOIN >>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM >>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, >>> '1 hour'::interval) e (end_ts)) AS e >>> >>> You would join this using an ON condition with an OR (start BETWEEN >>> [...] OR end BETWEEN [...]) - range logic will be better and you may >>> want to adjust the upper bound by negative 1 (nano-second?) to allow >>> for easier "<=" logic if using BETWEEN. >>> >> >> Thanks, that is very helpful, but are you sure CROSS JOIN is what you >> wanted here? using that, I get a 625 row result set where each row from >> the first SELECT is paired up with EVERY row from the second select. I >> would think I would want the first row of the first SELECT paired up >> with only the first row of the second, second row of the first paired >> with the second row of the second, etc - i.e. 24 start and end bounds. >> Or am I missing something? > > Given this: > > test=> select * from start_end ; > id | start_time | end_time > ----+------------------------+------------------------ > 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 > 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 > 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 > > using Pauls hints I got: > > test=> select * from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) ; > > id | start_time | end_time | h > ----+------------------------+------------------------+---- > 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8 > 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8 > 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9 > 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9 > 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9 > 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10 > 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10 > 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11 > 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11 > > > test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h; > > h | count > ----+------- > 8 | 2 > 9 | 3 > 10 | 2 > 11 | 2 > Awesome! That looks perfect. I still need to incorporate it into the bigger picture, of course, but I'm sure I can handle that. Thanks! ----------------------------------------------- Israel Brewster Systems Analyst II Ravn Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7293 ----------------------------------------------- >> >> >> >> ----------------------------------------------- >> Israel Brewster >> Systems Analyst II >> Ravn Alaska >> 5245 Airport Industrial Rd >> Fairbanks, AK 99709 >> (907) 450-7293 >> ----------------------------------------------- >> >>> David J. >>> >> > > > -- > Adrian Klaver > adrian.klaver@xxxxxxxxxxx
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general