Statistics query

Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> · Wed, 10 Apr 2013 09:30:55 -0700

I'm seeking ideas on the best way to craft the following query. I've 
stripped everything down to the bare essentials and simplified it below.

Input data has a timestamp (actually an int received from the system in 
the form of a Unix epoch), a unit identifier and a status:

 event_time | unit_id | status
------------+---------+--------
 1357056011 |      60 |      1
 1357056012 |     178 |      0
 1357056019 |     168 |      0
 1357056021 |       3 |      0
 1357056021 |       4 |      1
 1357056021 |     179 |      0
 1357056022 |       0 |      1
 1357056022 |       1 |      0
 1357056023 |       2 |      0
 1357056024 |       9 |      0
 1357056025 |       5 |      0
 1357056025 |       6 |      0
 1357056026 |       7 |      1
...

A given unit_id cannot have two events at the same time (enforced by 
constraints).

Given a point in time I would like to:

1. Identify all distinct unit_ids with an entry that exists in the 
preceding hour then

2. Count both the total events and sum the status=1 events for the most 
recent 50 events for each unit_id that fall within a limited period 
(e.g. don't look at data earlier than midnight). So unit_id 60 might 
have 50 events in the last 15 minutes while unit_id 4 might have only 12 
events after midnight.

The output would look something like:

 unit_id | events | status_1_count
---------+--------+----------------
      1  |     50 |             34
      2  |     27 |             18
      1  |     50 |             34
      1  |      2 |              0
...

Each sub-portion is easy and while I could use external processing or 
set-returning functions I was hoping first to find the secret-sauce to 
glue everything together into a single query.

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general