On Monday, February 4, 2013, Kirk Wythers wrote:
Hi Brent,
Nice to hear from you. I hope your world is good.
On Feb 4, 2013, at 2:14 PM, Brent Wood <Brent.Wood@xxxxxxxxxx> wrote:
> Hi Kirk,
>
> We have a (near) real time data database for instrument observations from our research vessels. All observations (summarised to one minute intervals - the actual raw data is in netCDF, this database makes for easier access & meets most users needs) go into a single table, with other tables for metadata about platforms, instruments, etc. Now approaching 350m records, so reasonably substantial.
>
> Underlying optimisations include
>
> partitioned readings table, with a separate partition for each year (now 23 years)
> clustered index on timestamp for the previous years partitions.
> largeish filesystem block size - tested to work well with the clustered index & small size records)
>
> These generally make a big difference to performance. To address one issue, much like yours, where some users want hourly data for a year, some want daily data for 10 years & some want 1 minute data for the last month (& some, no doubt, want one minute data for 20+ years!) I introduced an integer column called timer. This value is set according to the time (not date) of each record.
Very similar to what I need to do. Our main table consists of records that have been standardized to 15 minute timestamps. Here is a simplified example
record timestamp variable value
1 12:00:00 temp 12.6
2 12:15:00 temp 12.3
3 12:30:00 temp 11.7
4 12:45:00 temp 12.3
5 13:00:00 temp 13.9
6 13:15:00 temp 12.5
7 13.30:00 temp 13.7
8 13:45:00 temp 12.0
You are exactly right, some people will want the original 15 minute version, some people will want these summarized to hourly data, and others will want these summarized to daily data. Still others may be satisfied with monthly summaries.
>
> Along the lines of (from memory) :an even no of minutes after the hour is 2, 5 minutes is 4, 10 minutes is 8, 15 minute is 16, 30 minutes is 32, 60 minutes is 64, 6 hourly is 128, 12:00 AM is 256 & 12:00PM is 512. When any timestamp is in more than one category (eg: 12:00 is all of even, 5, 15m 30m 60 minutes), the timer value is set to the largest appropriate one.
I'm not quite following. In my case, if I want hourly data, I'd be looking for…
record timestamp variable value
1 12:00:00 temp 12.225
2 13:00:00 temp 13.025
Are you saying that I could use an approach that WHILE statement? Something like:
WHILE data_truc('hour', timestamp) = 12:00:00, then calulate AVG(value)?
>
> So a request for:
> 1 minute data is select from table;
> 2 minute data is select from table where timer >=2 and timer !=15 and timer !=4;
> hourly data is select from table where timer >=64 and timer != 15 and timer != 4;
> etc
>
> 5 & 15 minute add a bit of complexity, but we gave the users what they wanted. This has worked well for years now, & we have an internal web (mapserver/openlayers based) application allowing users to visualise & download their selected data - they choose from an interval pick list & the SQL is hidden. Some extra enhancements are the automatic collation of lat & lon gps readings into a Postgis point for each reading record, & the automatic aggregation of daily points into daily track lines, so the track for any selected set of dates can easily be displayed on a map (the platforms are mobile vessels - not fixed sites)
>
> You might adapt some of these ideas for your use case?
>
> Cheers
>
> Brent Wood
>
> Programme leader: Environmental Information Delivery
> NIWA
> DDI: +64 (4) 3860529
> ________________________________________
> From: pgsql-general-owner@xxxxxxxxxxxxxx [pgsql-general-owner@xxxxxxxxxxxxxx] on behalf of Kirk Wythers [wythe001@xxxxxxx]
> Sent: Tuesday, February 05, 2013 5:58 AM
> To: pgsql-general@xxxxxxxxxxxxxx
> Subject: Fwd: partial time stamp query
>
> Thanks. That worked great! Now I am trying to aggregate these same fifteen minute to hourly. I have tried using date_trunk:
>
> date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
>
> but date_truck only seems to aggriage the timestamp. I thought I could use
>
> AVG(derived_tsoil_fifteen_min_stacked.value)
>
> in combination with date_trunk, but I still get 15 minute values, not the hourly average from the four 15 minute records.
>
> rowid date_truck time2 site canopy plot variable name value avg
> 2010-07-07_00:00:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:00:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 21.06 21.0599994659424
> 2010-07-07_00:15:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:15:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.96 20.9599990844727
> 2010-07-07_00:30:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:30:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.88 20.8799991607666
> 2010-07-07_00:45:00_b4warm_a 2010-07-07 00:00:00 1 2010-07-07 00:45:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.8 20.7999992370605
> 2010-07-07_01:00:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:00:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.72 20.7199993133545
> 2010-07-07_01:15:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:15:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.64 20.6399993896484
> 2010-07-07_01:30:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:30:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.55 20.5499992370605
> 2010-07-07_01:45:00_b4warm_a 2010-07-07 01:00:00 1 2010-07-07 01:45:00 cfc closed a2 tsoil_sc tsoil_avg1_sc 20.47 20.4699993133545
>
> I was tying to get two records out of this set, with the 'avg" column representing the mean of the first and last four of each 15 minute records.
>
> Perhaps date_trunk only works for the timestamp?
>
>
>
> On Feb 4, 2013, at 8:50 AM, Misa Simic <misa.simic@xxxxxxxxx<mailto:misa.simic@xxxxxxxxx>> wrote:
>
> WHERE derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
>
> On Monday, February 4, 2013, Kirk Wythers wrote:
> I am trying to write a query that grabs one particular day from a timestamp column. The data are ordered in 15 minute chunks like this:
>
> 2010-07-07 12:45:00
> 2010-07-07 13:00:00
> 2010-07-07 13:15:00
> 2010-07-07 13:30:00
> etc…
>
> If I wanted all records from july 7th 2010, I would expect 4 x 24 = 96 records per day.
>
> I have tried the '=' operator, like this
>
> WHERE derived_tsoil_fifteen_min_stacked.time2 = '2010-07-07*'
>
> but that grabs nothing, and using the '~' operator grabs everything with a 2010 or 07 in it… in other word