londiste.py londiste.ini provider add 'public.Data_YYYY_WI'
londiste.py londiste.ini subscriber add 'public.Data_YYYY_WI'
On Thu, May 28, 2009 at 11:56 AM, Greg Jaman <gjaman@xxxxxxxxx> wrote:
I currently have a database doing something very similar. I set up partition tables with predictable names based on the data's timestamp week number, e.g. Data_YYYY_WI.
I have a trigger on the parent partition table to redirect data to the correct partition ( tablename := 'Data_' || to_char('$NEW(ts)'::timestamptz, 'IYYY_IW') ), then I use dynamic SQL to do the insert. I did some optimization by writing it in PL/Tcl and using global variables to store prepared insert statements.
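In plain pl/pgsql the routing trigger would look roughly like this -- a simplified, untested sketch with made-up column names, not the optimized PL/Tcl version we actually run:

CREATE OR REPLACE FUNCTION data_insert_router() RETURNS trigger AS $$
DECLARE
    child text;
BEGIN
    -- pick the weekly child table, e.g. Data_2009_22
    child := 'Data_' || to_char(NEW.ts, 'IYYY_IW');
    -- EXECUTE ... USING needs 8.4+; on older versions splice the values
    -- into the string with quote_literal() instead
    EXECUTE 'INSERT INTO ' || quote_ident(child)
         || ' ("sensorID", ts, value) VALUES ($1, $2, $3)'
      USING NEW."sensorID", NEW.ts, NEW.value;
    RETURN NULL;  -- row already written to the child, skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_insert_trg
    BEFORE INSERT ON "Data"
    FOR EACH ROW EXECUTE PROCEDURE data_insert_router();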
Most queries for me are based on the date, and we get decent performance with our current setup. For last/current sensor data we just store the last dataID in the sensor record. I haven't thought of a better way yet. After batch inserts we calculate the last reading for each participating sensorID inserted.
With partition tables we struggled with the query to get the latest data: select * from "Data" where "sensorID"=x order by ts desc limit 1 -- against the partitioned tables. See http://archives.postgresql.org/pgsql-performance/2008-11/msg00284.php
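The after-batch refresh mentioned above is basically this -- again a sketch, assuming a simplified sensor("sensorID", "lastDataID") table:

UPDATE sensor s
   SET "lastDataID" = d."dataID"
  FROM (
        SELECT DISTINCT ON ("sensorID") "sensorID", "dataID"
          FROM "Data"
         -- restrict to a window covering the batch just loaded; with weekly
         -- partitions this also lets constraint exclusion skip old children
         WHERE ts >= now() - interval '1 hour'
         ORDER BY "sensorID", ts DESC
       ) d
 WHERE s."sensorID" = d."sensorID";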
On Thu, May 28, 2009 at 7:55 AM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
2009/5/28 Heikki Linnakangas <heikki.linnakangas@xxxxxxxxxxxxxxxx>:
> Ivan Voras wrote:
>>
>> I need to store data about sensor readings. There is a known (but
>> configurable) number of sensors which can send update data at any time.
>> The "current" state needs to be kept but also all historical records.
>> I'm trying to decide between these two designs:
>>
>> 1) create a table for "current" data, one record for each sensor, update
>> this table when a sensor reading arrives, create a trigger that would
>> transfer old record data to a history table (of basically the same
>> structure)
>> 2) write only to the history table, use relatively complex queries or
>> outside-the-database magic to determine what the "current" values of the
>> sensors are.
>
> 3) write only to the history table, but have an INSERT trigger to update the
> table with "current" data. This has the same performance characteristics as
> 1, but lets you design your application like 2.
Excellent idea! It seems like your approach is currently the winner.
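If I understand it correctly, something along these lines -- a rough, untested sketch with invented table and column names:

CREATE OR REPLACE FUNCTION update_current() RETURNS trigger AS $$
BEGIN
    -- overwrite only if this reading is newer, so late-arriving rows
    -- don't clobber the current value
    UPDATE "CurrentData"
       SET ts = NEW.ts, value = NEW.value
     WHERE "sensorID" = NEW."sensorID"
       AND ts < NEW.ts;
    IF NOT FOUND THEN
        -- either no row for this sensor yet, or the existing row is newer
        INSERT INTO "CurrentData" ("sensorID", ts, value)
        SELECT NEW."sensorID", NEW.ts, NEW.value
        WHERE NOT EXISTS
              (SELECT 1 FROM "CurrentData" WHERE "sensorID" = NEW."sensorID");
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_current_trg
    AFTER INSERT ON "Data"
    FOR EACH ROW EXECUTE PROCEDURE update_current();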
> I think I'd choose this approach (or 2), since it can handle out-of-order or
> delayed arrival of sensor readings gracefully (assuming they are timestamped
> at source).
> If you go with 2, I'd recommend to still create a view to encapsulate the
> complex query for the current values, to make the application development
> simpler. And if it gets slow, you can easily swap the view with a table,
> updated with triggers or periodically, without changing the application.
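Right -- something like this, I suppose (untested sketch, assuming the history table is "Data"("sensorID", ts, value)):

CREATE VIEW current_data AS
SELECT DISTINCT ON ("sensorID") "sensorID", ts, value
  FROM "Data"
 ORDER BY "sensorID", ts DESC;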
>
>> The volume of sensor data is potentially huge, on the order of 500,000
>> updates per hour. Sensor data is a few numeric(15,5) numbers.
>
> Whichever design you choose, you should also consider partitioning the data.
I'll look into it, but we'll first see if we can get away with limiting the time the data needs to be available.
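If we do go the partitioning route, I assume it means weekly child tables under inheritance, roughly (made-up example):

-- one child table per ISO week, plus an index to serve the per-sensor queries
CREATE TABLE "Data_2009_22" (
    CHECK (ts >= '2009-05-25' AND ts < '2009-06-01')
) INHERITS ("Data");

CREATE INDEX "Data_2009_22_sensor_ts_idx" ON "Data_2009_22" ("sensorID", ts);

With constraint_exclusion enabled, the planner can then skip children whose CHECK constraint rules them out.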