2009/5/28 Alexander Staubo <alex@xxxxxxxxxx>:
> On Thu, May 28, 2009 at 2:54 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>> The volume of sensor data is potentially huge, on the order of 500,000
>> updates per hour. Sensor data is a few numeric(15,5) numbers.
>
> The size of that dataset, combined with the apparent simplicity of
> your schema and the apparent requirement for most-sequential access
> (I'm guessing about the latter two),

Your guesses are correct, except that every now and then a random value
indexed on a timestamp needs to be retrieved.

> all lead me to suspect you would
> be happier with something other than a traditional relational
> database.
>
> I don't know how exact your historical data has to be. Could you get

No "lossy" compression is allowed. Exact data is needed for the whole
dataset.

> If you require precise data with the ability to filter, aggregate and
> correlate over multiple dimensions, something like Hadoop -- or one of
> the Hadoop-based column database implementations, such as HBase or
> Hypertable -- might be a better option, combined with MapReduce/Pig to
> execute analysis jobs.

This looks like an interesting idea to investigate. Do you have more
experience with such databases? How do they fare with the following
requirements:

* Storing large datasets (do they pack data well in the database? No
  wasted space as in, e.g., hash tables?)
* Retrieving specific random records based on a timestamp or record ID?
* Storing "infinite" datasets (i.e. datasets whose size is not known in
  advance; cf. hash tables)

On the other hand, we could periodically transfer data from PostgreSQL
into a simpler database (e.g. BDB) for archival purposes (at the expense
of more code). Would such simpler databases be better suited for that?
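
For reference, here is a minimal sketch of the kind of table and lookup
I have in mind -- the table, column and index names are only
placeholders, not our actual schema:

  -- Hypothetical table for raw sensor readings (~500,000 INSERTs/hour).
  CREATE TABLE sensor_data (
      id        bigserial     PRIMARY KEY,
      sensor_id integer       NOT NULL,
      ts        timestamptz   NOT NULL,
      value1    numeric(15,5) NOT NULL,
      value2    numeric(15,5),
      value3    numeric(15,5)
  );

  -- Index to support the occasional random lookup by timestamp.
  CREATE INDEX sensor_data_sensor_ts_idx ON sensor_data (sensor_id, ts);

  -- Typical "random" retrieval: the value of one sensor at (or just
  -- before) a given point in time.
  SELECT value1
  FROM sensor_data
  WHERE sensor_id = 42
    AND ts <= '2009-05-28 12:00:00+00'
  ORDER BY ts DESC
  LIMIT 1;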
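
And the archival idea would amount to something like the following
(again only a sketch under assumptions: the monthly granularity and the
file name are made up, and the exported file would still have to be
loaded into BDB or whatever we end up using):

  -- Hypothetical monthly archival: dump a closed month to a flat file
  -- and remove it from the live table.
  BEGIN;
  COPY (SELECT * FROM sensor_data
        WHERE ts >= '2009-04-01' AND ts < '2009-05-01')
    TO '/archive/sensor_data_2009_04.csv' WITH CSV;
  DELETE FROM sensor_data
    WHERE ts >= '2009-04-01' AND ts < '2009-05-01';
  COMMIT;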