2009/5/28 Alexander Staubo <alex@xxxxxxxxxx>:
> On Thu, May 28, 2009 at 2:54 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>> The volume of sensor data is potentially huge, on the order of 500,000
>> updates per hour. Sensor data is a few numeric(15,5) numbers.
>
> The size of that dataset, combined with the apparent simplicity of
> your schema and the apparent requirement for most-sequential access
> (I'm guessing about the latter two),

Your guesses are correct, except that every now and then a random value
indexed on a timestamp needs to be retrieved.

> all lead me to suspect you would
> be happier with something other than a traditional relational
> database.
>
> I don't know how exact your historical data has to be. Could you get

No "lossy" compression is allowed. Exact data is needed for the whole
dataset.

> If you require precise data with the ability to filter, aggregate and
> correlate over multiple dimensions, something like Hadoop -- or one of
> the Hadoop-based column database implementations, such as HBase or
> Hypertable -- might be a better option, combined with MapReduce/Pig to
> execute analysis jobs.

This looks like an interesting idea to investigate. Do you have more
experience with such databases? How do they fare with the following
requirements:

* Storing large datasets (do they pack data well in the database? No
  wasted space as in, e.g., hash tables?)
* Retrieving specific random records based on a timestamp or record ID?
* Storing "infinite" datasets (i.e. datasets whose size is not known in
  advance; cf. hash tables)

On the other hand, we could periodically transfer data from PostgreSQL
into a simpler database (e.g. BDB) for archival purposes (at the expense
of more code). Would such simpler databases be better suited for that?
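
For reference, here is a minimal sketch of the kind of table and lookup
I have in mind -- the table, column and index names are only
placeholders, not our actual schema:

  -- Hypothetical table for raw sensor readings (~500,000 INSERTs/hour).
  CREATE TABLE sensor_data (
      id        bigserial     PRIMARY KEY,
      sensor_id integer       NOT NULL,
      ts        timestamptz   NOT NULL,
      value1    numeric(15,5) NOT NULL,
      value2    numeric(15,5),
      value3    numeric(15,5)
  );

  -- Index to support the occasional random lookup by timestamp.
  CREATE INDEX sensor_data_sensor_ts_idx ON sensor_data (sensor_id, ts);

  -- Typical "random" retrieval: the value of one sensor at (or just
  -- before) a given point in time.
  SELECT value1
  FROM sensor_data
  WHERE sensor_id = 42
    AND ts <= '2009-05-28 12:00:00+00'
  ORDER BY ts DESC
  LIMIT 1;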
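
And the archival idea would amount to something like the following
(again only a sketch under assumptions: the monthly granularity and the
file name are made up, and the exported file would still have to be
loaded into BDB or whatever we end up using):

  -- Hypothetical monthly archival: dump a closed month to a flat file
  -- and remove it from the live table.
  BEGIN;
  COPY (SELECT * FROM sensor_data
        WHERE ts >= '2009-04-01' AND ts < '2009-05-01')
    TO '/archive/sensor_data_2009_04.csv' WITH CSV;
  DELETE FROM sensor_data
    WHERE ts >= '2009-04-01' AND ts < '2009-05-01';
  COMMIT;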