On Fri, Nov 21, 2008 at 3:12 PM, Michal Szymanski <dyrex@xxxxxxxxxxxxxx> wrote:
> On 21 Nov, 13:50, ciprian.crac...@xxxxxxxxx ("Ciprian Dorin Craciun") wrote:
>> Hello all!
>>
>>     I would like to ask some advice about the following problem
>> (related to the Dehems project: http://www.dehems.eu/):
>>     * there are some clients; (the clients are in fact households;)
>>     * each device has a number of sensors (about 10), and not all the
>> clients have the same sensors; also sensors might appear and disappear
>> dynamically; (the sensors are appliances;)
>>     * for each device and each sensor a reading is produced (about every
>> 6 seconds); (the values could be power consumptions;)
>>     * I would like to store the following data: (client, sensor,
>> timestamp, value);
>>     * the usual queries are:
>>         * for a given client (and sensor), and time interval, I need
>> the min, max, and avg of the values;
>>         * for a given time interval (and sensor), I need the min, max,
>> and avg of the values;
>>         * other statistics;
>
> How many devices do you expect?
> As I understand it, the expected insert rate is more or less:
> no. of devices * no. of sensors (about 10), every 6 seconds.
> Let's assume you have 100 devices; that means 1000 inserts per 6 s
> = ~166 inserts per second.

    Yes, the figures are like this:
    * average number of raw inserts / second (without any optimization or
previous aggregation): #clients (~100 thousand) * #sensors (~10) /
6 seconds = ~166 thousand inserts / second...
    * if I use sharding, this number would drop linearly with the number
of Postgres instances... so let's say I put about 10 thousand users on
each Postgres instance => ~16 thousand inserts / second... (a figure
which I wasn't able to reach in my Postgres benchmarks...)

    Either way, I would expect at least 2-3 thousand inserts per second...
(For reference, I've sketched the table and the typical query in a P.S. at
the end of this mail.)

>>     * inserts are done like this:
>>         * generated 100 million readings by using the following rule:
>>             * client is randomly chosen between 0 and 10 thousand;
>>             * sensor is randomly chosen between 0 and 10;
>>             * the timestamp is always increasing by one;
>>         * the inserts are done in batches of 500 thousand (I've
>> also tried 5, 25, 50 and 100 thousand without big impact);
>>         * the batch inserts are done through COPY sds_benchmark_data
>> FROM STDIN through libpq (by using UNIX (local) sockets);
>
>>     What have I observed / tried:
>>     * I've tested without the primary key and the index, and the
>> results were the best for inserts (600k inserts / s), but the reads
>> worked extremely slowly (due to the lack of indexing);
>>     * with only the index (or only the primary key) the insert rate is
>> good at the start (for the first 2 million readings), but then drops
>> to about 200 inserts / s;
>
> Try to periodically REINDEX your index and execute ANALYZE on your
> table. To be honest it should not influence the inserts, but it will
> influence the selects.

    I'll keep this in mind when I reach the select part... For the moment
I'm struggling with the inserts... (Actually I've kind of given up...)

> Michal Szymanski
> http://blog.szymanskich.net

    Thanks,
    Ciprian Craciun.
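
    P.S.: For reference, this is roughly the table and the typical
aggregate query I am benchmarking. The column names and types below are
only my sketch (the mail only fixes the tuple (client, sensor, timestamp,
value), the table name sds_benchmark_data, and the composite primary key),
so take them as an illustration rather than the exact definitions:

    CREATE TABLE sds_benchmark_data (
        client    integer NOT NULL,  -- household identifier
        sensor    integer NOT NULL,  -- appliance / sensor identifier
        timestamp integer NOT NULL,  -- monotonically increasing reading time
        value     integer NOT NULL,  -- e.g. the power consumption reading
        PRIMARY KEY (client, sensor, timestamp)
    );

    -- min / max / avg for a given client and sensor over a time interval
    -- (the client, sensor and interval values are made up for the example)
    SELECT min(value), max(value), avg(value)
    FROM sds_benchmark_data
    WHERE client = 1234
      AND sensor = 5
      AND timestamp BETWEEN 1000000 AND 2000000;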
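
    P.P.S.: And this is the shape of the bulk load, plus the maintenance
commands suggested above. The explicit column list and the tab-separated
sample rows are again only an illustration of what gets pushed through
libpq (PQputCopyData) for each 500 thousand row batch, not the actual
benchmark data:

    -- in psql, the data rows follow the COPY command on stdin,
    -- one tab-separated row per line, terminated by \.
    COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN;
    42	3	1000001	125
    42	7	1000002	80
    \.

    -- periodic maintenance, as suggested; should mostly help the selects
    REINDEX TABLE sds_benchmark_data;
    ANALYZE sds_benchmark_data;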