On Fri, Nov 21, 2008 at 3:12 PM, Michal Szymanski <dyrex@xxxxxxxxxxxxxx> wrote:
> On 21 Nov, 13:50, ciprian.crac...@xxxxxxxxx ("Ciprian Dorin Craciun") wrote:
>> Hello all!
>>
>>     I would like to ask some advice about the following problem
>> (related to the Dehems project: http://www.dehems.eu/):
>>     * there are some clients; (the clients are in fact households;)
>>     * each device has a number of sensors (about 10), and not all the
>> clients have the same sensors; also sensors might appear and disappear
>> dynamically; (the sensors are appliances;)
>>     * for each device and each sensor a reading is produced (about every
>> 6 seconds); (the values could be power consumptions;)
>>     * I would like to store the following data: (client, sensor,
>> timestamp, value);
>>     * the usual queries are:
>>         * for a given client (and sensor), and time interval, I need
>> the min, max, and avg of the values;
>>         * for a given time interval (and sensor), I need the min, max,
>> and avg of the values;
>>         * other statistics;
>
> How many devices do you expect?
> As I understand it, the expected insert rate is more or less:
> no. of devices * no. of sensors (about 10), every 6 seconds.
> Let's assume you have 100 devices; that means 1000 inserts per 6 s
> = ~166 inserts per second.

    Yes, the figures are like this:
    * average number of raw inserts / second (without any optimization or
previous aggregation): #clients (~100 thousand) * #sensors (~10) /
6 seconds = ~166 thousand inserts / second...
    * if I use sharding, this number would drop linearly with the number
of Postgres instances... so let's say I put about 10 thousand users on
each Postgres instance => ~16 thousand inserts / second... (a figure
which I wasn't able to reach in my Postgres benchmarks...)

    Either way, I would expect at least 2-3 thousand inserts per second...
(For reference, I've sketched the table and the typical query in a P.S. at
the end of this mail.)

>>     * inserts are done like this:
>>         * generated 100 million readings by using the following rule:
>>             * client is randomly chosen between 0 and 10 thousand;
>>             * sensor is randomly chosen between 0 and 10;
>>             * the timestamp is always increasing by one;
>>         * the inserts are done in batches of 500 thousand (I've
>> also tried 5, 25, 50 and 100 thousand without big impact);
>>         * the batch inserts are done through COPY sds_benchmark_data
>> FROM STDIN through libpq (by using UNIX (local) sockets);
>
>>     What have I observed / tried:
>>     * I've tested without the primary key and the index, and the
>> results were the best for inserts (600k inserts / s), but the reads
>> worked extremely slowly (due to the lack of indexing);
>>     * with only the index (or only the primary key) the insert rate is
>> good at the start (for the first 2 million readings), but then drops
>> to about 200 inserts / s;
>
> Try to periodically REINDEX your index and execute ANALYZE on your
> table. To be honest it should not influence the inserts, but it will
> influence the selects.

    I'll keep this in mind when I reach the select part... For the moment
I'm struggling with the inserts... (Actually I've kind of given up...)

> Michal Szymanski
> http://blog.szymanskich.net

    Thanks,
    Ciprian Craciun.
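
    P.S.: For reference, this is roughly the table and the typical
aggregate query I am benchmarking. The column names and types below are
only my sketch (the mail only fixes the tuple (client, sensor, timestamp,
value), the table name sds_benchmark_data, and the composite primary key),
so take them as an illustration rather than the exact definitions:

    CREATE TABLE sds_benchmark_data (
        client    integer NOT NULL,  -- household identifier
        sensor    integer NOT NULL,  -- appliance / sensor identifier
        timestamp integer NOT NULL,  -- monotonically increasing reading time
        value     integer NOT NULL,  -- e.g. the power consumption reading
        PRIMARY KEY (client, sensor, timestamp)
    );

    -- min / max / avg for a given client and sensor over a time interval
    -- (the client, sensor and interval values are made up for the example)
    SELECT min(value), max(value), avg(value)
    FROM sds_benchmark_data
    WHERE client = 1234
      AND sensor = 5
      AND timestamp BETWEEN 1000000 AND 2000000;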
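
    P.P.S.: And this is the shape of the bulk load, plus the maintenance
commands suggested above. The explicit column list and the tab-separated
sample rows are again only an illustration of what gets pushed through
libpq (PQputCopyData) for each 500 thousand row batch, not the actual
benchmark data:

    -- in psql, the data rows follow the COPY command on stdin,
    -- one tab-separated row per line, terminated by \.
    COPY sds_benchmark_data (client, sensor, timestamp, value) FROM STDIN;
    42	3	1000001	125
    42	7	1000002	80
    \.

    -- periodic maintenance, as suggested; should mostly help the selects
    REINDEX TABLE sds_benchmark_data;
    ANALYZE sds_benchmark_data;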