On Fri, Nov 21, 2008 at 3:18 PM, Grzegorz Jaśkiewicz <gryzman@xxxxxxxxx> wrote:
> you'll have to provide us with some sort of test-case to get some answers,
> please. (set of scripts, queries, etc).

    Below is the content of my original post. Inside I mention exactly the
way the benchmark was conducted. In short, the data is inserted by using
COPY sds_benchmark_data FROM STDIN, in batches of 500 thousand data points
(a sketch of one such batch follows below). I'll also paste the important
part here:

> * Postgres version: 8.3.3;
>
> * database schema:
>
>     create table sds_benchmark_data (
>         client int4 not null,
>         sensor int4 not null,
>         timestamp int4 not null,
>         value int4
>     );
>
>     alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>
>     create index sds_benchmark_data__client_sensor__index
>         on sds_benchmark_data (client, sensor);
>
> * postgresql.conf (default values; I will list only what has been changed):
>     max_connections = 20
>     shared_buffers = 24MB
>     work_mem = 64MB
>     maintenance_work_mem = 256MB
>     fsync = off
>     checkpoint_segments = 32
>     effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * I generated 100 million readings using the following rules:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried
>       5, 25, 50 and 100 thousand without a big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN
>       via libpq (using UNIX (local) sockets);

    Unfortunately I don't know what more information to give...

    Thanks,
    Ciprian Dorin Craciun.
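    To make the batch shape concrete, here is a minimal sketch of one such
COPY batch as it could be typed into psql; the benchmark itself streams the
same rows through libpq's COPY interface, and the three data rows below are
invented examples that merely follow the generation rules quoted above:

    -- one small COPY batch (columns are client, sensor, timestamp, value,
    -- tab-separated); the real batches contain 500 thousand such rows
    COPY sds_benchmark_data FROM STDIN;
    1042	3	1	87
    5731	7	2	112
    219	0	3	54
    \.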
On Fri, Nov 21, 2008 at 2:50 PM, Ciprian Dorin Craciun <ciprian.craciun@xxxxxxxxx> wrote:
> Hello all!
>
>     I would like to ask some advice about the following problem
> (related to the Dehems project: http://www.dehems.eu/ ):
>     * there are some clients; (the clients are in fact households;)
>     * each device has a number of sensors (about 10), and not all the
> clients have the same sensors; also sensors might appear and disappear
> dynamically; (the sensors are appliances;)
>     * for each device and each sensor a reading is produced (about
> every 6 seconds); (the values could be power consumptions;)
>     * I would like to store the following data: (client, sensor,
> timestamp, value);
>     * the usual queries are:
>         * for a given client (and sensor), and time interval, I need
> the min, max, and avg of the values;
>         * for a given time interval (and sensor), I need the min, max,
> and avg of the values;
>         * other statistics;
>
>     Currently I'm benchmarking the following storage solutions for this:
>     * Hypertable (http://www.hypertable.org/) -- which has a good
> insert rate (about 250k inserts / s), but a slow read rate (about 150k
> reads / s); (the aggregates are computed manually, as Hypertable does
> not support queries other than scanning; in fact min and max are easy,
> being the first / last key in the ordered set, but avg must be computed
> by a sequential scan;)
>     * BerkeleyDB -- quite an OK insert rate (about 50k inserts / s),
> but a fabulous read rate (about 2M reads / s); (the same issue with
> aggregates;)
>     * Postgres -- which behaves quite poorly (see below)...
>     * MySQL -- next to be tested;
>
>     So what can I do / how could I optimize the use of Postgres for
> this usage?
>
>     (I'm aware that there could be optimizations for this problem,
> like computing the aggregates in memory and storing only these
> aggregates every 10 minutes, or at some other interval, but I want to
> keep the full history, for data mining tasks for example;)
>
>     I would also like to present how the Postgres benchmark was implemented:
>
>     * test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500 GB
> SCSI, RAID 0, 8 GB RAM, 2 x Intel Xeon 5160 (2 cores, 3 GHz, 4 MB L2);
>     * Postgres version: 8.3.3;
>
>     * database schema:
>
>         create table sds_benchmark_data (
>             client int4 not null,
>             sensor int4 not null,
>             timestamp int4 not null,
>             value int4
>         );
>
>         alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>
>         create index sds_benchmark_data__client_sensor__index
>             on sds_benchmark_data (client, sensor);
>
>     * postgresql.conf (default values; I will list only what has been changed):
>         max_connections = 20
>         shared_buffers = 24MB
>         work_mem = 64MB
>         maintenance_work_mem = 256MB
>         fsync = off
>         checkpoint_segments = 32
>         effective_cache_size = 1024MB
>
>     * inserts are done like this:
>         * I generated 100 million readings using the following rules:
>             * client is randomly chosen between 0 and 10 thousand;
>             * sensor is randomly chosen between 0 and 10;
>             * the timestamp is always increasing by one;
>         * the inserts are done in batches of 500 thousand (I've also
> tried 5, 25, 50 and 100 thousand without a big impact);
>         * the batch inserts are done through COPY sds_benchmark_data
> FROM STDIN via libpq (using UNIX (local) sockets);
>
>     What have I observed / tried:
>     * I've tested without the primary key and the index, and the
> results were the best for inserts (600k inserts / s), but the reads
> were extremely slow (due to the lack of indexing);
>     * with only the index (or only the primary key) the insert rate
> is good at the start (for the first 2 million readings), but then
> drops to about 200 inserts / s;
>
>     So could someone point out where I'm wrong, or tell me what I can
> do to optimize Postgres for this particular task?
>
>     Thanks for your help,
>     Ciprian Dorin Craciun.
>
>     P.S.: I want to publish the benchmark results after they are
> done, and I want to squeeze as much power out of Postgres as possible.
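    For reference, the "usual queries" described in the quoted post map onto
plain SQL aggregates; a minimal sketch, assuming the quoted schema and using
made-up example values for the client, sensor and interval bounds:

    -- min / max / avg for a given client and sensor over a time interval
    -- (the primary key on (client, sensor, timestamp) can serve this lookup)
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where client = 1042                    -- example client id
       and sensor = 3                       -- example sensor id
       and timestamp between 0 and 999999;  -- example interval

    -- the same aggregates for a given sensor and interval, over all clients
    select min(value), max(value), avg(value)
      from sds_benchmark_data
     where sensor = 3
       and timestamp between 0 and 999999;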