I don't think it's the case. I am using newrelic for monitoring my DB servers(I have one master and 2 slaves - all use the same configuration) - memory is not going above 12.5GB, so I have a good reserve, also I don't see any swapping there :( Thanks, Dzmitry On 8/19/13 11:36 AM, "Stéphane Schildknecht" <stephane.schildknecht@xxxxxxxxxxxxx> wrote: >Le 19/08/2013 10:07, Dzmitry a écrit : >> Hey folks, >> I have postgres server running on ubuntu 12,Intel Xeon 8 CPUs 29 GB >>RAM. >> With following settings: >> max_connections = 550 >> shared_buffers = 12GB >> temp_buffers = 8MB >> max_prepared_transactions = 0 >> work_mem = 50MB >> maintenance_work_mem = 1GB >> fsync = on >> wal_buffers = 16MB >> commit_delay = 50 >> commit_siblings = 7 >> checkpoint_segments = 32 >> checkpoint_completion_target = 0.9 >> effective_cache_size = 22GB >> autovacuum = on >> autovacuum_vacuum_threshold = 1800 >> autovacuum_analyze_threshold = 900 >> >> I am doing a lot of writes to DB in 40 different threads so every >>thread >> check if record exists if not => insert record, if exists => update >>record. >> During this update, my disk IO almost always 100% and sometimes it >>crash my >> DB with following message: >> >> 2013-08-19 03:18:00 UTC LOG: checkpointer process (PID 28354) was >>terminated >> by signal 9: Killed >> 2013-08-19 03:18:00 UTC LOG: terminating any other active server >>processes >> 2013-08-19 03:18:00 UTC WARNING: terminating connection because of >>crash of >> another server process >> 2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this >>server >> process to roll back the current transaction and exit, because another >>server >> process exited abnormally and possibly corrupted shared memory. >> 2013-08-19 03:18:00 UTC HINT: In a moment you should be able to >>reconnect to >> the database and repeat your command. >> 2013-08-19 03:18:00 UTC WARNING: terminating connection because of >>crash of >> another server process >> 2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this >>server >> process to roll back the current transaction and exit, because another >>server >> process exited abnormally and possibly corrupted shared memory. >> 2013-08-19 03:18:00 UTC HINT: In a moment you should be able to >>reconnect to >> the database and repeat your command. >> 2013-08-19 03:18:00 UTC WARNING: terminating connection because of >>crash of >> another server process >> 2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this >>server >> process to roll back the current transaction and exit, because another >>server >> process exited abnormally and possibly corrupted shared memory. >> >> My DB size is not very big 169GB. >> >> Anyone know how can I get rid of DB crash ? >> >> >> Thanks, >> Dzmitry >> > >The fact that the checkpointer was killed -9 let me think the OOMKiller >has >detected you were out of memory. > >Could that be the case? > >12GB of shared_buffers on a 29Gb box is too high. You should try to lower >that >value to 6GB, for instance. >550*50MB, that is 27GB of RAM that PostgreSQL could try to adress. > >I can imagine your system is swapping a lot, and you exhaust swap memory >before >crash. > >Regards, > >-- >Stéphane Schildknecht >Loxodata - Conseil, expertise et formations > > > >-- >Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-admin -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin