On 19/08/2013 10:07, Dzmitry wrote:
Hey folks,
I have a Postgres server running on Ubuntu 12, with an Intel Xeon (8 CPUs) and 29 GB RAM,
with the following settings:
max_connections = 550
shared_buffers = 12GB
temp_buffers = 8MB
max_prepared_transactions = 0
work_mem = 50MB
maintenance_work_mem = 1GB
fsync = on
wal_buffers = 16MB
commit_delay = 50
commit_siblings = 7
checkpoint_segments = 32
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
autovacuum = on
autovacuum_vacuum_threshold = 1800
autovacuum_analyze_threshold = 900
I am doing a lot of writes to the DB from 40 different threads: every thread
checks whether a record exists; if not, it inserts the record, and if so, it updates the record.
During these updates my disk IO is almost always at 100%, and sometimes it crashes my
DB with the following message:
2013-08-19 03:18:00 UTC LOG: checkpointer process (PID 28354) was terminated
by signal 9: Killed
2013-08-19 03:18:00 UTC LOG: terminating any other active server processes
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of
another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another server
process exited abnormally and possibly corrupted shared memory.
2013-08-19 03:18:00 UTC HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of
another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another server
process exited abnormally and possibly corrupted shared memory.
2013-08-19 03:18:00 UTC HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2013-08-19 03:18:00 UTC WARNING: terminating connection because of crash of
another server process
2013-08-19 03:18:00 UTC DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another server
process exited abnormally and possibly corrupted shared memory.
My DB size is not very big: 169GB.
Does anyone know how I can get rid of these DB crashes?
Thanks,
Dzmitry
The fact that the checkpointer was killed with signal 9 makes me think the OOM killer
detected that you were out of memory.
Could that be the case?
12GB of shared_buffers on a 29GB box is too high. You should try lowering that
value to 6GB, for instance.
550 connections * 50MB of work_mem is about 27GB of RAM that PostgreSQL could try to address.
I can imagine your system is swapping a lot, and that you exhaust swap before
the crash.
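
To make that arithmetic concrete, here is a rough sketch in Python using only the values quoted in this thread. The function name is made up for illustration, and it assumes each connection uses at most one work_mem allocation at a time, which is only an approximation (a single query can use several, one per sort or hash operation).

```python
# Rough worst-case memory estimate from the settings quoted in this thread.
# Assumption: one work_mem allocation per connection. Real queries may use
# more than one, so the true worst case can be higher.

MB_PER_GB = 1024


def worst_case_memory_gb(max_connections, work_mem_mb, shared_buffers_gb):
    """Return (total work_mem in GB, work_mem plus shared_buffers in GB)."""
    work_mem_total_gb = max_connections * work_mem_mb / MB_PER_GB
    return work_mem_total_gb, work_mem_total_gb + shared_buffers_gb


# Values from the posted configuration: max_connections = 550,
# work_mem = 50MB, shared_buffers = 12GB, on a box with 29 GB of RAM.
work_total, overall = worst_case_memory_gb(550, 50, 12)
print(f"work_mem worst case: {work_total:.1f} GB")
print(f"plus shared_buffers: {overall:.1f} GB (RAM: 29 GB)")
```

Under these assumptions the total is well above the 29 GB of physical RAM, which is consistent with the OOM killer stepping in.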
Regards,
--
Stéphane Schildknecht
Loxodata - Conseil, expertise et formations