Re: postgres crashes on insert in 40 different threads

On 19/08/2013 15:06, Dzmitry wrote:
Thank you guys! Found this in the logs (db-slave1 is my replica that uses
streaming replication):

Aug 18 15:49:38 db-slave1 kernel: [25094456.525703] postgres invoked
oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525708] postgres cpuset=/
mems_allowed=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525712] Pid: 26418, comm:
postgres Not tainted 3.2.0-40-virtual #64-Ubuntu


Aug 18 15:49:48 db-slave1 kernel: [25094456.621937] Out of memory: Kill
process 2414 (postgres) score 417 or sacrifice child
Aug 18 15:49:48 db-slave1 kernel: [25094456.621949] Killed process 2414
(postgres) total-vm:13057464kB, anon-rss:28560kB, file-rss:12773376kB



Aug 19 03:18:00 db-slave1 kernel: [25135758.540539] postgres invoked
oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:01 db-slave1 kernel: [25135758.540544] postgres cpuset=/
mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135758.540548] Pid: 23994, comm:
postgres Not tainted 3.2.0-40-virtual #64-Ubuntu

Aug 19 03:18:07 db-slave1 kernel: [25135758.626405] Out of memory: Kill
process 28354 (postgres) score 348 or sacrifice child

Aug 19 03:18:07 db-slave1 kernel: [25135758.626418] Killed process 28354
(postgres) total-vm:13021248kB, anon-rss:8704kB, file-rss:10686512kB
Aug 19 03:18:07 db-slave1 kernel: [25135763.068736] postgres invoked
oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068740] postgres cpuset=/
mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068743] Pid: 6780, comm:
postgres Not tainted 3.2.0-40-virtual #64-Ubuntu

Aug 19 03:18:07 db-slave1 kernel: [25135763.150285] Out of memory: Kill
process 20322 (postgres) score 301 or sacrifice child
Aug 19 03:18:07 db-slave1 kernel: [25135763.150297] Killed process 20322
(postgres) total-vm:13058892kB, anon-rss:47172kB, file-rss:9186604kB

So I will do as Stéphane advised and set shared_buffers to 6GB. Do you know
if I need to do anything else, such as increasing the shared memory
(SHMMAX/SHMALL) kernel parameters?


Right now I have:
shmmax = 13223870464
shmall = 4194304
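
If I understand the kernel documentation correctly, shmmax is in bytes and
shmall is in pages (typically 4kB), so roughly:

# shmmax = 13223870464 bytes          ~= 12.3GB per-segment limit
# shmall = 4194304 pages * 4096 bytes ~= 16GB system-wide limit
# Both should already be larger than what a 6GB shared_buffers segment
# needs; current limits and segments can be checked with:
ipcs -l
ipcs -m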



Thanks,
   Dzmitry





On 8/19/13 1:05 PM, "Albe Laurenz" <laurenz.albe@xxxxxxxxxx> wrote:

Dzmitry wrote:
On 8/19/13 11:36 AM, "Stéphane Schildknecht"
<stephane.schildknecht@xxxxxxxxxxxxx> wrote:
On 19/08/2013 10:07, Dzmitry wrote:
I have a Postgres server running on Ubuntu 12, Intel Xeon, 8 CPUs, 29 GB
RAM, with the following settings:
max_connections = 550
shared_buffers = 12GB
temp_buffers = 8MB
max_prepared_transactions = 0
work_mem = 50MB
maintenance_work_mem = 1GB
fsync = on
wal_buffers = 16MB
commit_delay = 50
commit_siblings = 7
checkpoint_segments = 32
checkpoint_completion_target = 0.9
effective_cache_size = 22GB
autovacuum = on
autovacuum_vacuum_threshold = 1800
autovacuum_analyze_threshold = 900

I am doing a lot of writes to the DB in 40 different threads: every
thread checks if a record exists; if not => insert the record, if it
exists => update the record.
During this load, my disk IO is almost always at 100%, and sometimes it
crashes my DB with the following message:

2013-08-19 03:18:00 UTC LOG:  checkpointer process (PID 28354) was
terminated by signal 9: Killed
[...]
My DB size is not very big, 169GB.

Anyone know how I can get rid of the DB crash?
The fact that the checkpointer was killed with -9 makes me think the
OOM killer has detected you were out of memory.

Could that be the case?

12GB of shared_buffers on a 29GB box is too high. You should try to
lower that value to 6GB, for instance.
550 * 50MB, that is about 27GB of RAM that PostgreSQL could try to
address, on top of shared_buffers.

I can imagine your system is swapping a lot, and you exhaust swap
memory before the crash.
I don't think that's the case. I am using New Relic for monitoring my DB
servers (I have one master and 2 slaves, all using the same
configuration); memory is not going above 12.5GB, so I have a good
reserve, and I also don't see any swapping there :(
You can check by examining /var/log/messages to see if the OOM
killer is at fault, which is highly likely.

The OOM killer uses heuristics, so it does the wrong thing
occasionally.

The documentation is helpful:

http://www.postgresql.org/docs/9.2/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
Do you mean the Postgres log file (configured in postgresql.conf):

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_messages = warning

Or /var/log/messages? Because I don't have that file :(
I meant the latter.
/var/log/messages is just where syslog output is directed on some
Linux distributions.  I don't know Ubuntu, so sorry if I got
it wrong.  Maybe it is /var/log/syslog on Ubuntu.
If in doubt, check your syslog configuration.

Yours,
Laurenz Albe





As Laurenz said, you should have a look at the documentation.

It explains how you can lower the risk of the OOM killer killing your
PostgreSQL processes (a rough sketch of the first two points follows the list):
1. You can set vm.overcommit_memory to 2 in sysctl.conf.
2. You can adjust the value of oom_score_adj in the startup script to prevent the OOM killer from killing the postmaster.
3. You can lower shared_buffers, work_mem and max_connections.
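
A rough sketch of points 1 and 2, based on the kernel-resources page linked
above (the data directory path below is only an example, adapt it to your
installation):

# point 1: in /etc/sysctl.conf, then apply with "sysctl -p"
vm.overcommit_memory = 2
# with mode 2 you may also want to review vm.overcommit_ratio

# point 2: in the init script that starts PostgreSQL, run as root,
# lower the postmaster's OOM score before launching it
# (child backends inherit the value unless PostgreSQL was built to reset it):
echo -1000 > /proc/self/oom_score_adj
su - postgres -c "pg_ctl start -D /var/lib/postgresql/9.2/main"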

Regards,

--
Stéphane Schildknecht
Loxodata - Conseil, expertise et formations



--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



