Thank you guys! Found this in the logs (db-slave1 is my replica that uses streaming replication):

Aug 18 15:49:38 db-slave1 kernel: [25094456.525703] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525708] postgres cpuset=/ mems_allowed=0
Aug 18 15:49:38 db-slave1 kernel: [25094456.525712] Pid: 26418, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 18 15:49:48 db-slave1 kernel: [25094456.621937] Out of memory: Kill process 2414 (postgres) score 417 or sacrifice child
Aug 18 15:49:48 db-slave1 kernel: [25094456.621949] Killed process 2414 (postgres) total-vm:13057464kB, anon-rss:28560kB, file-rss:12773376kB
Aug 19 03:18:00 db-slave1 kernel: [25135758.540539] postgres invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:01 db-slave1 kernel: [25135758.540544] postgres cpuset=/ mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135758.540548] Pid: 23994, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 19 03:18:07 db-slave1 kernel: [25135758.626405] Out of memory: Kill process 28354 (postgres) score 348 or sacrifice child
Aug 19 03:18:07 db-slave1 kernel: [25135758.626418] Killed process 28354 (postgres) total-vm:13021248kB, anon-rss:8704kB, file-rss:10686512kB
Aug 19 03:18:07 db-slave1 kernel: [25135763.068736] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068740] postgres cpuset=/ mems_allowed=0
Aug 19 03:18:07 db-slave1 kernel: [25135763.068743] Pid: 6780, comm: postgres Not tainted 3.2.0-40-virtual #64-Ubuntu
Aug 19 03:18:07 db-slave1 kernel: [25135763.150285] Out of memory: Kill process 20322 (postgres) score 301 or sacrifice child
Aug 19 03:18:07 db-slave1 kernel: [25135763.150297] Killed process 20322 (postgres) total-vm:13058892kB, anon-rss:47172kB, file-rss:9186604kB

So I will do as Stéphane advised and make shared_buffers 6GB.
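As a quick sanity check on that diagnosis, here is a small sketch (plain Python, with the memory figures copied from the "Killed process" lines above) showing that file-backed RSS, which includes the 12GB shared_buffers segment each backend maps, dominates the totals, while private (anon) memory is tiny:

```python
import re

# The "Killed process" lines from the syslog excerpt above.
log = """
Killed process 2414 (postgres) total-vm:13057464kB, anon-rss:28560kB, file-rss:12773376kB
Killed process 28354 (postgres) total-vm:13021248kB, anon-rss:8704kB, file-rss:10686512kB
Killed process 20322 (postgres) total-vm:13058892kB, anon-rss:47172kB, file-rss:9186604kB
"""

pattern = r"total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB"
for total_vm, anon_rss, file_rss in re.findall(pattern, log):
    # anon-rss (process-private memory) is a few dozen MB at most, while
    # file-rss -- which counts the mapped shared_buffers segment -- is what
    # makes each backend look huge to the OOM killer.
    print(f"anon: {int(anon_rss)/1024**2:.2f} GB, file: {int(file_rss)/1024**2:.2f} GB")
```

So the OOM killer's victims are not actually hoarding private memory; they mostly just have the oversized shared segment mapped in.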
Do you know if I need to do anything else, such as increasing the shared memory (SHMMAX/SHMALL) parameters? Right now I have:

shmmax = 13223870464
shmall = 4194304

Thanks,
Dzmitry

On 8/19/13 1:05 PM, "Albe Laurenz" <laurenz.albe@xxxxxxxxxx> wrote:
>>> Dzmitry wrote:
>>>> On 8/19/13 11:36 AM, "Stéphane Schildknecht"
>>>> <stephane.schildknecht@xxxxxxxxxxxxx> wrote:
>>>>> On 19/08/2013 10:07, Dzmitry wrote:
>>>>>> I have a postgres server running on Ubuntu 12 with 8 Intel Xeon CPUs
>>>>>> and 29 GB of RAM, with the following settings:
>>>>>> max_connections = 550
>>>>>> shared_buffers = 12GB
>>>>>> temp_buffers = 8MB
>>>>>> max_prepared_transactions = 0
>>>>>> work_mem = 50MB
>>>>>> maintenance_work_mem = 1GB
>>>>>> fsync = on
>>>>>> wal_buffers = 16MB
>>>>>> commit_delay = 50
>>>>>> commit_siblings = 7
>>>>>> checkpoint_segments = 32
>>>>>> checkpoint_completion_target = 0.9
>>>>>> effective_cache_size = 22GB
>>>>>> autovacuum = on
>>>>>> autovacuum_vacuum_threshold = 1800
>>>>>> autovacuum_analyze_threshold = 900
>>>>>>
>>>>>> I am doing a lot of writes to the DB in 40 different threads, so every
>>>>>> thread checks whether a record exists; if not, it inserts the record,
>>>>>> and if it does, it updates the record.
>>>>>> During these updates my disk IO is almost always at 100%, and sometimes
>>>>>> it crashes my DB with the following message:
>>>>>>
>>>>>> 2013-08-19 03:18:00 UTC LOG: checkpointer process (PID 28354) was
>>>>>> terminated by signal 9: Killed
>>> [...]
>>>>>>
>>>>>> My DB size is not very big: 169GB.
>>>>>>
>>>>>> Does anyone know how I can get rid of the DB crashes?
>>>
>>>>> The fact that the checkpointer was killed with signal 9 makes me think
>>>>> the OOM killer has detected you were out of memory.
>>>>>
>>>>> Could that be the case?
>>>>>
>>>>> 12GB of shared_buffers on a 29GB box is too high. You should try to
>>>>> lower that value to 6GB, for instance.
>>>>> 550*50MB, that is 27GB of RAM that PostgreSQL could try to address.
>>>>>
>>>>> I can imagine your system is swapping a lot, and you exhaust swap
>>>>> space before the crash.
>>>
>>>> I don't think that's the case. I am using New Relic for monitoring my DB
>>>> servers (I have one master and 2 slaves; all use the same configuration).
>>>> Memory does not go above 12.5GB, so I have a good reserve, and I also
>>>> don't see any swapping there :(
>>>
>>> You can check by examining /var/log/messages to see if the OOM
>>> killer is at fault, which is highly likely.
>>>
>>> The OOM killer uses heuristics, so it occasionally does the wrong thing.
>>>
>>> The documentation is helpful:
>>> http://www.postgresql.org/docs/9.2/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
>
>> Do you mean the postgres log file (configured in postgresql.conf)?
>>
>> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
>> log_min_messages = warning
>>
>> Or /var/log/messages? I ask because I don't have that file :(
>
> I meant the latter.
> /var/log/messages is just where syslog output is directed on some
> Linux distributions. I don't know Ubuntu, so sorry if I got
> it wrong. Maybe it is /var/log/syslog on Ubuntu.
> In case of doubt, check your syslog configuration.
>
> Yours,
> Laurenz Albe

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
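Regarding the SHMMAX/SHMALL question: a rough back-of-the-envelope check (a sketch only, assuming the usual 4 kB page size and a guessed ~10% overhead on top of shared_buffers; the exact segment size depends on the build and other settings) suggests the current kernel values already accommodate a 6GB shared_buffers:

```python
PAGE_SIZE = 4096                  # bytes; verify with `getconf PAGE_SIZE`

shmmax = 13223870464              # current kernel.shmmax, in bytes (~12.3 GB)
shmall = 4194304                  # current kernel.shmall, in pages (~16 GB)

shared_buffers = 6 * 1024**3      # the proposed 6GB setting, in bytes

# PostgreSQL (pre-9.3) allocates one large System V shared memory segment:
# roughly shared_buffers plus some overhead for other shared structures.
# The 10% headroom here is an assumption, not an exact figure.
segment_estimate = int(shared_buffers * 1.1)

print("SHMMAX ok:", shmmax >= segment_estimate)                # SHMMAX ok: True
print("SHMALL ok:", shmall * PAGE_SIZE >= segment_estimate)    # SHMALL ok: True
```

If both checks hold, no sysctl change should be needed for the smaller setting; SHMMAX only becomes a problem again if shared_buffers grows back past roughly 12 GB.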