Re: Linux OOM-Killer

basti <mailinglist@xxxxxxxxxxxxxxxx> · Mon, 17 Mar 2014 12:21:30 +0100

uname -a
Linux h2085616 3.2.0-3-amd64 #1 SMP Mon Jul 23 02:45:17 UTC 2012 x86_64
GNU/Linux

At any time there are not more than 20-30 Connections at once.

Swap is disabled.
free -m
             total       used       free     shared    buffers     cached
Mem:         32215      16163      16051          0         40      14842
-/+ buffers/cache:       1281      30934
Swap:            0          0          0

With the updates there is a little bit tricky:
https://wiki.postgresql.org/wiki/Nov2013ReplicationIssue

If you are currently using 9.2.4, 9.1.9 or 9.0.13 and use any form of
builtin replication do not install the most recent update. Instead, wait
for the next update (9.2.6, 9.1.11 and 9.0.15) to come out.

Options for users who have already updated, or are running 9.3, include:

    if you are using 9.2.5, 9.1.10 or 9.0.14, downgrade your replica
servers to the prior update release (9.2.4, 9.1.9 or 9.0.13).

On 17.03.2014 12:12, Tomas Vondra wrote:
> Hi,
> 
> On 17 Březen 2014, 11:45, basti wrote:
>> Hello,
>>
>> we have a database master Version:
>> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
>> 4.7.2-2) 4.7.2, 64-bit
>> and a WAL-Replication Slave with hot-standby version:
>> PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
>> 4.7.2-5) 4.7.2, 64-bit.
> 
> You're missing >18 months of fixes on the master (slightly less on the
> slave).
> 
>> Since a few days we had problems with the Linux OOM-Killer.
>> Some simple query that normally take around 6-7 minutes now takes 5 hours.
>> We did not change any configuration values the last days.
>>
>> First of all I have set
>>
>> vm.overcommit_memory=2
>> vm.overcommit_ratio=80
>>
>> Here are some values of my DB-Master config, the Server has 32GB RAM and
>> is only for database, no other service.
>> Did anybody see some mistakes?
> 
> How much swap do you have?
> 
>> I'am not shure if work_mem, maintenance_work_mem and
>> effective_cache_size is set correct.
> 
> That's hard to say. I don't see any immediate issue there, but it really
> depends on your application. For example 200 connections with
> work_mem=192MB may be dangerous if many connections are active at the same
> time.
> 
>>
>> /etc/postgresql/9.1/main/postgresql.conf
>>
>> max_connections = 200
>> ssl = true
>> shared_buffers = 6GB			# min 128kB
>> work_mem = 192MB			# min 64kB
>> maintenance_work_mem = 1GB		# min 1MB
>> wal_level = hot_standby
>> synchronous_commit = off
>> wal_buffers = 16MB
>> checkpoint_segments = 16
>> checkpoint_completion_target = 0.9
>> archive_mode = on
>> archive_command = 'rsync -a %p -e "ssh -i
>> /var/lib/postgresql/.ssh/id_rsa"
>> postgres@my_postgres_slave:/var/lib/postgresql/9.1/wals/dolly_main/%f
>> </dev/null'
>> max_wal_senders = 1
>> wal_keep_segments = 32
>> random_page_cost = 2.0
>> effective_cache_size = 22GB
>> default_statistics_target = 100
>> constraint_exclusion = off
>> join_collapse_limit = 1
>> logging_collector = on
>> log_directory = 'pg_log'
>> log_filename = 'postgresql-%Y-%m-%d.log'
>> log_min_duration_statement = 40000
>> log_lock_waits = on
>> track_counts = on
>> autovacuum = on
>> log_autovacuum_min_duration = 5000
>> autovacuum_max_workers = 4
>> datestyle = 'iso, dmy'
>> deadlock_timeout = 1s
> 
> So what does the query do? Show us explain plan (explain analyze would be
> nice, but if it's running so slow).
> 
> Which kernel is this? When the OOM strikes, it should print detailed into
> into the log - what does it say?
> 
> When you look at "top" output, which processes consume most memory? Are
> there multiple backends consuming a lot of memory? What queries are they
> running?
> 
> Assuming you have a monitoring system in place, collecting memory stats
> (you should have that), what does it say about history? Is there a sudden
> increase in consumed memory or something suspicious?
> 
> regards
> Tomas
> 

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general