Postgresql Host Swapping Hard With Abundant Free Memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Greetings,

We have been having some pretty painful site outages related to heavy swapping, even though the system may appear to have as much as 10GB of free memory.
It seems that as soon as the system hits ~22GB (of 32GB) of memory usage it starts to swap.  As soon as we go below ~22GB, swap is released.  

During the worst outages we see:

heavy swapping      (10-15GB)
heavy disk IO       (up to 600ms)
heavy CPU load: 4.0 (load over 100+ with 26 cores)
available memory:   (6-8GB)

Here's the host information, it's a dedicated physical host, not a VM.

ubuntu-14.04.1 LTS
Linux db004 3.13.0-34-generic #60-Ubuntu SMP Wed Aug 13 15:45:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
24 cores, 32GB RAM, 32GB swapfile
RAID 10 with SSDs

Postgres version:

postgresql-9.3.5
postgis-2.1.4 extension

Here's our /etc/sysctl.conf:

kernel.shmmax = 16850395136
kernel.shmall = 4113866
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.swappiness = 0 
vm.dirty_ratio = 10           ## maximum 3.2GB dirty cache size 
vm.dirty_background_ratio = 5 ## ratio at which disk begins to flush cache (1.6GB)

We are setting our shared_buffers, work_mem and maintenance_work_mem very high which I suspected is reason for swapping.
The huge mystery to me is that during heavy swap there appears to be ~8-10GB free memory in cache.
We wondered whether that unused memory in cache was being consumed by our memory settings in postgresql.conf:

shared_buffers = 8GB          ## 25% of system memory (32GB)
maintenance_work_mem = 8GB    ## autovacuuming on, 3 workers (defaults)
work_mem = 256MB              ## we have as many as 170 connections

We have average around 170 connections and have been moving everything to pgbouncer to reduce this count.
Rough estimate of cost with work_mem set to 256MB: 170 x 256MB = 43510MB (43GB)

Is it possible that the high work_mem setting is causing the connections to hold on to the extra available memory?
This is our assumption so we plan on dialing down work_mem and maintenance_work_mem to sane values.
We have also ordered more RAM (increase from 32GB to 96GB) but I would like to understand what is happening.

Here are the values we are going to change:

maintenance_work_mem = 1GB
work_mem = 64MB

One last thing, we disabled THP (tranparent huge pages) because we were seeing compaction errors:

[db003.prod:~] root% egrep 'compact_(fail|stall)' /proc/vmstat
compact_stall 34682729
compact_fail 32915396

Here are our competing theories of why we are swapping as much as 10-15GB with 6GB-10GB of free memory:

1.) maintenance_work_mem and work_mem are set too high causing postgres to allocate too much memory
2.) The 6GB - 10GB of free memory observed during swapping is postgresql's shared buffer 
3.) Linux filesystem caching is tuned incorrectly, maybe SSD related?
4.) This is a NUMA related issue similar to the Mysql Swap Insanity issue:

http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
http://www.postgresql.org/message-id/CAGTBQpacrSDcN10rTwRbH+AGm2_y0Qao6CJDoyvEp504iFbdrw@xxxxxxxxxxxxxx

Note, we zone_reclaim_mode appears disabled on our database host.

root% cat /proc/sys/vm/zone_reclaim_mode
0

Hoping someone here can help us sort this out because it's a huge mystery for us.
We are going to add more RAM but I'm trying to understand what is happening.

Thank you kindly.

Christian Gough




-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance





[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux