I'm running postgres-9.3 on a 30GB ec2 xen instance w/ linux kernel 3.16.3. I receive numerous Error: out of memory messages in the log, which are aborting client requests, even though there appears to be 23GB available in the OS cache.
There is no swap on the box. Postgres is behind pgbouncer to protect from the 200 real clients, which limits connections to 32, although there are rarely more than 20 active connections, even though postgres max_connections is set very high for historic reasons. There is also a 4GB java process running on the box.
relevant postgresql.conf:
max_connections = 1000 # (change requires restart)
shared_buffers = 7GB # min 128kB
work_mem = 40MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
effective_cache_size = 20GB
sysctl.conf:
vm.swappiness = 0
vm.overcommit_memory = 2
kernel.shmmax=34359738368
kernel.shmall=8388608
log example:
ERROR: out of memory
DETAIL: Failed on request of size 67108864.
STATEMENT: SELECT "package_texts".* FROM "package_texts" WHERE "package_texts"."id" = $1 LIMIT 1
example pg_top, showing 23GB available in cache:
last pid: 6607; load avg: 3.59, 2.32, 2.61; up 16+09:17:29 20:49:51
18 processes: 1 running, 17 sleeping
CPU states: 22.5% user, 0.0% nice, 4.9% system, 63.2% idle, 9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps, 1 rollbs/s, 217 buffer r/s, 99 hit%, 11994 row r/s, 3820 row w/s
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap:
example top showing the only other significant 4GB process on the box:
top - 21:05:09 up 16 days, 9:32, 2 users, load average: 2.73, 2.91, 2.88
Tasks: 147 total, 3 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.1 us, 4.1 sy, 0.0 ni, 62.9 id, 9.8 wa, 0.0 hi, 0.7 si, 0.3 st
KiB Mem: 30827220 total, 30642584 used, 184636 free, 7292 buffers
KiB Swap: 0 total, 0 used, 0 free. 23449636 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7407 postgres 20 0 7604928 10172 7932 S 29.6 0.0 2:51.27 postgres
10469 postgres 20 0 7617716 176032 160328 R 11.6 0.6 0:01.48 postgres
10211 postgres 20 0 7630352 237736 208704 S 10.6 0.8 0:03.64 postgres
18202 elastic+ 20 0 8726984 4.223g 4248 S 9.6 14.4 883:06.79 java
9711 postgres 20 0 7619500 354188 335856 S 7.0 1.1 0:08.03 postgres
3638 postgres 20 0 7634552 1.162g 1.127g S 6.6 4.0 0:50.42 postgres
There is no swap on the box. Postgres is behind pgbouncer to protect from the 200 real clients, which limits connections to 32, although there are rarely more than 20 active connections, even though postgres max_connections is set very high for historic reasons. There is also a 4GB java process running on the box.
relevant postgresql.conf:
max_connections = 1000 # (change requires restart)
shared_buffers = 7GB # min 128kB
work_mem = 40MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
effective_cache_size = 20GB
sysctl.conf:
vm.swappiness = 0
vm.overcommit_memory = 2
kernel.shmmax=34359738368
kernel.shmall=8388608
log example:
ERROR: out of memory
DETAIL: Failed on request of size 67108864.
STATEMENT: SELECT "package_texts".* FROM "package_texts" WHERE "package_texts"."id" = $1 LIMIT 1
example pg_top, showing 23GB available in cache:
last pid: 6607; load avg: 3.59, 2.32, 2.61; up 16+09:17:29 20:49:51
18 processes: 1 running, 17 sleeping
CPU states: 22.5% user, 0.0% nice, 4.9% system, 63.2% idle, 9.4% iowait
Memory: 29G used, 186M free, 7648K buffers, 23G cached
DB activity: 2479 tps, 1 rollbs/s, 217 buffer r/s, 99 hit%, 11994 row r/s, 3820 row w/s
DB I/O: 0 reads/s, 0 KB/s, 0 writes/s, 0 KB/s
DB disk: 149.8 GB total, 46.7 GB free (68% used)
Swap:
example top showing the only other significant 4GB process on the box:
top - 21:05:09 up 16 days, 9:32, 2 users, load average: 2.73, 2.91, 2.88
Tasks: 147 total, 3 running, 244 sleeping, 0 stopped, 0 zombie
%Cpu(s): 22.1 us, 4.1 sy, 0.0 ni, 62.9 id, 9.8 wa, 0.0 hi, 0.7 si, 0.3 st
KiB Mem: 30827220 total, 30642584 used, 184636 free, 7292 buffers
KiB Swap: 0 total, 0 used, 0 free. 23449636 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7407 postgres 20 0 7604928 10172 7932 S 29.6 0.0 2:51.27 postgres
10469 postgres 20 0 7617716 176032 160328 R 11.6 0.6 0:01.48 postgres
10211 postgres 20 0 7630352 237736 208704 S 10.6 0.8 0:03.64 postgres
18202 elastic+ 20 0 8726984 4.223g 4248 S 9.6 14.4 883:06.79 java
9711 postgres 20 0 7619500 354188 335856 S 7.0 1.1 0:08.03 postgres
3638 postgres 20 0 7634552 1.162g 1.127g S 6.6 4.0 0:50.42 postgres