I didn't realize that about overcommit_ratio. It was at 50, I've changed it to 95. I'll see if that clears up the problem moving forward.

# cat /proc/meminfo
MemTotal: 30827220 kB
MemFree: 153524 kB
MemAvailable: 17941864 kB
Buffers: 6188 kB
Cached: 24560208 kB
SwapCached: 0 kB
Active: 20971256 kB
Inactive: 8538660 kB
Active(anon): 12460680 kB
Inactive(anon): 36612 kB
Active(file): 8510576 kB
Inactive(file): 8502048 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 50088 kB
Writeback: 160 kB
AnonPages: 4943740 kB
Mapped: 7571496 kB
Shmem: 7553176 kB
Slab: 886428 kB
SReclaimable: 858936 kB
SUnreclaim: 27492 kB
KernelStack: 4208 kB
PageTables: 188352 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15413608 kB
Committed_AS: 14690544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 59012 kB
VmallocChunk: 34359642367 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 31465472 kB
DirectMap2M: 0 kB

# sysctl -a:

vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.min_free_kbytes = 22207
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 2
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.scan_unevictable_pages = 0
vm.stat_interval = 1
vm.swappiness = 0
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0
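
If I'm reading the kernel's overcommit accounting right, with vm.overcommit_memory = 2 the limit reported above is simply:

    CommitLimit = MemTotal * overcommit_ratio / 100 + SwapTotal
                = 30827220 kB * 50 / 100 + 0 kB
                ≈ 15413608 kB

which matches the CommitLimit line in /proc/meminfo, and Committed_AS (14690544 kB) was already within about 700 MB of it. With overcommit_ratio = 95 the limit should work out to roughly 29.3 GB.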
On Tue, Oct 21, 2014 at 3:46 PM, Tomas Vondra <tv@xxxxxxxx> wrote:
>
> On 22 October 2014, 0:25, Montana Low wrote:
> > I'm running postgres-9.3 on a 30GB ec2 xen instance w/ linux kernel
> > 3.16.3.
> > I receive numerous Error: out of memory messages in the log, which are
> > aborting client requests, even though there appears to be 23GB available
> > in
> > the OS cache.
> >
> > There is no swap on the box. Postgres is behind pgbouncer to protect from
> > the 200 real clients, which limits connections to 32, although there are
> > rarely more than 20 active connections, even though postgres
> > max_connections is set very high for historic reasons. There is also a 4GB
> > java process running on the box.
> >
> >
> >
> >
> > relevant postgresql.conf:
> >
> > max_connections = 1000 # (change requires restart)
> > shared_buffers = 7GB # min 128kB
> > work_mem = 40MB # min 64kB
> > maintenance_work_mem = 1GB # min 1MB
> > effective_cache_size = 20GB
> >
> >
> >
> > sysctl.conf:
> >
> > vm.swappiness = 0
> > vm.overcommit_memory = 2
>
> This means you have 'no overcommit', so the amount of memory is limited by
> overcommit_ratio + swap. The default value for overcommit_ratio is 50% of
> RAM, and as you have no swap that effectively means only 50% of the RAM is
> available to the system.
>
> If you want to verify this, check /proc/meminfo - see the lines
> CommitLimit (the current limit) and Committed_AS (committed address space).
> Once Committed_AS reaches the limit, it's game over.
>
> There are different ways to fix this, or at least improve that:
>
> (1) increasing the overcommit_ratio (clearly, 50% is way too low -
> something like 90% might be more appropriate on 30GB of RAM without swap)
>
> (2) adding swap (say a small ephemeral drive, with swappiness=10 or
> something like that)
>
> Tomas
>
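For the record, the change itself was nothing exotic, just the usual sysctl route (with the same line added to /etc/sysctl.conf so it survives a reboot), roughly:

    # sysctl -w vm.overcommit_ratio=95
    # echo 'vm.overcommit_ratio = 95' >> /etc/sysctl.conf
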
Increasing overcommit_ratio to 95 solved the problem; the box is now using its memory as expected without needing to resort to swap.
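
For anyone who hits the same "out of memory" errors with overcommit_memory = 2: the quick check is

    # grep -E 'CommitLimit|Committed_AS' /proc/meminfo

Allocations start failing once Committed_AS gets close to CommitLimit, no matter how much of the machine is sitting in the page cache.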
On Tue, Oct 21, 2014 at 3:55 PM, Montana Low <montanalow@xxxxxxxxx> wrote: