On 3/29/19 10:41 AM, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> does nobody have an idea? I had another system today:

Well, isn't this still the same thing we discussed last autumn? You did
report success with the ill-fated patch "mm: thp: relax __GFP_THISNODE
for MADV_HUGEPAGE mappings", or not?

> # cat /proc/meminfo
> MemTotal:       131911684 kB
> MemFree:         25734836 kB
> MemAvailable:    78158816 kB
> Buffers:             2916 kB
> Cached:          20650184 kB
> SwapCached:        544016 kB
> Active:          58999352 kB
> Inactive:        10084060 kB
> Active(anon):    43412532 kB
> Inactive(anon):   5583220 kB
> Active(file):    15586820 kB
> Inactive(file):   4500840 kB
> Unevictable:        35032 kB
> Mlocked:            35032 kB
> SwapTotal:        3905532 kB
> SwapFree:               0 kB
> Dirty:               1048 kB
> Writeback:          20144 kB
> AnonPages:       47923392 kB
> Mapped:            775376 kB
> Shmem:             561420 kB
> Slab:            35798052 kB
> SReclaimable:    34309112 kB

That's rather significant. Got a /proc/slabinfo from such a system state?

> SUnreclaim:       1488940 kB
> KernelStack:        42160 kB
> PageTables:        248008 kB
> NFS_Unstable:           0 kB
> Bounce:                 0 kB
> WritebackTmp:           0 kB
> CommitLimit:     69861372 kB
> Committed_AS:   100328892 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:            0 kB
> VmallocChunk:           0 kB
> HardwareCorrupted:      0 kB
> AnonHugePages:   19177472 kB
> ShmemHugePages:         0 kB
> ShmemPmdMapped:         0 kB
> HugePages_Total:        0
> HugePages_Free:         0
> HugePages_Rsvd:         0
> HugePages_Surp:         0
> Hugepagesize:        2048 kB
> DirectMap4k:       951376 kB
> DirectMap2M:     87015424 kB
> DirectMap1G:     48234496 kB
>
> # cat /proc/buddyinfo
> Node 0, zone      DMA      1      0      0      0      2      1      1      0      1      1      3
> Node 0, zone    DMA32    372    418    403    395    371    322    262    179    114      0      0
> Node 0, zone   Normal  89147  96397  76496  56407  41671  29289  18142  10278   4075      0      0
> Node 1, zone   Normal 113266      0      1      1      1      1      1      1      1      0      0

Node 1 seems quite fragmented. Again from last year I recall somebody
(was it you?) capturing a larger series of snapshots where we saw
SReclaimable rise due to some overnight 'find /' activity inflating the
dentry/inode caches. Those then got slowly reclaimed, but memory
remained fragmented until enough of the slab was reclaimed, and
compaction couldn't help; drop_caches did. Looks like this might be the
same case. Add in something that tries to get large-order allocations
on node 1 (e.g. with __GFP_THISNODE) and overreclaim will happen.
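If you can catch a system in this state again, a quick check along
these lines (an untested sketch; slabtop ships with procps) would
confirm whether reclaimable slab is what's keeping node 1 fragmented:

    # list the slab caches by total size; dentry/inode should dominate
    # if the overnight-scan theory holds
    slabtop --once --sort=c | head -n 20

    # reclaim slab objects only (dentries and inodes), then check
    # whether the higher orders on node 1 come back
    echo 2 > /proc/sys/vm/drop_caches
    cat /proc/buddyinfo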
> But with high PSI / memory pressure values above 10-30.
>
> Greets,
> Stefan
>
> On 27.03.19 at 11:56, Stefan Priebe - Profihost AG wrote:
>> Hello list,
>>
>> I hope this is the right place to ask; if not, I'd be happy to be
>> pointed somewhere else.
>>
>> I'm seeing the following behaviour on some of our hosts running a
>> SLES 15 kernel (v4.12 as its base), but I don't think it's related
>> to the kernel.
>>
>> At some "random" interval, mostly after 3-6 weeks of uptime, memory
>> pressure suddenly rises and the Linux page cache (Cached: in
>> /proc/meminfo) drops from 12G to 3G. After that, I/O pressure rises,
>> most probably due to the small cache. But at the same time I have
>> MemFree and MemAvailable at 19-22G.
>>
>> Why does this happen? How can I debug this situation? I would expect
>> the page/file cache to never shrink while there is so much free
>> memory.
>>
>> Thanks a lot for your help.
>>
>> Greets,
>> Stefan
>>
>> Not sure whether it's needed, but these are the vm kernel settings:
>> vm.admin_reserve_kbytes = 8192
>> vm.block_dump = 0
>> vm.compact_unevictable_allowed = 1
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>> vm.dirtytime_expire_seconds = 43200
>> vm.drop_caches = 0
>> vm.extfrag_threshold = 500
>> vm.hugepages_treat_as_movable = 0
>> vm.hugetlb_shm_group = 0
>> vm.laptop_mode = 0
>> vm.legacy_va_layout = 0
>> vm.lowmem_reserve_ratio = 256 256 32 1
>> vm.max_map_count = 65530
>> vm.memory_failure_early_kill = 0
>> vm.memory_failure_recovery = 1
>> vm.min_free_kbytes = 393216
>> vm.min_slab_ratio = 5
>> vm.min_unmapped_ratio = 1
>> vm.mmap_min_addr = 65536
>> vm.mmap_rnd_bits = 28
>> vm.mmap_rnd_compat_bits = 8
>> vm.nr_hugepages = 0
>> vm.nr_hugepages_mempolicy = 0
>> vm.nr_overcommit_hugepages = 0
>> vm.nr_pdflush_threads = 0
>> vm.numa_zonelist_order = default
>> vm.oom_dump_tasks = 1
>> vm.oom_kill_allocating_task = 0
>> vm.overcommit_kbytes = 0
>> vm.overcommit_memory = 0
>> vm.overcommit_ratio = 50
>> vm.page-cluster = 3
>> vm.panic_on_oom = 0
>> vm.percpu_pagelist_fraction = 0
>> vm.stat_interval = 1
>> vm.swappiness = 50
>> vm.user_reserve_kbytes = 131072
>> vm.vfs_cache_pressure = 100
>> vm.watermark_scale_factor = 10
>> vm.zone_reclaim_mode = 0
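Until the next occurrence, it would also help to have periodic
snapshots, so we can watch the slab growth and the fragmentation
develop over time. A rough sketch (the interval and log path are
arbitrary, pick whatever suits you):

    while true; do
        {
            date
            cat /proc/meminfo /proc/buddyinfo
            cat /proc/slabinfo
            # only present if your kernel exposes PSI
            cat /proc/pressure/memory 2>/dev/null
        } >> /var/log/mm-snapshots.log
        sleep 300   # one snapshot every 5 minutes
    done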