On 07/31/2018 12:08 AM, Marinko Catovic wrote:
>
>> Can you provide (a single snapshot) /proc/pagetypeinfo and
>> /proc/slabinfo from a system that's currently experiencing the issue,
>> also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
>
> your request came in just one day after I 2>drop_caches again when the
> ram usage was really really low again. Up until now it did not reoccur
> on any of the 2 hosts, where one shows 550MB/11G with 37G of totally
> free ram for now - so not that low like last time when I dropped it, I
> think it was like 300M/8G or so, but I hope it helps:

Thanks.

> /proc/pagetypeinfo  https://pastebin.com/6QWEZagL

Yep, it looks like fragmentation by reclaimable slabs:

Node 0, zone Normal, type   Unmovable  29101  32754   8372   2790   1334    354     23      3      4      0      0
Node 0, zone Normal, type     Movable 142449  83386  99426  69177  36761  12931   1378     24      0      0      0
Node 0, zone Normal, type Reclaimable 467195 530638 355045 192638  80358  15627   2029    231     18      0      0

Number of blocks type    Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32           34          703          375            0            0
Node 0, zone   Normal         1672        14276        15659            1            0

Half of the Normal zone (15659 of 31608 pageblocks, 2 megabytes each)
is marked as Reclaimable. zoneinfo has nr_slab_reclaimable 1679817
(pages), so the reclaimable slabs would fill only about 3280 pageblocks
(6.4G), yet they are spread over almost 5 times as many. It's possible
they pollute the Movable pageblocks as well, but the stats can't tell
us. Either the page grouping by mobility heuristics are broken here, or
the worst-case scenario happened: at some point memory was really wholly
filled with reclaimable slabs, and the rather random reclaim did not
result in whole pageblocks being freed.
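The pageblock arithmetic above can be sketched in Python. This assumes 4 KiB base pages and 2 MiB pageblocks (pageblock_order 9, as on x86_64); the helper name slab_spread is hypothetical, and the figures are copied from the pastes quoted above.

```python
PAGE_SIZE = 4096                  # bytes per base page (assumption)
PAGEBLOCK_SIZE = 2 * 1024 * 1024  # bytes per pageblock (assumption)
PAGES_PER_BLOCK = PAGEBLOCK_SIZE // PAGE_SIZE  # 512 pages per pageblock

def slab_spread(nr_slab_reclaimable, reclaimable_blocks):
    """Compare reclaimable slab pages with the pageblocks marked Reclaimable.

    Returns (full pageblocks' worth of slab pages, GiB of slab pages,
    ratio of Reclaimable pageblocks to that ideal count).
    """
    ideal_blocks = nr_slab_reclaimable // PAGES_PER_BLOCK
    gib = nr_slab_reclaimable * PAGE_SIZE / 2**30
    return ideal_blocks, gib, reclaimable_blocks / ideal_blocks

# "Number of blocks type" row for Node 0, zone Normal (from pagetypeinfo).
normal = {"Unmovable": 1672, "Movable": 14276, "Reclaimable": 15659,
          "HighAtomic": 1, "Isolate": 0}

ideal, gib, factor = slab_spread(1679817, normal["Reclaimable"])
share = normal["Reclaimable"] / sum(normal.values())
print(f"slab pages fill ~{ideal} pageblocks ({gib:.1f} GiB)")
print(f"Reclaimable pageblocks: {normal['Reclaimable']} "
      f"({share:.0%} of the zone, {factor:.1f}x that)")
```

The floor division mirrors the 3280 figure; rounding up would give 3281, which does not change the picture.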
> /proc/slabinfo  https://pastebin.com/81QAFgke

The largest caches seem to be:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ext4_inode_cache  3107754 3759573   1080    3    1 : tunables   24   12    8 : slabdata 1253191 1253191      0
dentry            2840237 7328181    192   21    1 : tunables  120   60    8 : slabdata  348961  348961    120

The internal fragmentation of the dentry cache is significant as well:
only 2840237 of 7328181 objects (about 39%) are active. Dunno if some of
those objects pin movable pages as well...

So it looks like there's insufficient slab reclaim (shrinker activity),
and possibly problems with the page grouping by mobility heuristics as
well...

> /proc/vmstat  https://pastebin.com/S7mrQx1s
> /proc/zoneinfo  https://pastebin.com/csGeqNyX
>
> also please note - whether this makes any difference: there is no swap
> file/partition
> I am using this without swap space. imho this should not be necessary since
> applications running on the hosts would not consume more than 20GB, the rest
> should be used by buffers/cache.
>
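The per-cache footprint and object utilization behind the slabinfo remarks can be computed with a short Python sketch. It assumes the version 2.1 slabinfo column layout shown in the header line quoted earlier and 4 KiB pages; the helper names parse_slabinfo_line and summarize are hypothetical, not part of any kernel tool.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages

def parse_slabinfo_line(line):
    """Split one /proc/slabinfo data line (version 2.1 layout) into fields."""
    head, _tunables, slabdata = line.split(" : ")
    name, active, total, objsize, _objperslab, pagesperslab = head.split()
    _label, _active_slabs, num_slabs, _sharedavail = slabdata.split()
    return {
        "name": name,
        "active_objs": int(active),
        "num_objs": int(total),
        "objsize": int(objsize),
        "pagesperslab": int(pagesperslab),
        "num_slabs": int(num_slabs),
    }

def summarize(line):
    """Return (cache name, GiB of slab pages, fraction of objects active)."""
    s = parse_slabinfo_line(line)
    footprint = s["num_slabs"] * s["pagesperslab"] * PAGE_SIZE
    return s["name"], footprint / 2**30, s["active_objs"] / s["num_objs"]

# The two largest caches from the slabinfo paste quoted above.
LINES = [
    "ext4_inode_cache 3107754 3759573 1080 3 1 : tunables 24 12 8 "
    ": slabdata 1253191 1253191 0",
    "dentry 2840237 7328181 192 21 1 : tunables 120 60 8 "
    ": slabdata 348961 348961 120",
]

for line in LINES:
    name, gib, used = summarize(line)
    print(f"{name:<18}{gib:6.2f} GiB  {used:.0%} of objects active")
```

Counting num_slabs rather than active_slabs gives the pages the cache actually holds, which is what matters for pageblock fragmentation.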