> Maybe a memcg with kmemcg limit? Michal could know more.
Could you/Michal explain this, perhaps?
The hardware is pretty much high-end datacenter grade; I really don't
see how this could be related to the hardware :(

I do not understand why the caching works perfectly fine for a while
after a drop_caches and then degrades to low usage some time later. I
cannot simply drop caches automatically, since this requires monitoring
for overload and temporarily dropping traffic on specific ports until
the writes/reads cool down.
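Just to illustrate how that effect can be watched - these are standard
/proc/meminfo fields, the interval is arbitrary:

    # snapshot of free memory vs. page cache vs. reclaimable slab
    grep -E '^(MemFree|Buffers|Cached|SReclaimable):' /proc/meminfo

    # or poll it, e.g. once a minute
    watch -n 60 "grep -E '^(MemFree|Buffers|Cached|SReclaimable):' /proc/meminfo"
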
2018-08-06 11:40 GMT+02:00 Vlastimil Babka <vbabka@xxxxxxx>:
On 08/03/2018 04:13 PM, Marinko Catovic wrote:
> Thanks for the analysis.
>
> So since I am no mem management dev, what exactly does this mean?
> Is there any workaround or quick fix, or is this something that can/will
> be fixed at some point in time?
The workaround would be manual / periodic cache flushing, unfortunately.
Maybe a memcg with kmemcg limit? Michal could know more.
A long-term generic solution will be much harder to find :(
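To make that concrete - these are the standard interfaces, not something
prescribed in this thread, and the group name and values are only
illustrative:

    # drop reclaimable slab objects (dentries/inodes) only;
    # 1 = page cache, 3 = both
    sync; echo 2 > /proc/sys/vm/drop_caches

    # e.g. periodically from /etc/crontab, here daily at 04:00
    0 4 * * *  root  echo 2 > /proc/sys/vm/drop_caches

    # the memcg idea, cgroup v1: cap kernel (slab) memory for the workload
    mkdir /sys/fs/cgroup/memory/workload
    echo 8G > /sys/fs/cgroup/memory/workload/memory.kmem.limit_in_bytes
    echo <pid> > /sys/fs/cgroup/memory/workload/cgroup.procs
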
> I can not imagine that I am the only one who is affected by this, nor do I
> know why my use case would be so much different from any other.
> Most 'cloud' services should be affected as well.
Hmm, either your workload is specific in being hungry for fs metadata
and not so much for data (page cache), and/or there's some source of
high-order allocations that others don't have, possibly related to some
piece of hardware?
> Tell me if you need any other snapshots or whatever info.
>
> 2018-08-02 18:15 GMT+02:00 Vlastimil Babka <vbabka@xxxxxxx>:
>
> On 07/31/2018 12:08 AM, Marinko Catovic wrote:
> >
> >> Can you provide (a single snapshot) /proc/pagetypeinfo and
> >> /proc/slabinfo from a system that's currently experiencing the issue,
> >> also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
> >
> > your request came in just one day after I did 2>drop_caches again, when
> > the ram usage was really low. Up until now it has not reoccurred on either
> > of the 2 hosts, where one currently shows 550MB/11G with 37G of totally
> > free ram - so not as low as last time when I dropped it, which was around
> > 300M/8G I think, but I hope it helps:
>
> Thanks.
>
> > /proc/pagetypeinfo https://pastebin.com/6QWEZagL
>
> Yep, looks like it's fragmented by reclaimable slabs:
>
> Node 0, zone Normal, type Unmovable     29101  32754   8372   2790   1334    354     23      3      4      0      0
> Node 0, zone Normal, type Movable      142449  83386  99426  69177  36761  12931   1378     24      0      0      0
> Node 0, zone Normal, type Reclaimable  467195 530638 355045 192638  80358  15627   2029    231     18      0      0
>
> Number of blocks type   Unmovable  Movable  Reclaimable  HighAtomic  Isolate
> Node 0, zone DMA                1        7            0           0        0
> Node 0, zone DMA32             34      703          375           0        0
> Node 0, zone Normal          1672    14276        15659           1        0
>
> Half of the memory is marked as reclaimable (2 megabyte) pageblocks.
> zoneinfo has nr_slab_reclaimable 1679817, so the reclaimable slabs occupy
> only 3280 (6G) pageblocks, yet they are spread over 5 times as much.
> It's also possible they pollute the Movable pageblocks as well, but the
> stats can't tell us. Either the page grouping mobility heuristics are
> broken here, or the worst case scenario happened - memory was at some
> point really wholly filled with reclaimable slabs, and the rather random
> reclaim did not result in whole pageblocks being freed.
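(Spelling out the arithmetic behind those figures, assuming 4K pages and
2M pageblocks: nr_slab_reclaimable 1679817 pages * 4K ~ 6.4G, and
1679817 / 512 pages per pageblock ~ 3281 pageblocks actually needed,
versus the 375 + 15659 ~ 16000 pageblocks marked Reclaimable above.)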
>
> > /proc/slabinfo https://pastebin.com/81QAFgke
>
> Largest caches seem to be:
> # name  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ext4_inode_cache  3107754 3759573  1080   3  1 : tunables  24 12 8 : slabdata 1253191 1253191   0
> dentry            2840237 7328181   192  21  1 : tunables 120 60 8 : slabdata  348961  348961 120
>
> The internal fragmentation of the dentry cache is significant as well.
> Dunno if some of those objects pin movable pages as well...
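(Again just spelling out the numbers quoted above: dentry has 2840237 of
7328181 objects in use, i.e. only ~39% - the rest are free slots inside
slab pages that cannot be returned while any object in them is still
alive; ext4_inode_cache is at ~83% utilization, but its 1253191 slab
pages alone amount to ~4.8G.)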
>
> So it looks like there's insufficient slab reclaim (shrinker activity),
> and possibly problems with the page grouping by mobility heuristics as well...
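For reference, the stock knob for making the dentry/inode shrinkers more
aggressive is vm.vfs_cache_pressure (default 100, larger means the VFS
caches get reclaimed harder); whether it actually helps with this
fragmentation pattern is not established in this thread:

    # current value
    sysctl vm.vfs_cache_pressure

    # raise the pressure on dentry/inode caches (value is just an example)
    sysctl -w vm.vfs_cache_pressure=200
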
>
> > /proc/vmstat https://pastebin.com/S7mrQx1s
> > /proc/zoneinfo https://pastebin.com/csGeqNyX
> >
> > also please note - in case this makes any difference: there is no swap
> > file/partition, I am running this without swap space. imho this should
> > not be necessary, since the applications running on the hosts would not
> > consume more than 20GB; the rest should be used by buffers/cache.
> >
>
>