> Maybe a memcg with kmemcg limit? Michal could know more.
Could you/Michal explain this, perhaps?
The hardware is pretty much high-end datacenter grade; I really don't
see how this could be related to the hardware :(

I do not understand why the caching works perfectly fine for a while
after a drop_caches and then degrades to low usage some time later. I
cannot simply drop caches automatically, since this requires monitoring
for overload and temporarily dropping traffic on specific ports until
the writes/reads cool down.
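Just to illustrate how that effect can be watched - these are standard
/proc/meminfo fields, the interval is arbitrary:

    # snapshot of free memory vs. page cache vs. reclaimable slab
    grep -E '^(MemFree|Buffers|Cached|SReclaimable):' /proc/meminfo

    # or poll it, e.g. once a minute
    watch -n 60 "grep -E '^(MemFree|Buffers|Cached|SReclaimable):' /proc/meminfo"
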
2018-08-06 11:40 GMT+02:00 Vlastimil Babka <vbabka@xxxxxxx>:
On 08/03/2018 04:13 PM, Marinko Catovic wrote:
> Thanks for the analysis.
>
> So since I am no mem management dev, what exactly does this mean?
> Is there any workaround or quick fix, or is this something that can/will
> be fixed at some point in time?
The workaround would be manual / periodic cache flushing, unfortunately.
Maybe a memcg with kmemcg limit? Michal could know more.
A long-term generic solution will be much harder to find :(
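To make that concrete - these are the standard interfaces, not something
prescribed in this thread, and the group name and values are only
illustrative:

    # drop reclaimable slab objects (dentries/inodes) only;
    # 1 = page cache, 3 = both
    sync; echo 2 > /proc/sys/vm/drop_caches

    # e.g. periodically from /etc/crontab, here daily at 04:00
    0 4 * * *  root  echo 2 > /proc/sys/vm/drop_caches

    # the memcg idea, cgroup v1: cap kernel (slab) memory for the workload
    mkdir /sys/fs/cgroup/memory/workload
    echo 8G > /sys/fs/cgroup/memory/workload/memory.kmem.limit_in_bytes
    echo <pid> > /sys/fs/cgroup/memory/workload/cgroup.procs
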
> I can not imagine that I am the only one who is affected by this, nor do I
> know why my use case would be so much different from any other.
> Most 'cloud' services should be affected as well.
Hmm, either your workload is specific in being hungry for fs metadata
and not so much for data (page cache), and/or there's some source of
high-order allocations that others don't have, possibly related to some
piece of hardware?
> Tell me if you need any other snapshots or whatever info.
>
> 2018-08-02 18:15 GMT+02:00 Vlastimil Babka <vbabka@xxxxxxx>:
>
> On 07/31/2018 12:08 AM, Marinko Catovic wrote:
> >
> >> Can you provide (a single snapshot) /proc/pagetypeinfo and
> >> /proc/slabinfo from a system that's currently experiencing the issue,
> >> also with /proc/vmstat and /proc/zoneinfo to verify? Thanks.
> >
> > your request came in just one day after I did 2>drop_caches again, when
> > the ram usage was really low. Up until now it has not reoccurred on either
> > of the 2 hosts, where one currently shows 550MB/11G with 37G of totally
> > free ram - so not as low as last time when I dropped it, which was around
> > 300M/8G I think, but I hope it helps:
>
> Thanks.
>
> > /proc/pagetypeinfo https://pastebin.com/6QWEZagL
>
> Yep, looks like it's fragmented by reclaimable slabs:
>
> Node 0, zone Normal, type Unmovable     29101  32754   8372   2790   1334    354     23      3      4      0      0
> Node 0, zone Normal, type Movable      142449  83386  99426  69177  36761  12931   1378     24      0      0      0
> Node 0, zone Normal, type Reclaimable  467195 530638 355045 192638  80358  15627   2029    231     18      0      0
>
> Number of blocks type   Unmovable  Movable  Reclaimable  HighAtomic  Isolate
> Node 0, zone DMA                1        7            0           0        0
> Node 0, zone DMA32             34      703          375           0        0
> Node 0, zone Normal          1672    14276        15659           1        0
>
> Half of the memory is marked as reclaimable (2 megabyte) pageblocks.
> zoneinfo has nr_slab_reclaimable 1679817, so the reclaimable slabs occupy
> only 3280 (6G) pageblocks, yet they are spread over 5 times as much.
> It's also possible they pollute the Movable pageblocks as well, but the
> stats can't tell us. Either the page grouping mobility heuristics are
> broken here, or the worst case scenario happened - memory was at some
> point really wholly filled with reclaimable slabs, and the rather random
> reclaim did not result in whole pageblocks being freed.
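(Spelling out the arithmetic behind those figures, assuming 4K pages and
2M pageblocks: nr_slab_reclaimable 1679817 pages * 4K ~ 6.4G, and
1679817 / 512 pages per pageblock ~ 3281 pageblocks actually needed,
versus the 375 + 15659 ~ 16000 pageblocks marked Reclaimable above.)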
>
> > /proc/slabinfo https://pastebin.com/81QAFgke
>
> Largest caches seem to be:
> # name  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> ext4_inode_cache  3107754 3759573  1080   3  1 : tunables  24 12 8 : slabdata 1253191 1253191   0
> dentry            2840237 7328181   192  21  1 : tunables 120 60 8 : slabdata  348961  348961 120
>
> The internal fragmentation of the dentry cache is significant as well.
> Dunno if some of those objects pin movable pages as well...
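(Again just spelling out the numbers quoted above: dentry has 2840237 of
7328181 objects in use, i.e. only ~39% - the rest are free slots inside
slab pages that cannot be returned while any object in them is still
alive; ext4_inode_cache is at ~83% utilization, but its 1253191 slab
pages alone amount to ~4.8G.)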
>
> So it looks like there's insufficient slab reclaim (shrinker activity),
> and possibly problems with the page grouping by mobility heuristics as well...
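For reference, the stock knob for making the dentry/inode shrinkers more
aggressive is vm.vfs_cache_pressure (default 100, larger means the VFS
caches get reclaimed harder); whether it actually helps with this
fragmentation pattern is not established in this thread:

    # current value
    sysctl vm.vfs_cache_pressure

    # raise the pressure on dentry/inode caches (value is just an example)
    sysctl -w vm.vfs_cache_pressure=200
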
>
> > /proc/vmstat https://pastebin.com/S7mrQx1s
> > /proc/zoneinfo https://pastebin.com/csGeqNyX
> >
> > also please note - in case this makes any difference: there is no swap
> > file/partition, I am running this without swap space. imho this should
> > not be necessary, since the applications running on the hosts would not
> > consume more than 20GB; the rest should be used by buffers/cache.
> >
>
>