> I went through the whole thread again as it was spread over months, and
> finally connected some dots. In one mail you said:
>
> > There is one thing I forgot to mention: the hosts perform find and du (I mean the commands, finding files and disk usage)
> > on the HDDs every night, starting from 00:20 AM up until in the morning 07:45 AM, for maintenance and stats.
>
> The timespan above roughly matches the phase where reclaimable slab grows
> (samples 2000-6000 at 5-second intervals is roughly 5.5 hours). The find
> will fetch a lot of metadata in dentries, inodes etc. which are part of
> reclaimable slabs. In another mail you posted a slabinfo
> https://pastebin.com/81QAFgke in the phase where it's already being
> slowly reclaimed, but still occupies 6.5GB, and mostly it's
> ext4_inode_cache, and dentry cache (also very much internally fragmented).
> In another mail I suggested that maybe fragmentation happened because the
> slab filled up much more at some point, and I think we now have that
> solidly confirmed from the vmstat plots.
> I think one workaround is for you to perform echo 2 > drop_caches (not
> 3) right after the find/du maintenance finishes. At that point you don't
> have too much page cache anyway, since the slabs have pushed it out.
> It's also overnight so there are not many users yet?
> Alternatively the find/du could run in a memcg limiting its slab use.
> Michal would know the details.
>
> Long term we should do something about these slab objects that are only
> used briefly (once?) so there's no point in caching them and letting the
> cache grow like this.
>

Well, caching anything from the find/du runs is not necessary imho anyway,
since walking over all these millions of files in that time period is
really not worth caching at all. If there is a way to limit the commands
as you mentioned, that would be great. Also, I want to mention that these
operations were in use with 3.x kernels as well, for years, with
absolutely zero issues.

Doing echo 2 > drop_caches right after that is something I considered, but
I had a bad experience with it. I first tried it around 5:00 AM to give it
enough spare time to finish, since sync; echo 2 > drop_caches can take
some time. In one worst case it took over 2 hours (from 05:00 AM to
07:00 AM), since the host was busy during that time with find/du, never
having freed enough caches to continue. Hence my question about lowering
the limits in void drop_slab_node(int nid) in mm/vmscan.c, to let it stop
earlier. I could run it right after find/du at 07:45 instead, just hoping
that it finishes soon enough. It was just an idea, never mind if you
believe that it was a bad one :)
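
A minimal sketch of chaining the cache drop directly after the maintenance run, rather than starting it at a fixed clock time while find/du may still be busy. The function names, the DROP_CACHES_FILE override and the example script path are hypothetical, introduced only for illustration; /proc/sys/vm/drop_caches is the standard location and writing to it requires root. This does not bound how long the drop itself takes, it only makes sure it starts after find/du has exited.

```shell
#!/bin/sh
# Sketch only: the function names and the DROP_CACHES_FILE override are
# hypothetical, introduced for illustration. Writing to the real file
# requires root.
DROP_CACHES_FILE="${DROP_CACHES_FILE:-/proc/sys/vm/drop_caches}"

# Flush dirty data, then ask the kernel to drop reclaimable slab
# objects (dentries and inodes). Writing "2" leaves the page cache alone.
drop_reclaimable_slab() {
    sync
    echo 2 > "$DROP_CACHES_FILE"
}

# Run the given maintenance command to completion, then drop the slab
# caches, and return the maintenance command's exit status.
run_maintenance_then_drop() {
    "$@"
    status=$?
    drop_reclaimable_slab
    return "$status"
}

# Example (as root), with a hypothetical maintenance script path:
#   run_maintenance_then_drop /usr/local/sbin/nightly-find-du.sh
```

Because the drop is triggered by the maintenance command exiting, the two can no longer overlap, whatever time find/du happens to finish on a given night.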