On Mon, 16 Jul 2018 05:41:15 -0700 Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:

> On Mon, Jul 16, 2018 at 11:09:01AM +0200, Michal Hocko wrote:
> > On Fri 13-07-18 10:36:14, Dave Chinner wrote:
> > [...]
> > > By limiting the number of negative dentries in this case, internal
> > > slab fragmentation is reduced such that reclaim cost never gets out
> > > of control. While it appears to "fix" the symptoms, it doesn't
> > > address the underlying problem. It is a partial solution at best but
> > > at worst it's another opaque knob that nobody knows how or when to
> > > tune.
> >
> > Would it help to put all the negative dentries into its own slab cache?
>
> Maybe the dcache should be more sensitive to its own needs.  In __d_alloc,
> it could check whether there are a high proportion of negative dentries
> and start recycling some existing negative dentries.

Well, yes.  The proposed patchset adds all this background reclaiming.
Problem is a) that background reclaiming sometimes can't keep up so a
synchronous direct-reclaim was added on top and b) reclaiming dentries
in the background will cause non-dentry-allocating tasks to suffer
because of activity from the dentry-allocating tasks, which is
inappropriate.

I expect a better design is something like

__d_alloc()
{
        ...
        while (too many dentries)
                call the dcache shrinker
        ...
}

and that's it.  This way we have a hard upper limit and only the tasks
which are creating dentries suffer the cost.

Regarding the slab page fragmentation issue: I'm wondering if the whole
idea of balancing the slab scan rates against the page scan rates isn't
really working out.  Maybe shrink_slab() should be sitting there
hammering the caches until they have freed up a particular number of
pages.  Quite a big change, conceptually and implementationally.
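
To make the "hammer the caches until a particular number of pages is freed" idea concrete, here is a rough userspace sketch. It is not kernel code and does not use the real shrinker API: struct toy_cache, toy_shrink_one() and shrink_until_pages_freed() are all made-up names, and the "pinned" flag is just a stand-in for an object that cannot be reclaimed. The point it tries to show is that the loop terminates on pages released, not on objects scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJS_PER_PAGE 4   /* hypothetical: objects per slab page */
#define NR_PAGES      8   /* hypothetical: pages in the toy cache */

/* Toy slab cache. A page can be released only when all its slots are empty. */
struct toy_cache {
	bool in_use[NR_PAGES][OBJS_PER_PAGE];
	bool pinned[NR_PAGES][OBJS_PER_PAGE];   /* stand-in for unreclaimable objects */
	bool page_present[NR_PAGES];
};

/* Populate every slot of every page with a reclaimable object. */
static void toy_cache_fill(struct toy_cache *c)
{
	for (size_t p = 0; p < NR_PAGES; p++) {
		c->page_present[p] = true;
		for (size_t i = 0; i < OBJS_PER_PAGE; i++) {
			c->in_use[p][i] = true;
			c->pinned[p][i] = false;
		}
	}
}

static bool toy_page_empty(const struct toy_cache *c, size_t p)
{
	for (size_t i = 0; i < OBJS_PER_PAGE; i++)
		if (c->in_use[p][i])
			return false;
	return true;
}

/*
 * Free the first reclaimable object in scan order.
 * Returns 1 if that released a whole page, 0 if the page is still
 * partially used, -1 if nothing reclaimable is left.
 */
static int toy_shrink_one(struct toy_cache *c)
{
	for (size_t p = 0; p < NR_PAGES; p++) {
		if (!c->page_present[p])
			continue;
		for (size_t i = 0; i < OBJS_PER_PAGE; i++) {
			if (!c->in_use[p][i] || c->pinned[p][i])
				continue;
			c->in_use[p][i] = false;
			if (toy_page_empty(c, p)) {
				c->page_present[p] = false;
				return 1;
			}
			return 0;
		}
	}
	return -1;
}

/*
 * Keep shrinking until 'target' pages have been released or nothing
 * more can be reclaimed. Returns the number of pages released and
 * reports how many objects had to be freed to get there.
 */
static size_t shrink_until_pages_freed(struct toy_cache *c, size_t target,
				       size_t *objs_freed)
{
	size_t pages = 0;

	assert(c && objs_freed);
	*objs_freed = 0;
	while (pages < target) {
		int ret = toy_shrink_one(c);

		if (ret < 0)
			break;          /* no progress possible */
		(*objs_freed)++;
		pages += (size_t)ret;
	}
	return pages;
}
```

In this toy model, if one object is pinned on each of the first two pages, asking for two pages back costs 14 freed objects: three from each fragmented page, which release nothing, and then four from each of the next two pages. That is the fragmentation cost showing up as extra scanning rather than as an unmet page target.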

Aside: about a billion years ago we were having issues with processes
getting stuck in direct reclaim because other processes were coming in
and stealing away the pages which the direct-reclaimer had just freed.
One possible solution to that was to make direct-reclaiming tasks
release the freed pages into a list on the task_struct.  So those pages
were invisible to other allocating tasks and were available to the
direct-reclaimer when it returned from the reclaim effort.  I forget
what happened to this.

It's quite a small code change and would provide a mechanism for
implementing the hammer-cache-until-you've-freed-enough design above.

Aside 2: if we *do* do something like the above __d_alloc() pseudo code
then perhaps it could be cast in terms of pages, not dentries.  i.e.,

__d_alloc()
{
        ...
        while (too many pages in dentry_cache)
                call the dcache shrinker
        ...
}

and, apart from the external name thing (grr), that should address
these fragmentation issues, no?  I assume it's easy to ask slab how
many pages are presently in use for a particular cache.
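
A rough userspace sketch of the page-based check, to show where the cost lands. Nothing here is real kernel API: struct toy_dcache, struct toy_task, PAGE_LIMIT and the helpers are invented for illustration, and the "shrinker" simply frees one object from the lowest-numbered page that has any. The check runs before the new object is placed, so in this toy version the cache can sit one page over the limit until the next allocation comes along.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJS_PER_PAGE 4    /* hypothetical: dentries per slab page */
#define MAX_PAGES     16   /* hypothetical: size of the toy cache */
#define PAGE_LIMIT    2    /* hypothetical: "too many pages" threshold */

/* Live objects per page; a count of 0 means the page is not in use. */
struct toy_dcache {
	unsigned int used[MAX_PAGES];
};

/* Per-task accounting of how much shrinking the task had to do itself. */
struct toy_task {
	unsigned long shrink_calls;
};

/* Stand-in for asking slab how many pages a cache is using. */
static size_t toy_pages_in_use(const struct toy_dcache *dc)
{
	size_t n = 0;

	for (size_t p = 0; p < MAX_PAGES; p++)
		if (dc->used[p] > 0)
			n++;
	return n;
}

/* Stand-in shrinker: free one object from the first page that has any. */
static bool toy_shrink_one(struct toy_dcache *dc)
{
	for (size_t p = 0; p < MAX_PAGES; p++) {
		if (dc->used[p] > 0) {
			dc->used[p]--;
			return true;
		}
	}
	return false;
}

/*
 * Toy __d_alloc(): the allocating task shrinks synchronously while the
 * cache is over its page limit, then places the new object. Returns 0
 * on success, -1 if the toy cache has no room at all.
 */
static int toy_d_alloc(struct toy_dcache *dc, struct toy_task *task)
{
	assert(dc && task);

	/* Only the task creating dentries pays for the shrinking. */
	while (toy_pages_in_use(dc) > PAGE_LIMIT) {
		if (!toy_shrink_one(dc))
			break;
		task->shrink_calls++;
	}

	/* Prefer a partially used page so no new page is needed. */
	for (size_t p = 0; p < MAX_PAGES; p++) {
		if (dc->used[p] > 0 && dc->used[p] < OBJS_PER_PAGE) {
			dc->used[p]++;
			return 0;
		}
	}
	/* Otherwise start a fresh page. */
	for (size_t p = 0; p < MAX_PAGES; p++) {
		if (dc->used[p] == 0) {
			dc->used[p] = 1;
			return 0;
		}
	}
	return -1;
}
```

With these numbers, a task that allocates ten objects ends up doing four shrink calls of its own (draining the oldest page once the cache has grown to three pages), while a task that never calls toy_d_alloc() does none.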