On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> >
> > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > The number of negative dentries is effectively constrained only by memory
> > > size. Systems which do not experience significant memory pressure for
> > > an extended period can build up millions of negative dentries which
> > > clog the dcache. That can have different symptoms, such as inotify
> > > taking a long time [1], high memory usage [2] and even just poor lookup
> > > performance [3]. We've also seen problems with cgroups being pinned
> > > by negative dentries, though I think we now reparent those dentries to
> > > their parent cgroup instead.
> >
> > Yes, it should be fixed already.
> > >
> > > We don't have a really good solution yet, and maybe some focused
> > > brainstorming on the problem would lead to something that actually works.
> >
> > I’d be happy to join this discussion. And in my opinion it’s going
> > beyond negative dentries: there are other types of objects which tend
> > to grow beyond any reasonable limits if there is no memory pressure.
>
> +1, we once had a similar issue as well, and agree that is not only
> limited to negative dentries but all too many LRU-ed dentries and inodes.

Yup, any discussion solely about managing buildup of negative
dentries doesn't acknowledge that it is just a symptom of larger
problems that need to be addressed.

> Limited the total number may benefit to avoid shrink spiking for servers.

No, we don't want to set hard limits on object counts - that's just
asking for systems that need frequent hand tuning and are impossible
to get right under changing workloads. Caches need to auto-size
according to the workload's working set to find a steady state
balance, not be bound by arbitrary limits.

But even cache sizing isn't the problem here - it's just another
symptom.

> > A perfect example when it happens is when a machine is almost
> > idle for some period of time. Periodically running processes
> > creating various kernel objects (mostly vfs cache) which over
> > time are filling significant portions of the total memory. And
> > when the need for memory arises, we realize that the memory is
> > heavily fragmented and it’s costly to reclaim it back.

Yup, the underlying issue here is that memory reclaim does nothing
to manage long term build-up of single use cached objects when
*there is no memory pressure*. There's plenty of idle time and spare
resources to manage caches sanely, but we don't.

e.g. there is no periodic rotation of caches that could lead to
detection and reclaim of single use objects (say over a period of
minutes) and hence prevent them from filling up all of memory
unnecessarily and creating transient memory reclaim and allocation
latency spikes when memory finally fills up.

IOWs, negative dentries getting out of hand and shrinker spikes are
both symptoms of the same problem: while memory allocation is free,
memory reclaim does nothing to manage cache aging. Hence we only
find out we've got a badly aged cache when we finally realise it has
filled all of memory, and then we have heaps of work to do before
memory can be made available for allocation again....
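To make that a bit more concrete: if a cache had to drive its own
aging today, it could trickle a small scan through its LRU from a
delayed work item, reusing its existing shrinker callbacks. Something
like the sketch below - completely untested, all the mycache_* names
are made up, and this sort of thing really belongs in the reclaim
core rather than being open coded in every cache:

/*
 * Completely untested sketch - the mycache_* names are hypothetical;
 * only the shrinker and workqueue interfaces are real.
 */
#include <linux/init.h>
#include <linux/gfp.h>
#include <linux/jiffies.h>
#include <linux/shrinker.h>
#include <linux/workqueue.h>

/* Hypothetical cache internals, assumed to exist elsewhere. */
extern unsigned long mycache_count(void);             /* objects on the LRU */
extern unsigned long mycache_prune(unsigned long nr); /* free up to nr, return freed */

static unsigned long mycache_shrink_count(struct shrinker *s,
                                          struct shrink_control *sc)
{
        return mycache_count();
}

static unsigned long mycache_shrink_scan(struct shrinker *s,
                                         struct shrink_control *sc)
{
        /* Reclaim's entire request: "scan this many objects". */
        return mycache_prune(sc->nr_to_scan);
}

static struct shrinker mycache_shrinker = {
        .count_objects  = mycache_shrink_count,
        .scan_objects   = mycache_shrink_scan,
        .seeks          = DEFAULT_SEEKS,
};

/*
 * Periodic aging: trickle a small scan through the cache every minute
 * so single use objects get found and freed while the machine is idle,
 * instead of waiting until the cache has filled memory.
 */
#define MYCACHE_AGING_INTERVAL  (60 * HZ)
#define MYCACHE_AGING_BATCH     128

static void mycache_age(struct work_struct *work);
static DECLARE_DELAYED_WORK(mycache_aging_work, mycache_age);

static void mycache_age(struct work_struct *work)
{
        struct shrink_control sc = {
                .gfp_mask       = GFP_KERNEL,
                .nr_to_scan     = MYCACHE_AGING_BATCH,
        };

        mycache_shrink_scan(&mycache_shrinker, &sc);
        schedule_delayed_work(&mycache_aging_work, MYCACHE_AGING_INTERVAL);
}

static int __init mycache_init(void)
{
        int error = register_shrinker(&mycache_shrinker);

        if (!error)
                schedule_delayed_work(&mycache_aging_work,
                                      MYCACHE_AGING_INTERVAL);
        return error;
}

Note how little information passes between reclaim and the shrinker
in that interface - a count and a "scan this many objects" request is
pretty much the entire conversation.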
And then if you're going to talk memory reclaim, the elephant in the
room is the lack of integration between shrinkers and the main
reclaim infrastructure. There's no priority determination, there's
no progress feedback, and there's no mechanism to allow shrinkers to
throttle reclaim rather than have the reclaim infrastructure wind up
the reclaim priority and OOM kill when a shrinker cannot make
progress quickly, etc.

Then there's direct reclaim hammering shrinkers with unbound
concurrency, so individual shrinkers have no chance of determining
how much memory pressure there really is by themselves, not to
mention the lock contention problems that unbound reclaim
concurrency on things like LRU lists can cause.

And, of course, memcg based reclaim is still only tacked onto the
side of the shrinker infrastructure...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx