Re: [LSF/MM TOPIC] Better handling of negative dentries

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Tue, 22 Mar 2022 15:19:15 -0400

On Tue, 2022-03-22 at 15:08 +0000, Matthew Wilcox wrote:
> On Wed, Mar 16, 2022 at 01:52:23PM +1100, Dave Chinner wrote:
> > On Wed, Mar 16, 2022 at 10:07:19AM +0800, Gao Xiang wrote:
> > > On Tue, Mar 15, 2022 at 01:56:18PM -0700, Roman Gushchin wrote:
> > > > > On Mar 15, 2022, at 12:56 PM, Matthew Wilcox <
> > > > > willy@xxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > The number of negative dentries is effectively constrained
> > > > > only by memory
> > > > > size.  Systems which do not experience significant memory
> > > > > pressure for
> > > > > an extended period can build up millions of negative dentries
> > > > > which
> > > > > clog the dcache.  That can have different symptoms, such as
> > > > > inotify
> > > > > taking a long time [1], high memory usage [2] and even just
> > > > > poor lookup
> > > > > performance [3].  We've also seen problems with cgroups being
> > > > > pinned
> > > > > by negative dentries, though I think we now reparent those
> > > > > dentries to
> > > > > their parent cgroup instead.
> > > > 
> > > > Yes, it should be fixed already.
> > > > 
> > > > > We don't have a really good solution yet, and maybe some
> > > > > focused
> > > > > brainstorming on the problem would lead to something that
> > > > > actually works.
> > > > 
> > > > I’d be happy to join this discussion. And in my opinion it’s
> > > > going beyond negative dentries: there are other types of
> > > > objects which tend to grow beyond any reasonable limits if
> > > > there is no memory pressure.
> > > 
> > > +1, we once had a similar issue as well, and agree that is not
> > > only
> > > limited to negative dentries but all too many LRU-ed dentries and
> > > inodes.
> > 
> > Yup, any discussion solely about managing buildup of negative
> > dentries doesn't acknowledge that it is just a symptom of larger
> > problems that need to be addressed.
> 
> Yes, but let's not make this _so_ broad a discussion that it becomes
> unsolvable.  Rather, let's look for a solution to this particular
> problem that can be adopted by other caches that share a similar
> problem.

Well, firstly what is the exact problem?  People maliciously looking up
nonexistent files and causing the negative dentries to balloon or
simply managing dentries for a well behaved workload (which is what
good examples?)

> For example, we might be seduced into saying "this is a slab problem"
> because all the instances we have here allocate from slab.  But slab
> doesn't have enough information to solve the problem.  Maybe the
> working set of the current workload really needs 6 million dentries
> to perform optimally.  Maybe it needs 600.  Slab can't know
> that.  Maybe slab can play a role here, but the only component which
> can know the appropriate size for a cache is the cache itself.

Could we do something depending on the age of the oldest dentry on the
lru list?  We don't currently keep a last accessed time, but we could.

> I think the logic needs to be in d_alloc().  Before it calls
> __d_alloc(), it should check ... something ... to see if it should
> try to shrink the LRU list.

We could also do it in kill_dentry() or retain_dentry() ... we're past
all the fast paths when they start running.  They also have a view on
the liveness of the dentry.  Chances are if you get a negative dentry
back in kill_dentry, it isn't going to be instantiated, so it's only
use is to cache -ENOENT and we could perform more agressive shrinking
heuristics.

>   The devil is in what that something
> should be.  I'm no expert on the dcache; do we just want to call
> prune_dcache_sb() for every 1/1000 time?  Rely on DCACHE_REFERENCED
> to make sure that we're not over-pruning the list?  If so, what do we
> set nr_to_scan to?  1000 so that we try to keep the dentry list the
> same size?  1500 so that it actually tries to shrink?

Fundamentally, a negative dentry that doesn't get promoted to a
positive one better be looked up pretty often for us to keep it in the
cache, I think?  So last accessed time seems to be a good indicator.

We also have to scan the whole list for negative dentries, perhaps they
might get on better with their own lru list (although that complicates
the instantiation case and perhaps we can afford to spend the time
walking backwards over the lru list in alloc/kill anyway, so there's no
need for a separate list).

> I don't feel like I know enough to go further here.  But it feels
> better than what we currently do -- calling all the shrinkers from
> deep in the page allocator.

Well, yes, but that's not a high bar.

James