Re: [LSF/MM TOPIC] Better handling of negative dentries

Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> · Tue, 22 Mar 2022 14:08:04 -0700

On 3/22/22 13:37, Matthew Wilcox wrote:
> On Tue, Mar 22, 2022 at 04:17:16PM -0400, Colin Walters wrote:
>>
>>
>> On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
>>>
>>> Well, firstly what is the exact problem?  People maliciously looking up
>>> nonexistent files
>>
>> Maybe most people have seen it, but for those who haven't:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1571183
>> was definitely one of those things that just makes one recoil in horror.
>>
>> TL;DR NSS used to have code that tried to detect "is this a network filesystem"
>> by timing `stat()` calls to nonexistent paths, and this massively bloated
>> the negative dentry cache and caused all sorts of performance problems.
>> It was particularly confusing because this would just happen as a side effect of e.g. executing `curl https://somewebsite`.
>>
>> That code wasn't *intentionally* malicious but...

That's... unpleasant.
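
For anyone who wants to see the effect for themselves, here is a rough
sketch in Python (not the NSS code, just an illustration). Each stat() of
a distinct nonexistent name fails with ENOENT and may leave a negative
dentry behind. The helper names are made up, and treating the fifth field
of /proc/sys/fs/dentry-state as the negative dentry count is an
assumption that should hold on recent kernels only.

```python
import os
import tempfile
import uuid


def lookup_missing(directory, count):
    """stat() `count` distinct nonexistent names; return how many gave ENOENT."""
    misses = 0
    for _ in range(count):
        path = os.path.join(directory, "missing-" + uuid.uuid4().hex)
        try:
            os.stat(path)
        except FileNotFoundError:
            misses += 1  # each failed lookup may be cached as a negative dentry
    return misses


def negative_dentries():
    """Return the system-wide negative dentry count, or None if unavailable."""
    try:
        with open("/proc/sys/fs/dentry-state") as f:
            fields = f.read().split()
    except OSError:
        return None
    # Assumed layout: nr_dentry nr_unused age_limit want_pages nr_negative dummy
    return int(fields[4]) if len(fields) >= 5 else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        before = negative_dentries()
        misses = lookup_missing(d, 10000)
        after = negative_dentries()  # read before the directory is removed
        print("misses:", misses, "negative before/after:", before, after)
```

The before/after numbers are system-wide, so other activity on the
machine will add noise to them.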

> 
> Oh, the situation where we encountered the problem was systemd.
> Definitely not malicious, and not even stupid (as the NSS example above).
> I forget exactly which thing it was, but on some fairly common event
> (user login?), it looked up a file in a PATH of some type, failed
> to find it in the first two directories, then created it in a third.
> At logout, it deleted the file.  Now there are three negative dentries.

More or less this, although I'm not sure it even created and deleted the
files... it just wanted to check for them in all sorts of places. The
file paths were something like this:

/{etc,usr/lib}/systemd/system/session-XXXXXXXX.scope.{wants,d,requires}
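
To make the fan-out concrete, the pattern above can be expanded with a
few lines of Python. This is only a sketch: candidate_paths() is a
hypothetical helper, and the session ID is left as the same placeholder
used above.

```python
from itertools import product

PREFIXES = ("etc", "usr/lib")
SUFFIXES = ("wants", "d", "requires")


def candidate_paths(session_id):
    """Expand the brace pattern for one session ID (hypothetical helper)."""
    unit = f"session-{session_id}.scope"
    return [
        f"/{prefix}/systemd/system/{unit}.{suffix}"
        for prefix, suffix in product(PREFIXES, SUFFIXES)
    ]


if __name__ == "__main__":
    for path in candidate_paths("XXXXXXXX"):
        print(path)
```

That is six distinct paths per session, so every new session ID can
leave up to six new negative dentries if none of the paths exist.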

> Repeat a few million times (each time looking for a different file)
> with no memory pressure and you have a thoroughly soggy machine that
> is faster to reboot than to reclaim dentries.

The speed of reclaiming memory wasn't the straw that broke this
server's back. The real problem was that some operations may iterate
over the entire list of children of a dentry while holding a spinlock,
which causes soft lockups. Hence patches like [1], which attempt to
treat the symptom, not the cause.

It seems to me that the idea of doing something based on last access
time, or number of accesses, would be a great step. But given a
prioritized list of dentries to target, and even a reasonable call site
like kill_dentry(), the hardest part still seems to be determining the
working set of dentries, or at least determining what is a reasonable
number of negative dentries to keep around.

If we're looking at issues like [1], then the amount needs to be on a
per-directory basis, and maybe roughly based on CPU speed. For other
priorities or failure modes, then the policy would need to be completely
different. Ideally a solution could work for almost all scenarios, but
failing that, maybe it is worth allowing policy to be set by
administrators via sysctl or even a BPF program?
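
As a strawman for what a per-directory policy could look like, here is a
toy userspace model in Python. It is not kernel code and not a proposed
patch. The class name, the per_dir_limit knob, and the choice of plain
LRU eviction are all assumptions made for illustration.

```python
from collections import OrderedDict


class NegativeDentryCache:
    """Toy model: cap the cached failed lookups kept per directory."""

    def __init__(self, per_dir_limit):
        self.per_dir_limit = per_dir_limit
        self.dirs = {}  # directory -> OrderedDict of names, oldest first

    def record_miss(self, directory, name):
        """Record a failed lookup; return the evicted name, if any."""
        names = self.dirs.setdefault(directory, OrderedDict())
        names[name] = True
        names.move_to_end(name)  # mark as most recently used
        if len(names) > self.per_dir_limit:
            evicted, _ = names.popitem(last=False)  # drop least recently used
            return evicted
        return None

    def count(self, directory):
        """Number of failed lookups currently cached for a directory."""
        return len(self.dirs.get(directory, ()))
```

With a limit of 2, recording misses for "a", "b", "a", "c" in one
directory evicts "b", since "a" was touched more recently. The hard part
is still picking the limit, which this model simply takes as an input.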

Thanks,
Stephen

[1]:
https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@xxxxxxxxxx/