On 3/22/22 13:37, Matthew Wilcox wrote:
> On Tue, Mar 22, 2022 at 04:17:16PM -0400, Colin Walters wrote:
>>
>>
>> On Tue, Mar 22, 2022, at 3:19 PM, James Bottomley wrote:
>>>
>>> Well, firstly what is the exact problem? People maliciously looking up
>>> nonexistent files
>>
>> Maybe most people have seen it, but for those who haven't:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1571183
>> was definitely one of those things that just makes one recoil in horror.
>>
>> TL;DR NSS used to have code that tried to detect "is this a network filesystem"
>> by timing `stat()` calls to nonexistent paths, and this massively bloated
>> the negative dentry cache and caused all sorts of performance problems.
>> It was particularly confusing because this would just happen as a side effect of e.g. executing `curl https://somewebsite`.
>>
>> That code wasn't *intentionally* malicious but... That's... unpleasant.
>
> Oh, the situation where we encountered the problem was systemd.
> Definitely not malicious, and not even stupid (as the NSS example above).
> I forget exactly which thing it was, but on some fairly common event
> (user login?), it looked up a file in a PATH of some type, failed
> to find it in the first two directories, then created it in a third.
> At logout, it deleted the file. Now there are three negative dentries.

More or less this, although I'm not sure it even created and deleted the
files... it just wanted to check for them in all sorts of places. The
file paths were something like this:

/{etc,usr/lib}/systemd/system/session-XXXXXXXX.scope.{wants,d,requires}

> Repeat a few million times (each time looking for a different file)
> with no memory pressure and you have a thoroughly soggy machine that
> is faster to reboot than to reclaim dentries.

The speed of reclaiming memory wasn't the straw which broke this
server's back; it ended up being that some operations might iterate over
the entire list of children of a dentry, holding a spinlock, causing
soft lockups.

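
As a rough illustration of how quickly this adds up, the sketch below
expands the path pattern above and counts the distinct missing names per
parent directory. It is a toy model, not systemd or kernel code: the
zero-padded numeric session IDs, the helper names, and the assumption
that every lookup misses and nothing is ever reclaimed are all made up
for illustration.

```python
from itertools import product

# Path pattern quoted in the message above:
#   /{etc,usr/lib}/systemd/system/session-XXXXXXXX.scope.{wants,d,requires}
PREFIXES = ("etc", "usr/lib")
SUFFIXES = ("wants", "d", "requires")


def session_lookup_paths(session_id: str) -> list[str]:
    """Return every path the pattern expands to for one session ID."""
    return [
        f"/{prefix}/systemd/system/session-{session_id}.scope.{suffix}"
        for prefix, suffix in product(PREFIXES, SUFFIXES)
    ]


def negative_entries_after(sessions: int) -> dict[str, int]:
    """Count distinct missing names per parent directory.

    Assumption (hypothetical): session IDs are sequential numbers, every
    lookup misses, and no entry is ever reclaimed.
    """
    per_dir: dict[str, set[str]] = {}
    for n in range(sessions):
        for path in session_lookup_paths(f"{n:08d}"):
            parent, _, name = path.rpartition("/")
            per_dir.setdefault(parent, set()).add(name)
    return {parent: len(names) for parent, names in per_dir.items()}


if __name__ == "__main__":
    for parent, count in negative_entries_after(1000).items():
        print(parent, count)
```

Under those assumptions, 1000 sessions leave 3000 distinct names under
each of /etc/systemd/system and /usr/lib/systemd/system, so the growth
is concentrated in the child lists of just two parent directories.
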
Thus, patches like [1], which attempt to treat the symptom, not the
cause.

It seems to me that the idea of doing something based on last access
time, or number of accesses, would be a great step. But given a
prioritized list of dentries to target, and even a reasonable call site
like kill_dentry(), the hardest part still seems to be determining the
working set of dentries, or at least determining what is a reasonable
number of negative dentries to keep around.

If we're looking at issues like [1], then the amount needs to be on a
per-directory basis, and maybe roughly based on CPU speed. For other
priorities or failure modes, the policy would need to be completely
different. Ideally a solution could work for almost all scenarios, but
failing that, maybe it is worth allowing policy to be set by
administrators via sysctl or even a BPF program?

Thanks,
Stephen

[1]: https://lore.kernel.org/linux-fsdevel/20220209231406.187668-1-stephen.s.brennan@xxxxxxxxxx/