On Fri, Aug 25, 2017 at 8:10 AM, Colin Walters <walters@xxxxxxxxxx> wrote:
> On Fri, Aug 25, 2017, at 10:47 AM, Jan Kara wrote:
>>
>> It is possible that some of these dentries are so rarely used that they are
>> indeed just a waste
>
> In some cases - think containers, or ostree-style root filesystem snapshots,
> if we do an `rm -rf /path/to/container-root`, userspace knows for a fact
> that nothing will reference those paths again - all of the processes that could
> have been killed. There's no point to having negative dentries for them.
>
> Maybe something like unlinkat (dfd, path, AT_UNLINKAT_DONTNEED), like
> madvise (MADV_DONTNEED) ?

No, I think the right fix is to just prune the child dentries when
doing an rmdir - and I thought we did that already.

IOW, when doing "rmdir()" on a dentry, we should not prune *that*
dentry, but we should prune all the dentries (recursively) under it.

.. goes to look ..

Yeah, look at vfs_rmdir(): it does that

        shrink_dcache_parent(dentry);

already.

So when you do a "rm -rf something", you should already have *no*
extra negative dentries - there should be exactly one negative dentry
remaining (the dentry that used to be the directory itself).

So I really don't see what the problem is. If you have lots of
dentries left over after a "rm -rf /path/to/container-root", something
else is wrong.

The only time we have negative dentries is:

 (a) lookups that didn't match.

These are CRITICALLY IMPORTANT. They are very very common. Why? Think
of all the PATH-like behavior that Unix traditionally has: not just
PATH itself, but look at what processes do with 'strace'. They end up
searching for things like translation files for error messages, for
shared libraries, for a number of things using path-like things, and
it's actually really important that they do *not* keep trying to call
down to the filesystem to look for a path that doesn't exist.
That's often the most expensive filesystem operation there is - because
in a big directory (and again - think about PATH - it often traverses
some of the biggest directories around), many filesystems end up
walking the whole directory before they say "oops, I didn't find that
file".

 (b) individually removed files.

These aren't as important, and we could possibly shrink things, but
they really shouldn't matter. How often do you remove a ton of files
without removing the directory they are in? Not often.

 (c) renames etc.

These tend to be even less of an issue.

So I really think that you shouldn't have that many negative dentries
to begin with under normal load, but even if you do, they should be
really easy to prune like Jan says.

So send out a real load with real numbers. None of this touchy-feely
"something seems to be wrong". Ok?

Because maybe we have a bug, and that shrink_dcache_parent() thing
doesn't work. That would be interesting and relevant and a bug, so
definitely worth fixing.

(Side note: the shrink_dcache_parent() thing is actually done before
the filesystem rmdir() is called, which means that you can use a
"rmdir()" as a MADV_DONTNEED on the dentry tree below that directory.
Just make sure the directory isn't empty, so that it doesn't actually
get deleted)

                Linus
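
For illustration, the side note above could be tried from userspace
roughly like this. It is only a sketch under assumptions: the helper
names prune_dentries_below() and demo_prune() are made up for this
example, the scratch paths under /tmp are arbitrary, and the program
can only check that the directory survives the failed rmdir(). The
dentry pruning itself happens inside the kernel and is not visible
here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: call rmdir() on a directory that is expected to
 * be non-empty, relying on the kernel pruning the dentries below it
 * before the filesystem rejects the removal.
 *
 * Returns 0 if rmdir() failed with ENOTEMPTY (or EEXIST, which POSIX
 * also permits), meaning the directory is still there.
 * Returns -1 on any other outcome, including the case where the
 * directory was empty and rmdir() really removed it.
 */
int prune_dentries_below(const char *dir)
{
    assert(dir != NULL);

    if (rmdir(dir) == 0)
        return -1;      /* directory was empty and is now gone */

    if (errno == ENOTEMPTY || errno == EEXIST)
        return 0;       /* expected failure: directory kept */

    return -1;          /* some other error (EACCES, ENOENT, ...) */
}

/*
 * Hypothetical demo: build a scratch directory holding one file, try
 * the rmdir() trick on it, check the file is still present, clean up.
 * Returns 0 when everything behaved as expected, -1 otherwise.
 */
int demo_prune(void)
{
    char dir[] = "/tmp/dentry-demo-XXXXXX";
    char file[64];
    struct stat st;
    int fd, rc;

    if (mkdtemp(dir) == NULL)
        return -1;

    snprintf(file, sizeof file, "%s/keep", dir);
    fd = open(file, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    rc = prune_dentries_below(dir);
    if (rc == 0 && stat(file, &st) != 0)
        rc = -1;        /* the file must have survived */

    /* Real cleanup: remove the file first, then the empty directory. */
    unlink(file);
    rmdir(dir);
    return rc;
}
```

Note that the helper treats a successful rmdir() as a failure of the
trick: the caller has to guarantee the directory is non-empty, exactly
as the side note says.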