On Sat, 11 May 2024 at 12:28, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, May 11, 2024 at 11:42:34AM -0700, Linus Torvalds wrote:
> >
> > And that outside lock is the much more important one, I bet.
>
> ... and _that_ is where taking d_delete outside of the lock might
> take an unpleasant analysis of a lot of code.

Hmm. It really shouldn't matter. There can only be negative children
of the now deleted directory, so there are no actual effects on
inodes.

It only affects the d_child list, which is protected by d_lock (and
can be modified outside of the inode lock anyway due to memory
pressure).

What am I missing?

> In any case, I think the original poster said that parent directories
> were not removed, so I doubt that rmdir() behaviour is relevant for
> their load.

I don't see that at all. The load was a "rm -rf" of a directory tree,
and all of that was successful as far as I can see from the report.

The issue was that an unrelated process just looking at the directory
(either one - I clearly tested the wrong one) would be held up by the
directory lock while the pruning was going on.

And yes, the pruning can take a couple of seconds with "just" a few
million negative dentries. The negative dentries obviously don't even
have to be the result of a 'delete' - the easy way to see this is to
do a lot of bogus lookups.

Attached is my excessively stupid test-case in case you want to play
around with it:

    [dirtest]$ time ./a.out dir ; time rmdir dir

    real    0m12.592s
    user    0m1.153s
    sys     0m11.412s

    real    0m1.892s
    user    0m0.001s
    sys     0m1.528s

so you can see how it takes almost two seconds to then flush those
negative dentries - even when there were no 'unlink()' calls at all,
just failed lookups.

It's maybe instructive to do the same on tmpfs, which has

    /*
     * Retaining negative dentries for an in-memory filesystem just wastes
     * memory and lookup time: arrange for them to be deleted immediately.
     */
    int always_delete_dentry(const struct dentry *dentry)
    {
            return 1;
    }

and so if you do the same test on /tmp, the results are very different:

    [dirtest]$ time ./a.out /tmp/sillydir ; time rmdir /tmp/sillydir

    real    0m8.129s
    user    0m1.164s
    sys     0m6.592s

    real    0m0.001s
    user    0m0.000s
    sys     0m0.001s

so it does show very different patterns and you can test the whole
"what happens without negative dentries" case.

             Linus
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

/* Bail out with the failing expression and errno text if x is true */
#define FATAL(x) do { if (x) die(#x); } while (0)

static void die(const char *s)
{
	fprintf(stderr, "%s: %s\n", s, strerror(errno));
	exit(1);
}

int main(int argc, char ** argv)
{
	char *dirname = argv[1];

	FATAL(argc < 2);
	FATAL(mkdir(dirname, 0700));

	/* Look up names that don't exist: every failed open() leaves a negative dentry */
	for (int i = 0; i < 10000000; i++) {
		char name[128];
		snprintf(name, sizeof(name), "%s/name-%09d", dirname, i);
		FATAL(open(name, O_RDONLY) >= 0);
	}
	return 0;
}
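
For anyone who wants both numbers from a single run instead of two
separate 'time' invocations, here is a rough sketch of the same
experiment with the two phases timed in-process. It is not part of the
original attachment: the names time_negative_dentries() and elapsed()
are made up for illustration, the lookup count is a parameter so a
small value can be tried first, and it reports wall-clock time only
(no user/sys split like 'time' gives you).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings (hypothetical helper) */
static double elapsed(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: create dirname, do 'count' failed lookups in it,
 * then rmdir() it. Stores the wall-clock time of each phase.
 * Returns 0 on success, -1 on any error (errno is left as set).
 */
static int time_negative_dentries(const char *dirname, long count,
				  double *lookup_secs, double *rmdir_secs)
{
	struct timespec t0, t1, t2;
	char name[4096];

	assert(count >= 0);

	if (mkdir(dirname, 0700))
		return -1;

	/* Phase 1: bogus lookups; each failed open() looks up a missing name */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < count; i++) {
		snprintf(name, sizeof(name), "%s/name-%09ld", dirname, i);
		int fd = open(name, O_RDONLY);
		if (fd >= 0) {
			/* The lookup was supposed to fail */
			close(fd);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Phase 2: removing the directory, which is where the pruning cost shows up */
	if (rmdir(dirname))
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t2);

	*lookup_secs = elapsed(&t0, &t1);
	*rmdir_secs = elapsed(&t1, &t2);
	return 0;
}
```

Calling time_negative_dentries("dir", 10000000, &l, &r) from a small
main() and printing l and r corresponds to the test above; pointing
dirname at a tmpfs mount gives the comparison case where negative
dentries are not retained.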