On 4 Feb 2023, at 11:52, Trond Myklebust wrote:

> On Feb 4, 2023, at 08:15, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>> Ah, thanks for explaining that.
>>
>> I'd like to summarize and quantify this problem one last time for folks
>> that don't want to read everything.  If an application wants to remove
>> all files and the parent directory, and uses this pattern to do it:
>>
>>     opendir
>>     while (getdents)
>>         unlink dents
>>     closedir
>>     rmdir
>>
>> Before this commit, that would work with up to 126 dentries on NFS from
>> a tmpfs export.  If the directory had 127 or more, the rmdir would fail
>> with ENOTEMPTY.
>
> For all sizes of filenames, or just the particular set that was chosen
> here?  What about the choice of rsize?  Both these values affect how many
> entries glibc can cache before it has to issue another getdents() call
> into the kernel.  For the record, this is what glibc does in the opendir()
> code in order to choose a buffer size for the getdents syscalls:
>
>   /* The st_blksize value of the directory is used as a hint for the
>      size of the buffer which receives struct dirent values from the
>      kernel.  st_blksize is limited to max_buffer_size, in case the
>      file system provides a bogus value.  */
>   enum { max_buffer_size = 1048576 };
>
>   enum { allocation_size = 32768 };
>   _Static_assert (allocation_size >= sizeof (struct dirent64),
>                   "allocation_size < sizeof (struct dirent64)");
>
>   /* Increase allocation if requested, but not if the value appears to
>      be bogus.  It will be between 32Kb and 1Mb.  */
>   size_t allocation = MIN (MAX ((size_t) statp->st_blksize,
>                                 (size_t) allocation_size),
>                            (size_t) max_buffer_size);
>
>   DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation);

The behavioral complexity is even higher with glibc in the mix, but both
the test that Chuck's using and the reproducer I've been making claims
about use SYS_getdents directly.  I'm using a static 4k buffer, which is
big enough to hold enough entries to prime the heuristic in a single call
to getdents(), whether or not we return early at 17 or 126.  (A rough
sketch of that loop is appended at the end of this message.)

>> After this commit, it only works with up to 17 dentries.
>>
>> The argument that this is making things worse takes the position that
>> there are more directories in the universe with >17 dentries that want
>> to be cleaned up by this "saw off the branch you're sitting on" pattern
>> than directories with >127.  And I guess that's true if Chuck runs that
>> testing setup enough. :)
>>
>> We can change the optimization in the commit from
>>
>>     NFS_READDIR_CACHE_MISS_THRESHOLD + 1
>>
>> to
>>
>>     nfs_readdir_array_maxentries + 1
>>
>> This would make the regression disappear, and would also keep most of
>> the optimization.
>>
>> Ben
>
> So in other words the suggestion is to optimise the number of readdir
> records that we return from NFS to whatever value papers over the known
> telldir()/seekdir() tmpfs bug that is re-revealed by this particular
> test when run under these particular conditions?

Yes.  It's a terrible suggestion.  Its only merit may be that it meets the
letter of the no-regressions law.  I hate it, and after I started popping
out patches that do it, I found they all made the behavior far more
complex due to the way we dynamically optimize dtsize.

> Anyone who tries to use tmpfs with a different number of files, different
> file name lengths, or different mount options is still SOL because that's
> not a "regression"?

Right. :P

Ben
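
---

For completeness, here is a minimal sketch of the remove-everything-then-rmdir
loop described above, issuing raw getdents syscalls into a static 4k buffer
instead of going through glibc's readdir.  It uses getdents64 (so the buffer
can be walked with glibc's struct dirent64), and the mount path and error
handling are illustrative only, not taken from the actual reproducer:

    /* Sketch: unlink every entry found via raw getdents64 into a static
     * 4 KiB buffer, then rmdir the parent.  In the bad case the final
     * rmdir() fails with ENOTEMPTY even though every entry was unlinked. */
    #define _GNU_SOURCE
    #include <dirent.h>
    #include <fcntl.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
            const char *dir = "/mnt/nfs/testdir";   /* illustrative path */
            char buf[4096];                         /* static 4k buffer  */
            int fd = open(dir, O_RDONLY | O_DIRECTORY);

            if (fd < 0)
                    return 1;

            for (;;) {
                    long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));

                    if (nread <= 0)
                            break;
                    for (long off = 0; off < nread; ) {
                            struct dirent64 *d = (struct dirent64 *)(buf + off);

                            /* skip "." and "..", unlink everything else */
                            if (strcmp(d->d_name, ".") && strcmp(d->d_name, ".."))
                                    unlinkat(fd, d->d_name, 0);
                            off += d->d_reclen;
                    }
            }

            close(fd);
            return rmdir(dir) ? 1 : 0;
    }

Whether this trips the problem depends on how many entries fit in the first
getdents reply versus how many the directory actually holds, which is exactly
the 17-vs-126 boundary being argued about above.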