On Sat, 2023-02-04 at 15:44 -0500, Benjamin Coddington wrote:
> On 4 Feb 2023, at 11:52, Trond Myklebust wrote:
> > On Feb 4, 2023, at 08:15, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
> > > Ah, thanks for explaining that.
> > >
> > > I'd like to summarize and quantify this problem one last time for folks
> > > that don't want to read everything. If an application wants to remove
> > > all files and the parent directory, and uses this pattern to do it:
> > >
> > > opendir
> > > while (getdents)
> > >     unlink dents
> > > closedir
> > > rmdir
> > >
> > > Before this commit, that would work with up to 126 dentries on NFS from
> > > a tmpfs export. If the directory had 127 or more, the rmdir would fail
> > > with ENOTEMPTY.
> >
> > For all sizes of filenames, or just the particular set that was chosen
> > here? What about the choice of rsize? Both these values affect how many
> > entries glibc can cache before it has to issue another getdents() call
> > into the kernel. For the record, this is what glibc does in the opendir()
> > code in order to choose a buffer size for the getdents syscalls:
> >
> >     /* The st_blksize value of the directory is used as a hint for the
> >        size of the buffer which receives struct dirent values from the
> >        kernel. st_blksize is limited to max_buffer_size, in case the
> >        file system provides a bogus value. */
> >     enum { max_buffer_size = 1048576 };
> >
> >     enum { allocation_size = 32768 };
> >     _Static_assert (allocation_size >= sizeof (struct dirent64),
> >                     "allocation_size < sizeof (struct dirent64)");
> >
> >     /* Increase allocation if requested, but not if the value appears to
> >        be bogus. It will be between 32Kb and 1Mb. */
> >     size_t allocation = MIN (MAX ((size_t) statp->st_blksize,
> >                                   (size_t) allocation_size),
> >                              (size_t) max_buffer_size);
> >
> >     DIR *dirp = (DIR *) malloc (sizeof (DIR) + allocation);
>
> The behavioral complexity is even higher with glibc in the mix, but both
> the test that Chuck's using and the reproducer I've been making claims
> about use SYS_getdents directly. I'm using a static 4k buffer size, which
> is big enough to fit enough entries to prime the heuristic for a single
> call to getdents() whether or not we return early at 17 or 126.
>
> > > After this commit, it only works with up to 17 dentries.
> > >
> > > The argument that this is making things worse takes the position that
> > > there are more directories in the universe with >17 dentries that want
> > > to be cleaned up by this "saw off the branch you're sitting on" pattern
> > > than directories with >127. And I guess that's true if Chuck runs that
> > > testing setup enough. :)
> > >
> > > We can change the optimization in the commit from
> > >     NFS_READDIR_CACHE_MISS_THRESHOLD + 1
> > > to
> > >     nfs_readdir_array_maxentries + 1
> > >
> > > This would make the regression disappear, and would also keep most of
> > > the optimization.
> > >
> > > Ben
> >
> > So in other words, the suggestion is to optimise the number of readdir
> > records that we return from NFS to whatever value papers over the known
> > telldir()/seekdir() tmpfs bug that is re-revealed by this particular
> > test when run under these particular conditions?
>
> Yes. It's a terrible suggestion. Its only merit may be that it meets the
> letter of the no-regressions law. I hate it, and after I started popping
> out patches that do it, I found they've all made the behavior far more
> complex due to the way we dynamically optimize dtsize.
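For anyone following along, the reproducer pattern being discussed boils
down to something like the sketch below: raw SYS_getdents64 with a static
4k buffer, keeping glibc's readdir buffering out of the picture. This is
only an illustration of the pattern, not the actual test program Ben or
Chuck are running, and the mount path is made up:

#define _GNU_SOURCE
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	const char *dir = "/mnt/nfs/testdir";	/* made-up mount point */
	char buf[4096];				/* static 4k getdents buffer */
	int fd = open(dir, O_RDONLY | O_DIRECTORY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	for (;;) {
		long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));

		if (nread < 0) {
			perror("getdents64");
			return 1;
		}
		if (nread == 0)
			break;			/* no more entries */

		for (long off = 0; off < nread; ) {
			struct dirent64 *d = (struct dirent64 *)(buf + off);

			/* unlink while the directory is still being read */
			if (strcmp(d->d_name, ".") && strcmp(d->d_name, ".."))
				unlinkat(fd, d->d_name, 0);
			off += d->d_reclen;
		}
	}
	close(fd);

	/* if unstable cookies made getdents64() skip entries above,
	   this fails with ENOTEMPTY */
	if (rmdir(dir)) {
		perror("rmdir");
		return 1;
	}
	return 0;
}

When an unlink shifts the tmpfs directory offsets underneath the open fd,
the next getdents64() call resumes at the wrong cookie, some names are
never returned, and the final rmdir() fails with ENOTEMPTY, which is the
failure mode being argued about above.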
> > Anyone who tries to use tmpfs with a different number of files, different
> > file name lengths, or different mount options is still SOL because that's
> > not a "regression"?
>
> Right. :P
>
> Ben

I may be missing something, but would it be possible to move to a more
stable scheme for readdir cookies for tmpfs? It is tmpfs, so we don't need
to worry about persisting these values across reboots. Could we (e.g.) hash
dentry pointers to generate cookies?
--
Jeff Layton <jlayton@xxxxxxxxxx>
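To make that suggestion concrete, here is a minimal sketch of what a
pointer-hash cookie could look like, using hash_ptr() from <linux/hash.h>.
The helper name and the offset convention are invented for illustration; a
real patch would also need a way to locate the dentry matching a resumed
cookie and to cope with hash collisions:

#include <linux/dcache.h>
#include <linux/hash.h>
#include <linux/types.h>

/*
 * Hypothetical helper: derive the readdir cookie from the dentry
 * pointer instead of the entry's position in the parent's child list.
 * tmpfs dentries are pinned in the dcache for the lifetime of the file,
 * so the value stays stable until unmount, and nothing has to survive
 * a reboot.
 */
static u64 shmem_dentry_cookie(const struct dentry *dentry)
{
	/*
	 * 31 bits keeps the cookie positive, and the +2 reserves
	 * offsets 0 and 1 for "." and "..", matching the usual
	 * dcache_readdir() convention.
	 */
	return 2 + (u64)hash_ptr(dentry, 31);
}

Resuming a listing at such a cookie would then mean searching the parent
for the dentry with the matching hash rather than counting entries from
the start, which is what would make the offset stable across concurrent
unlinks.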