On 3 Feb 2023, at 9:38, Chuck Lever III wrote: >> On Feb 1, 2023, at 10:53 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >> >> On 1 Feb 2023, at 9:10, Benjamin Coddington wrote: >>> >>> Working on a fix.. >> >> .. actually, I have no idea how to fix this - if tmpfs is going to modify >> the position of its dentries, I can't think of a way for the client to loop >> through getdents() and remove every file reliably. >> >> The patch you bisected into just makes this happen on directories with 18 >> entries instead of 127 which can be verified by changing COUNT in the >> reproducer. >> >> As Trond pointed out in: >> https://lore.kernel.org/all/eb2a551096bb3537a9de7091d203e0cbff8dc6be.camel@xxxxxxxxxxxxxxx/ >> >> POSIX states very explicitly that if you're making changes to the >> directory after the call to opendir() or rewinddir(), then the >> behaviour w.r.t. whether that file appears in the readdir() call is >> unspecified. See >> https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html >> >> The issue here is not quite the same though, we unlink the first batch of >> entries, then do a second getdents(), which returns zero entries even though >> some still exist. I don't think POSIX talks about this case directly. >> >> I guess the question now is if we need to drop the "ls -l" improvement >> because after it we are going to see this behavior on directories with >17 >> entiries instead of >127 entries. > > I don't have any suggestions about how to fix your optimization. I wasn't trying to fix it. I was trying to fix your testing setup. > Technically I think this counts as a regression; Thorsten seems > to agree with that opinion. It's late in the cycle, so it is > appropriate to consider reverting 85aa8ddc3818 and trying again > in v6.3 or v6.4. Thorsten's bot is just scraping your regression report email, I doubt they've carefully read this thread. >> It should be possible to make tmpfs (and friends) generate reliable cookies >> by doing something like hashing out the cursor->d_child into the cookie >> space.. (waving hands) > > Sure, but what if there are non-Linux NFS-exported filesystems > that behave this way? Then they would display this same behavior, and claiming it is a server bug might be defensible position. The reality as I understand it is that if the server is going to change the cookie or offset of the dentries in between calls to READDIR, you're never going to reliably be able to list the directory completely. Or maybe we can, but at least I can't think of any way it can be done. You can ask Trond/Anna to revert this, but that's only going to fix your test setup. The behavior you're claiming is a regression is still there - just at a later offset. Or we can modify the server to make tmpfs and friends generate stable cookies/offsets. Or we can patch git so that it doesn't assume it can walk a directory completely while simultaneously modifying it. What do you think? Ben