Jeff,

Thanks for your comments.  More in-line...

On Tue, Aug 09, 2022 at 09:49:59AM -0400, Jeff Layton wrote:
> On Tue, 2022-08-09 at 01:17 +0000, DANIEL K FORREST wrote:
> > I am seeing a strange glob behavior on NFS that I can't explain.
> >
> > The server has 16 files, foo/bar{01..16}.
> >
> > There are other files in foo/, and there are many other processes on
> > the client accessing files in the directory, but the mount is readonly
> > so the only create/delete activity is on the server, and it's all
> > rsync, so create file and rename file, but no file deletions.
> >
> >
> > When the 16th file is created (random order) processing is triggered
> > by a message from a different host that is running the rsyncs.
> >
> > On the client, I run this command:
> >
> > $ stat -c'%z %n' foo/bar{01..16}
> >
> > And I see all 16 files.
> >
> > However, if I immediately follow that command with:
> >
> > $ stat -c'%z %n' foo/bar*
> >
>
> You may want to look at an strace of the shell, and see if it's doing
> anything different at the syscall level in these two cases.
>

I had the same question, so I did that already.

The shell is indeed calling readdir.

> > On rare occasions I see fewer than 16 files.
> >
> > The missing files are the ones most recently created, they can be seen
> > by stat when explicitly enumerated, but the shell glob does not see
> > all of the files.  This test is for verifying a problem with a program
> > that is also sometimes not seeing files using readdir/glob.
> >
> >
> > How can all 16 files be seen by stat, but not by readdir/glob?
> >
> >
> > OS is CentOS 7.9.2009, 3.10.0-1127.19.1.el7.x86_64
> > NFS mount is version 3, readonly, nordirplus, lookupcache=pos
> >
> >
>
> It'd be hard to say without doing deeper analysis, but in order to
> process a glob, the shell has to issue one or more readdir() calls.
> Those calls can be split into more than one READDIR RPC on the network
> as well.
>
> There is no guarantee that between each READDIR you issue that the
> directory remains constant. It's easily possible to issue a readdir for
> the first chunk of dentries, and then have a file that's in a later
> chunk get renamed so that it's in that chunk.
>

In this case, the files are created and then the directory is dormant
for a number of minutes.  Repeated glob operations continue to not see
all of the files until the next set of files is created.

> You're also using v3. The timestamps on most Linux servers have a
> granularity of a jiffy (~1ms). If multiple directory operations happen
> within the same jiffy then the timestamp on the directory might not
> appear to have changed even though some of its children had been
> renamed. You may want to consider using v4 just to get the benefit of
> its better cache coherency.
>

I think this is getting to the heart of the problem.

The underlying filesystem has a granularity of 1s, and it is becoming
clear that what you are suggesting is the root cause.

I have tried v4 without success; the symptoms persist.

> Given that you know what the files are named, you're probably better off
> not using shell globs at all here. Just provide all of the file names on
> the command line (like in your first example) and you avoid READDIR
> activity altogether.

For test purposes, I know the file names.  In general, the names have
a known prefix, but are otherwise unknown.  A glob is required.

What I need is a way to invalidate the lookup cache, but that seems to
require there be a change in the directory timestamp.

I tried setting lookupcache=none, but the performance made it unusable.

It still seems odd that stat-ing the files individually doesn't get
them into the lookup cache.  It seems to be only when a readdir is
issued and the directory timestamp is seen to have changed.

> --
> Jeff Layton <jlayton@xxxxxxxxxx>

--
Daniel K. Forrest               Space Science and
dforrest@xxxxxxxx               Engineering Center
(608) 890 - 0558                University of Wisconsin, Madison
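
For anyone who wants to reproduce the comparison described above, here
is a minimal sketch.  It is not the script used on the affected client.
It prints the names that exist when checked explicitly but are absent
from the glob expansion.  The function name missing_from_glob and its
three arguments are hypothetical, and it assumes the names are the
prefix followed by a two-digit number, as in foo/bar{01..16}, with no
spaces in them.

```shell
#!/bin/sh
# Sketch only: list files that an explicit existence check can see but
# that the shell glob (which relies on readdir) does not return.
# missing_from_glob DIR PREFIX COUNT   (hypothetical helper)
missing_from_glob() (
    dir=$1 prefix=$2 count=$3

    # Collect the names the glob returns, space-delimited.
    # If nothing matches, the literal pattern is stored, which is harmless.
    globbed=" "
    for f in "$dir/$prefix"*; do
        globbed="$globbed${f##*/} "
    done

    # Check each expected name explicitly, without using readdir.
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf '%s%02d' "$prefix" "$i")
        if [ -e "$dir/$name" ]; then
            case $globbed in
                *" $name "*) ;;                 # the glob saw it too
                *) printf '%s\n' "$name" ;;     # exists, but glob missed it
            esac
        fi
        i=$((i + 1))
    done
)
```

Running "missing_from_glob foo bar 16" on the client right after the
16th file arrives should print nothing when the directory listing is
current, and the most recently created names when it is stale.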