On Fri, 30 Jun 2023 at 13:49:03 -0400, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > The current cursor-based directory offset mechanism doesn't work > when a tmpfs filesystem is exported via NFS. This is because NFS > clients do not open directories. Each server-side READDIR operation > has to open the directory, read it, then close it. The cursor state > for that directory, being associated strictly with the opened > struct file, is thus discarded after each NFS READDIR operation. > > Directory offsets are cached not only by NFS clients, but also by > user space libraries on those clients. Essentially there is no way > to invalidate those caches when directory offsets have changed on > an NFS server after the offset-to-dentry mapping changes. Thus the > whole application stack depends on unchanging directory offsets. > > The solution we've come up with is to make the directory offset for > each file in a tmpfs filesystem stable for the life of the directory > entry it represents. > > shmem_readdir() and shmem_dir_llseek() now use an xarray to map each > directory offset (an loff_t integer) to the memory address of a > struct dentry. I believe this patch is responsible for a tmpfs behaviour change when a directory is modified while being read. The following test program #include <dirent.h> #include <err.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <sys/stat.h> #include <unistd.h> int main(int argc, char *argv[]) { const char *tmp = "/tmp"; if (argc >= 2) tmp = argv[1]; char *dir_path; if (asprintf(&dir_path, "%s/foo.XXXXXX", tmp) < 0) err(EXIT_FAILURE, "asprintf()"); if (!mkdtemp(dir_path)) err(EXIT_FAILURE, "mkdtemp(%s)", dir_path); char *file_path; if (asprintf(&file_path, "%s/bar", dir_path) < 0) err(EXIT_FAILURE, "asprintf()"); if (creat(file_path, 0644) < 0) err(EXIT_FAILURE, "creat(%s)", file_path); DIR *dir = opendir(dir_path); if (!dir) err(EXIT_FAILURE, "opendir(%s)", dir_path); struct dirent *de; while ((de = readdir(dir))) { printf("readdir(): %s/%s\n", dir_path, de->d_name); if (de->d_name[0] == '.') continue; if (unlink(file_path) != 0) err(EXIT_FAILURE, "unlink(%s)", file_path); if (creat(file_path, 0644) < 0) err(EXIT_FAILURE, "creat(%s)", file_path); } return EXIT_SUCCESS; } when run on Linux 6.5, doesn't print the new directory entry: tavianator@graphene $ uname -a Linux graphene 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux tavianator@graphene $ gcc -Wall foo.c -o foo tavianator@graphene $ ./foo readdir(): /tmp/foo.wgmdmm/. readdir(): /tmp/foo.wgmdmm/.. readdir(): /tmp/foo.wgmdmm/bar But on Linux 6.6, readdir() never stops: tavianator@tachyon $ uname -a Linux tachyon 6.6.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000 x86_64 GNU/Linux tavianator@tachyon $ gcc foo.c -o foo tavianator@tachyon $ ./foo readdir(): /tmp/foo.XnIRqj/. readdir(): /tmp/foo.XnIRqj/.. readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar readdir(): /tmp/foo.XnIRqj/bar ... foo: creat(/tmp/foo.TTL6Fg/bar): Too many open files POSIX says[1] > If a file is removed from or added to the directory after the most recent > call to opendir() or rewinddir(), whether a subsequent call to readdir() > returns an entry for that file is unspecified. so this isn't necessarily a *bug*, but I just wanted to point out the behaviour change. I only noticed it because it broke one of my tests in bfs[2] (in a non-default build configuration). [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html [2]: https://github.com/tavianator/bfs/blob/main/tests/gnu/ignore_readdir_race_notdir.sh