On Mon, Nov 13, 2023 at 01:06:16PM -0500, tavianator@xxxxxxxxxxxxxx wrote: > On Fri, 30 Jun 2023 at 13:49:03 -0400, Chuck Lever wrote: > > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > > > The current cursor-based directory offset mechanism doesn't work > > when a tmpfs filesystem is exported via NFS. This is because NFS > > clients do not open directories. Each server-side READDIR operation > > has to open the directory, read it, then close it. The cursor state > > for that directory, being associated strictly with the opened > > struct file, is thus discarded after each NFS READDIR operation. > > > > Directory offsets are cached not only by NFS clients, but also by > > user space libraries on those clients. Essentially there is no way > > to invalidate those caches when directory offsets have changed on > > an NFS server after the offset-to-dentry mapping changes. Thus the > > whole application stack depends on unchanging directory offsets. > > > > The solution we've come up with is to make the directory offset for > > each file in a tmpfs filesystem stable for the life of the directory > > entry it represents. > > > > shmem_readdir() and shmem_dir_llseek() now use an xarray to map each > > directory offset (an loff_t integer) to the memory address of a > > struct dentry. > > I believe this patch is responsible for a tmpfs behaviour change when > a directory is modified while being read. The following test program > > #include <dirent.h> > #include <err.h> > #include <fcntl.h> > #include <stdio.h> > #include <stdlib.h> > #include <sys/stat.h> > #include <unistd.h> > > int main(int argc, char *argv[]) { > const char *tmp = "/tmp"; > if (argc >= 2) > tmp = argv[1]; > > char *dir_path; > if (asprintf(&dir_path, "%s/foo.XXXXXX", tmp) < 0) > err(EXIT_FAILURE, "asprintf()"); > > if (!mkdtemp(dir_path)) > err(EXIT_FAILURE, "mkdtemp(%s)", dir_path); > > char *file_path; > if (asprintf(&file_path, "%s/bar", dir_path) < 0) > err(EXIT_FAILURE, "asprintf()"); > > if (creat(file_path, 0644) < 0) > err(EXIT_FAILURE, "creat(%s)", file_path); > > DIR *dir = opendir(dir_path); > if (!dir) > err(EXIT_FAILURE, "opendir(%s)", dir_path); > > struct dirent *de; > while ((de = readdir(dir))) { > printf("readdir(): %s/%s\n", dir_path, de->d_name); > if (de->d_name[0] == '.') > continue; > > if (unlink(file_path) != 0) > err(EXIT_FAILURE, "unlink(%s)", file_path); > > if (creat(file_path, 0644) < 0) > err(EXIT_FAILURE, "creat(%s)", file_path); > } > > return EXIT_SUCCESS; > } > > when run on Linux 6.5, doesn't print the new directory entry: > > tavianator@graphene $ uname -a > Linux graphene 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux > tavianator@graphene $ gcc -Wall foo.c -o foo > tavianator@graphene $ ./foo > readdir(): /tmp/foo.wgmdmm/. > readdir(): /tmp/foo.wgmdmm/.. > readdir(): /tmp/foo.wgmdmm/bar > > But on Linux 6.6, readdir() never stops: > > tavianator@tachyon $ uname -a > Linux tachyon 6.6.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000 x86_64 GNU/Linux > tavianator@tachyon $ gcc foo.c -o foo > tavianator@tachyon $ ./foo > readdir(): /tmp/foo.XnIRqj/. > readdir(): /tmp/foo.XnIRqj/.. > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > readdir(): /tmp/foo.XnIRqj/bar > ... > foo: creat(/tmp/foo.TTL6Fg/bar): Too many open files > > POSIX says[1] > > > If a file is removed from or added to the directory after the most recent > > call to opendir() or rewinddir(), whether a subsequent call to readdir() > > returns an entry for that file is unspecified. > > so this isn't necessarily a *bug*, but I just wanted to point out the > behaviour change. I'm betting dollars to donuts that the v6.6 tmpfs behavior doesn't match the behavior of other filesystems either. Thanks for the reproducer, I'll look into it. > I only noticed it because it broke one of my tests in > bfs[2] (in a non-default build configuration). > > [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html > [2]: https://github.com/tavianator/bfs/blob/main/tests/gnu/ignore_readdir_race_notdir.sh -- Chuck Lever