Re: [PATCH v7 3/3] shmem: stable directory offsets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 13, 2023 at 01:06:16PM -0500, tavianator@xxxxxxxxxxxxxx wrote:
> On Fri, 30 Jun 2023 at 13:49:03 -0400, Chuck Lever wrote:
> > From: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >
> > The current cursor-based directory offset mechanism doesn't work
> > when a tmpfs filesystem is exported via NFS. This is because NFS
> > clients do not open directories. Each server-side READDIR operation
> > has to open the directory, read it, then close it. The cursor state
> > for that directory, being associated strictly with the opened
> > struct file, is thus discarded after each NFS READDIR operation.
> >
> > Directory offsets are cached not only by NFS clients, but also by
> > user space libraries on those clients. Essentially there is no way
> > to invalidate those caches when directory offsets have changed on
> > an NFS server after the offset-to-dentry mapping changes. Thus the
> > whole application stack depends on unchanging directory offsets.
> >
> > The solution we've come up with is to make the directory offset for
> > each file in a tmpfs filesystem stable for the life of the directory
> > entry it represents.
> >
> > shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
> > directory offset (an loff_t integer) to the memory address of a
> > struct dentry.
> 
> I believe this patch is responsible for a tmpfs behaviour change when
> a directory is modified while being read.  The following test program
> 
> #include <dirent.h>
> #include <err.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/stat.h>
> #include <unistd.h>
> 
> int main(int argc, char *argv[]) {
> 	const char *tmp = "/tmp";
> 	if (argc >= 2)
> 		tmp = argv[1];
> 
> 	char *dir_path;
> 	if (asprintf(&dir_path, "%s/foo.XXXXXX", tmp) < 0)
> 		err(EXIT_FAILURE, "asprintf()");
> 
> 	if (!mkdtemp(dir_path))
> 		err(EXIT_FAILURE, "mkdtemp(%s)", dir_path);
> 
> 	char *file_path;
> 	if (asprintf(&file_path, "%s/bar", dir_path) < 0)
> 		err(EXIT_FAILURE, "asprintf()");
> 
> 	if (creat(file_path, 0644) < 0)
> 		err(EXIT_FAILURE, "creat(%s)", file_path);
> 
> 	DIR *dir = opendir(dir_path);
> 	if (!dir)
> 		err(EXIT_FAILURE, "opendir(%s)", dir_path);
> 
> 	struct dirent *de;
> 	while ((de = readdir(dir))) {
> 		printf("readdir(): %s/%s\n", dir_path, de->d_name);
> 		if (de->d_name[0] == '.')
> 			continue;
> 
> 		if (unlink(file_path) != 0)
> 			err(EXIT_FAILURE, "unlink(%s)", file_path);
> 
> 		if (creat(file_path, 0644) < 0)
> 			err(EXIT_FAILURE, "creat(%s)", file_path);
> 	}
> 
> 	return EXIT_SUCCESS;
> }
> 
> when run on Linux 6.5, doesn't print the new directory entry:
> 
> tavianator@graphene $ uname -a
> Linux graphene 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
> tavianator@graphene $ gcc -Wall foo.c -o foo
> tavianator@graphene $ ./foo
> readdir(): /tmp/foo.wgmdmm/.
> readdir(): /tmp/foo.wgmdmm/..
> readdir(): /tmp/foo.wgmdmm/bar
> 
> But on Linux 6.6, readdir() never stops:
> 
> tavianator@tachyon $ uname -a
> Linux tachyon 6.6.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000 x86_64 GNU/Linux
> tavianator@tachyon $ gcc foo.c -o foo
> tavianator@tachyon $ ./foo
> readdir(): /tmp/foo.XnIRqj/.
> readdir(): /tmp/foo.XnIRqj/..
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> readdir(): /tmp/foo.XnIRqj/bar
> ...
> foo: creat(/tmp/foo.TTL6Fg/bar): Too many open files
> 
> POSIX says[1]
> 
> > If a file is removed from or added to the directory after the most recent
> > call to opendir() or rewinddir(), whether a subsequent call to readdir()
> > returns an entry for that file is unspecified.
> 
> so this isn't necessarily a *bug*, but I just wanted to point out the
> behaviour change.

I'm betting dollars to donuts that the v6.6 tmpfs behavior doesn't
match the behavior of other filesystems either.

Thanks for the reproducer, I'll look into it.


> I only noticed it because it broke one of my tests in
> bfs[2] (in a non-default build configuration).
> 
> [1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
> [2]: https://github.com/tavianator/bfs/blob/main/tests/gnu/ignore_readdir_race_notdir.sh

-- 
Chuck Lever




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux