Re: [PATCH v3 3/3] shmem: stable directory offsets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 6/6/23 15:11, Chuck Lever wrote:
From: Chuck Lever <chuck.lever@xxxxxxxxxx>

The current cursor-based directory offset mechanism doesn't work
when a tmpfs filesystem is exported via NFS. This is because NFS
clients do not open directories. Each server-side READDIR operation
has to open the directory, read it, then close it. The cursor state
for that directory, being associated strictly with the opened
struct file, is thus discarded after each NFS READDIR operation.

Directory offsets are cached not only by NFS clients, but also by
user space libraries on those clients. Essentially there is no way
to invalidate those caches when directory offsets have changed on
an NFS server after the offset-to-dentry mapping changes. Thus the
whole application stack depends on unchanging directory offsets.

The solution we've come up with is to make the directory offset for
each file in a tmpfs filesystem stable for the life of the directory
entry it represents.

shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
directory offset (an loff_t integer) to the memory address of a
struct dentry.

Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
---
  mm/shmem.c |   39 +++++++++++++++++++++++++++++++++++----
  1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 721f9fd064aa..fd9571056181 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2410,7 +2410,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
  			/* Some things misbehave if size == 0 on a directory */
  			inode->i_size = 2 * BOGO_DIRENT_SIZE;
  			inode->i_op = &shmem_dir_inode_operations;
-			inode->i_fop = &simple_dir_operations;
+			inode->i_fop = &stable_dir_operations;
+			stable_offset_init(inode);
  			break;
  		case S_IFLNK:
  			/*
@@ -2950,6 +2951,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
  		if (error && error != -EOPNOTSUPP)
  			goto out_iput;
+ error = stable_offset_add(dir, dentry);
+		if (error)
+			goto out_iput;
+
  		error = 0;

This line can be removed?

  		dir->i_size += BOGO_DIRENT_SIZE;
  		dir->i_ctime = dir->i_mtime = current_time(dir);
@@ -3027,6 +3032,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
  			goto out;
  	}
+ ret = stable_offset_add(dir, dentry);
+	if (ret)
+		goto out;
+

I think this should call shmem_free_inode() before goto out - reverse what shmem_reserve_inode() has done.

  	dir->i_size += BOGO_DIRENT_SIZE;
  	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
  	inode_inc_iversion(dir);
@@ -3045,6 +3054,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
  	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
  		shmem_free_inode(inode->i_sb);
+ stable_offset_remove(dir, dentry);
+
  	dir->i_size -= BOGO_DIRENT_SIZE;
  	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
  	inode_inc_iversion(dir);
@@ -3103,24 +3114,37 @@ static int shmem_rename2(struct mnt_idmap *idmap,
  {
  	struct inode *inode = d_inode(old_dentry);
  	int they_are_dirs = S_ISDIR(inode->i_mode);
+	int error;
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
  		return -EINVAL;
- if (flags & RENAME_EXCHANGE)
+	if (flags & RENAME_EXCHANGE) {
+		stable_offset_remove(old_dir, old_dentry);
+		stable_offset_remove(new_dir, new_dentry);
+		error = stable_offset_add(new_dir, old_dentry);
+		if (error)
+			return error;
+		error = stable_offset_add(old_dir, new_dentry);
+		if (error)
+			return error;
  		return simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
+	}

Hmm, error handling issues? Everything needs to be reversed when any of the operations fails?

if (!simple_empty(new_dentry))
  		return -ENOTEMPTY;
if (flags & RENAME_WHITEOUT) {
-		int error;
-
  		error = shmem_whiteout(idmap, old_dir, old_dentry);
  		if (error)
  			return error;
  	}
+ stable_offset_remove(old_dir, old_dentry);
+	error = stable_offset_add(new_dir, old_dentry);
+	if (error)
+		return error;
+
  	if (d_really_is_positive(new_dentry)) {
  		(void) shmem_unlink(new_dir, new_dentry);
  		if (they_are_dirs) {
@@ -3185,6 +3209,11 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
  		folio_unlock(folio);
  		folio_put(folio);
  	}
+
+	error = stable_offset_add(dir, dentry);
+	if (error)
+		goto out_iput;
+

Error handling, there is a kmemdup() above which needs to be freed? I'm not sure about folio, automatically released with the inode?

  	dir->i_size += BOGO_DIRENT_SIZE;
  	dir->i_ctime = dir->i_mtime = current_time(dir);
  	inode_inc_iversion(dir);
@@ -3920,6 +3949,8 @@ static void shmem_destroy_inode(struct inode *inode)
  {
  	if (S_ISREG(inode->i_mode))
  		mpol_free_shared_policy(&SHMEM_I(inode)->policy);
+	if (S_ISDIR(inode->i_mode))
+		stable_offset_destroy(inode);
  }
static void shmem_init_inode(void *foo)



Thanks,
Bernd



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux