On Tue, Jan 24, 2023 at 02:30:20PM -0500, Jeff Layton wrote:
> The NFS server has a lot of special handling for different types of
> change attribute access, depending on the underlying filesystem. In
> most cases, it's doing a getattr anyway and then fetching that value
> after the fact.
>
> Rather that do that, add a new STATX_CHANGE_COOKIE flag that is a
> kernel-only symbol (for now). If requested and getattr can implement it,
> it can fill out this field. For IS_I_VERSION inodes, add a generic
> implementation in vfs_getattr_nosec. Take care to mask
> STATX_CHANGE_COOKIE off in requests from userland and in the result
> mask.
>
> Since not all filesystems can give the same guarantees of monotonicity,
> claim a STATX_ATTR_CHANGE_MONOTONIC flag that filesystems can set to
> indicate that they offer an i_version value that can never go backward.
>
> Eventually if we decide to make the i_version available to userland, we
> can just designate a field for it in struct statx, and move the
> STATX_CHANGE_COOKIE definition to the uapi header.
>
> Reviewed-by: NeilBrown <neilb@xxxxxxx>
> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
>  fs/stat.c            | 17 +++++++++++++++--
>  include/linux/stat.h |  9 +++++++++
>  2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/fs/stat.c b/fs/stat.c
> index d6cc74ca8486..f43afe0081fe 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -18,6 +18,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/pagemap.h>
>  #include <linux/compat.h>
> +#include <linux/iversion.h>
>
>  #include <linux/uaccess.h>
>  #include <asm/unistd.h>
> @@ -122,6 +123,11 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
>  	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
>  				  STATX_ATTR_DAX);
>
> +	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
> +		stat->result_mask |= STATX_CHANGE_COOKIE;
> +		stat->change_cookie = inode_query_iversion(inode);
> +	}
> +
>  	mnt_userns = mnt_user_ns(path->mnt);
>  	if (inode->i_op->getattr)
>  		return inode->i_op->getattr(mnt_userns, path, stat,
> @@ -602,9 +608,11 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>
>  	memset(&tmp, 0, sizeof(tmp));
>
> -	tmp.stx_mask = stat->result_mask;
> +	/* STATX_CHANGE_COOKIE is kernel-only for now */
> +	tmp.stx_mask = stat->result_mask & ~STATX_CHANGE_COOKIE;
>  	tmp.stx_blksize = stat->blksize;
> -	tmp.stx_attributes = stat->attributes;
> +	/* STATX_ATTR_CHANGE_MONOTONIC is kernel-only for now */
> +	tmp.stx_attributes = stat->attributes & ~STATX_ATTR_CHANGE_MONOTONIC;
>  	tmp.stx_nlink = stat->nlink;
>  	tmp.stx_uid = from_kuid_munged(current_user_ns(), stat->uid);
>  	tmp.stx_gid = from_kgid_munged(current_user_ns(), stat->gid);
> @@ -643,6 +651,11 @@ int do_statx(int dfd, struct filename *filename, unsigned int flags,
>  	if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
>  		return -EINVAL;
>
> +	/* STATX_CHANGE_COOKIE is kernel-only for now. Ignore requests
> +	 * from userland.
> +	 */
> +	mask &= ~STATX_CHANGE_COOKIE;
> +
>  	error = vfs_statx(dfd, filename, flags, &stat, mask);
>  	if (error)
>  		return error;
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index ff277ced50e9..52150570d37a 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h

Sorry for being late to the party once again...

> @@ -52,6 +52,15 @@ struct kstat {
>  	u64		mnt_id;
>  	u32		dio_mem_align;
>  	u32		dio_offset_align;
> +	u64		change_cookie;
>  };
>
> +/* These definitions are internal to the kernel for now. Mainly used by nfsd. */
> +
> +/* mask values */
> +#define STATX_CHANGE_COOKIE		0x40000000U	/* Want/got stx_change_attr */
> +
> +/* file attribute values */
> +#define STATX_ATTR_CHANGE_MONOTONIC	0x8000000000000000ULL /* version monotonically increases */

Maybe it would be better to copy what we do for SB_* vs SB_I_* flags and
at least rename them to:

STATX_I_CHANGE_COOKIE
STATX_I_ATTR_CHANGE_MONOTONIC
i_change_cookie

to visually distinguish internal and external flags.

Also, if possible, it might be useful to move the STATX_I_* flags to the
upper 32 bits. Tiny wrappers could then use upper_32_bits() to retrieve
the kernel-internal flags and lower_32_bits() to retrieve the userspace
flags.

(I did something similar for clone3() a few years ago, but there it was
to distinguish flags available in both clone() and clone3() from those
that are only available in clone3().)

But just a thought. I mostly worry about accidentally leaking this to
userspace, so ideally we'd even have separate fields in struct kstat for
internal and external attributes. That might bump the size of struct
kstat, though I don't think struct kstat is allocated all that often.
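
To make the upper/lower split concrete, here is a rough userspace sketch
of what I have in mind. It is not a patch. The DEMO_* names, the bit
chosen for the internal flag, and the wrapper names are all made up for
illustration, and it assumes the mask is carried in a 64-bit word. The
two helpers are stand-ins for the kernel's upper_32_bits() and
lower_32_bits().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's upper_32_bits()/lower_32_bits(). */
static inline uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t lower_32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }

/*
 * Hypothetical layout: userspace-visible bits live in the low 32 bits,
 * kernel-internal bits in the high 32 bits. Assumes a 64-bit mask word.
 */
#define DEMO_STATX_SIZE			0x00000200ULL	/* same value as uapi STATX_SIZE */
#define DEMO_STATX_I_CHANGE_COOKIE	(1ULL << 32)	/* made-up internal bit */

/* Only this half would ever be copied out to userspace. */
static inline uint32_t kstat_user_mask(uint64_t mask)
{
	return lower_32(mask);
}

/* This half would only be looked at by in-kernel callers such as nfsd. */
static inline uint32_t kstat_internal_mask(uint64_t mask)
{
	return upper_32(mask);
}

/* Small self-check: an internal bit never shows up in the user half. */
void demo_split_check(void)
{
	uint64_t mask = DEMO_STATX_SIZE | DEMO_STATX_I_CHANGE_COOKIE;

	assert(kstat_user_mask(mask) == DEMO_STATX_SIZE);
	assert(kstat_internal_mask(mask) == 1u);
}
```

With a layout like this, the copy-out path would not need to know the
individual internal flags. It would just take the lower half, so a newly
added internal flag could not leak by someone forgetting to mask it.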