On Mon, Aug 31, 2020 at 5:32 PM Rich Felker <dalias@xxxxxxxx> wrote: > The pwrite function, originally defined by POSIX (thus the "p"), is > defined to ignore O_APPEND and write at the offset passed as its > argument. However, historically Linux honored O_APPEND if set and > ignored the offset. This cannot be changed due to stability policy, > but is documented in the man page as a bug. > > Now that there's a pwritev2 syscall providing a superset of the pwrite > functionality that has a flags argument, the conforming behavior can > be offered to userspace via a new flag. Since pwritev2 checks flag > validity (in kiocb_set_rw_flags) and reports unknown ones with > EOPNOTSUPP, callers will not get wrong behavior on old kernels that > don't support the new flag; the error is reported and the caller can > decide how to handle it. > > Signed-off-by: Rich Felker <dalias@xxxxxxxx> Reviewed-by: Jann Horn <jannh@xxxxxxxxxx> Note that if this lands, Michael Kerrisk will probably be happy if you send a corresponding patch for the manpage man2/readv.2. Btw, I'm not really sure whose tree this should go through - VFS is normally Al Viro's turf, but it looks like the most recent modifications to this function have gone through Jens Axboe's tree? > --- > > Changes in v2: I've added a check to ensure that RWF_NOAPPEND does not > override O_APPEND for S_APPEND (chattr +a) inodes, and fixed conflicts > with 1752f0adea98ef85, which optimized kiocb_set_rw_flags to work with > a local copy of flags. Unfortunately the same optimization does not > work for RWF_NOAPPEND since it needs to remove flags from the original > set at function entry. > > If desired, I could further change this so that kiocb_flags is > initialized to ki->ki_flags, with assignment-back in place of |= at > the end of the function. This would allow the same local variable > pattern in the RWF_NOAPPEND code path, which might be more elegant, > but I'm not sure if the emitted code would improve or get worse. > > > include/linux/fs.h | 7 +++++++ > include/uapi/linux/fs.h | 5 ++++- > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 7519ae003a08..924e17ac8e7e 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -3321,6 +3321,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) > return 0; > if (unlikely(flags & ~RWF_SUPPORTED)) > return -EOPNOTSUPP; > + if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND))) > + return -EINVAL; > > if (flags & RWF_NOWAIT) { > if (!(ki->ki_filp->f_mode & FMODE_NOWAIT)) > @@ -3335,6 +3337,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) > kiocb_flags |= (IOCB_DSYNC | IOCB_SYNC); > if (flags & RWF_APPEND) > kiocb_flags |= IOCB_APPEND; > + if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) { > + if (IS_APPEND(file_inode(ki->ki_filp))) > + return -EPERM; > + ki->ki_flags &= ~IOCB_APPEND; > + } > > ki->ki_flags |= kiocb_flags; > return 0; > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > index f44eb0a04afd..d5e54e0742cf 100644 > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > @@ -300,8 +300,11 @@ typedef int __bitwise __kernel_rwf_t; > /* per-IO O_APPEND */ > #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010) > > +/* per-IO negation of O_APPEND */ > +#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020) > + > /* mask of flags supported by the kernel */ > #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ > - RWF_APPEND) > + RWF_APPEND | RWF_NOAPPEND) > > #endif /* _UAPI_LINUX_FS_H */ > -- > 2.21.0 >