On Sat, Aug 29, 2020 at 4:00 AM Rich Felker <dalias@xxxxxxxx> wrote: > The pwrite function, originally defined by POSIX (thus the "p"), is > defined to ignore O_APPEND and write at the offset passed as its > argument. However, historically Linux honored O_APPEND if set and > ignored the offset. This cannot be changed due to stability policy, > but is documented in the man page as a bug. > > Now that there's a pwritev2 syscall providing a superset of the pwrite > functionality that has a flags argument, the conforming behavior can > be offered to userspace via a new flag. [...] > diff --git a/include/linux/fs.h b/include/linux/fs.h [...] > @@ -3411,6 +3413,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) > ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC); > if (flags & RWF_APPEND) > ki->ki_flags |= IOCB_APPEND; > + if (flags & RWF_NOAPPEND) > + ki->ki_flags &= ~IOCB_APPEND; > return 0; > } Linux enforces the S_APPEND flag (set by "chattr +a") only at open() time, not at write() time: # touch testfile # exec 100>testfile # echo foo > testfile # cat testfile foo # chattr +a testfile # echo bar > testfile bash: testfile: Operation not permitted # echo bar >&100 # cat testfile bar # At open() time, the kernel enforces that you can't use O_WRONLY/O_RDWR without also setting O_APPEND if the file is marked as append-only: static int may_open(const struct path *path, int acc_mode, int flag) { [...] /* * An append-only file must be opened in append mode for writing. */ if (IS_APPEND(inode)) { if ((flag & O_ACCMODE) != O_RDONLY && !(flag & O_APPEND)) return -EPERM; if (flag & O_TRUNC) return -EPERM; } [...] } It seems to me like your patch will permit bypassing S_APPEND by opening an append-only file with O_WRONLY|O_APPEND, then calling pwritev2() with RWF_NOAPPEND? I think you'll have to add an extra check for IS_APPEND() somewhere. One could also argue that if an O_APPEND file descriptor is handed across privilege boundaries, a programmer might reasonably expect that the recipient will not be able to use the file descriptor for non-append writes; if that is not actually true, that should probably be noted in the open.2 manpage, at the end of the description of O_APPEND.