VM_DENYWRITE currently relies on i_writecount. VM_DENYWRITE is allowed
only if there is no active writable reference to the inode.
Unfortunately, alloc_file() does not increase i_writecount and therefore
does not prevent a subsequent VM_DENYWRITE, even though the new file
might have been opened with FMODE_WRITE. However, callers of alloc_file()
expect the file object to be fully instantiated so they can call fput()
on it. We could now either fix all callers to do a get_write_access() if
opened with FMODE_WRITE, or simply fix alloc_file() to do that. I chose
the latter.

Note that this bug allows some rather subtle misbehavior. The following
sequence of calls should work just fine, but currently fails:

    int p[2], orig, ro, rw;
    char buf[128];

    pipe(p);
    sprintf(buf, "/proc/self/fd/%d", p[1]);
    ro = open(buf, O_RDONLY);
    close(p[1]);
    sprintf(buf, "/proc/self/fd/%d", ro);
    rw = open(buf, O_RDWR);

The final open() cannot succeed because close(p[1]) drops a write
reference that alloc_file() never took. This underflows i_writecount to
a negative value, which is the same state VM_DENYWRITE puts the inode
in. The open fails with -ETXTBSY.

It's a rather odd sequence of calls, and given that open() doesn't use
alloc_file() (and thus is not affected by this bug), it's rather unlikely
that this is a serious issue. But stuff like anon_inode shares a *single*
inode across a huge set of interfaces. If any of these is broken like
pipe(), it will affect all of them (ranging from dma-buf to epoll).

Signed-off-by: David Herrmann <dh.herrmann@xxxxxxxxx>
---
Hi

This patch is only included for reference. It was submitted to fs-devel
separately and is being worked on. However, this bug must be fixed in
order to make use of memfd_create(), so I decided to include it here.
David

 fs/file_table.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 5b24008..8059d68 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -168,6 +168,7 @@ struct file *alloc_file(struct path *path, fmode_t mode,
 		const struct file_operations *fop)
 {
 	struct file *file;
+	int error;
 
 	file = get_empty_filp();
 	if (IS_ERR(file))
@@ -179,15 +180,23 @@ struct file *alloc_file(struct path *path, fmode_t mode,
 	file->f_mode = mode;
 	file->f_op = fop;
 
-	/*
-	 * These mounts don't really matter in practice
-	 * for r/o bind mounts.  They aren't userspace-
-	 * visible.  We do this for consistency, and so
-	 * that we can do debugging checks at __fput()
-	 */
-	if ((mode & FMODE_WRITE) && !special_file(path->dentry->d_inode->i_mode)) {
-		file_take_write(file);
-		WARN_ON(mnt_clone_write(path->mnt));
+	if (mode & FMODE_WRITE) {
+		error = get_write_access(path->dentry->d_inode);
+		if (error) {
+			put_filp(file);
+			return ERR_PTR(error);
+		}
+
+		/*
+		 * These mounts don't really matter in practice
+		 * for r/o bind mounts.  They aren't userspace-
+		 * visible.  We do this for consistency, and so
+		 * that we can do debugging checks at __fput()
+		 */
+		if (!special_file(path->dentry->d_inode->i_mode)) {
+			file_take_write(file);
+			WARN_ON(mnt_clone_write(path->mnt));
+		}
 	}
 	if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
 		i_readcount_inc(path->dentry->d_inode);
-- 
1.9.0
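
For reference, the i_writecount underflow described in the commit
message can be modelled in user space. The following is only a rough
sketch, not kernel code: struct model_inode and all model_* functions
are hypothetical stand-ins that mimic the counter semantics of
get_write_access() and put_write_access() (positive means writers are
present, negative means writes are denied). It assumes that pipe()
creates the write end through alloc_file() and that closing it drops
one write reference, as the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an inode's i_writecount. */
struct model_inode {
	int writecount;	/* > 0: writers present, < 0: writes denied */
};

/* Models get_write_access(): refuse if the counter says "deny write". */
static int model_get_write_access(struct model_inode *inode)
{
	if (inode->writecount < 0)
		return -ETXTBSY;
	inode->writecount++;
	return 0;
}

/* Models put_write_access(): unconditional decrement on release. */
static void model_put_write_access(struct model_inode *inode)
{
	inode->writecount--;
}

/*
 * Replays the call sequence from the commit message on a single inode.
 * alloc_file_takes_write == false models the old alloc_file(),
 * true models the patched one. Returns the result of the final
 * O_RDWR open: 0 on success, negative errno on failure.
 */
static int model_pipe_sequence(bool alloc_file_takes_write)
{
	struct model_inode inode = { .writecount = 0 };
	int err;

	/* pipe(): the write end p[1] is created with FMODE_WRITE */
	if (alloc_file_takes_write) {
		err = model_get_write_access(&inode);
		if (err)
			return err;
	}

	/* open(..., O_RDONLY): takes no write reference */

	/* close(p[1]): the release path drops a write reference */
	model_put_write_access(&inode);

	/* Old behavior leaves the counter at -1, patched at 0. */
	assert(inode.writecount == (alloc_file_takes_write ? 0 : -1));

	/* open(..., O_RDWR): needs write access */
	return model_get_write_access(&inode);
}
```

In this model, model_pipe_sequence(false) returns -ETXTBSY, matching
the failure described above, while model_pipe_sequence(true) returns 0.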