Re: [RFC PATCH] f*xattr: allow O_PATH descriptors

Christian Göttsche <cgzones@xxxxxxxxxxxxxx> · Sat, 18 Jun 2022 13:19:43 +0200

On Sat, 18 Jun 2022 at 11:11, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> >
> > On 2022-06-08, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > From: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > >
> > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > operations.
> > > > > > >
> > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > which might not be mounted e.g. inside of chroots, see[2].
> > > > > > >
> > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > >
> > > > > > > Original patch by Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@xxxxxxxxxx/
> > > > > > >
> > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > introducing another set of syscalls.
> > > > > > > >
> > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > >
> > > > > > > > Current versions return EBADF, hence easy to detect the presense of
> > > > > > > > this feature and fall back in case it's missing.
> > > > > > >
> > > > > > > CC: linux-api@xxxxxxxxxxxxxxx
> > > > > > > CC: linux-man@xxxxxxxxxxxxxxx
> > > > > > > Signed-off-by: Christian Göttsche <cgzones@xxxxxxxxxxxxxx>
> > > > > > > ---
> > > > > >
> > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > equivalent to a write.
> > > > >
> > > > > It is not really semantically equivalent to a write if it works on a
> > > > > O_RDONLY fd already.
> > > >
> > > > The fact that it works on a O_RDONLY fd has always been weird. And is
> > > > probably a bug. If you look at xattr_permission() you can see that it
> > >
> > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > >
> > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > real filesystems. I don't know how much closer to a write this can get.
> > > >
> > > > In general, one semantic aberration doesn't justify piling another one
> > > > on top.
> > > >
> > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > opens the file wheres O_PATH doesn't.)
> > >
> > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > rule that modify f*() operations are not allowed to use O_PATH fd,
> > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > fine by me.
> > > It's good to have consistent rules and it's good to add a new UAPI for
> > > new behavior.
> > >
> > > However...
> > >
> > > >
> > > > >
> > > > > >
> > > > > > In sensitive environments such as service management/container runtimes
> > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > they can be used for. I'd prefer to not to plug at this string.
> > > > >
> > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > are almost identical w.r.t permission checks and everything else.
> > > > >
> > > > > So this change introduces nothing new that a user in said environment
> > > > > cannot already accomplish with setxattr().
> > > > >
> > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > fd is already possible with setxattr("/proc/self/$fd"), so whatever security
> > > > > hole you are trying to prevent is already wide open.
> > > >
> > > > That is very much a something that we're trying to restrict for this
> > > > exact reason and is one of the main motivator for upgrade mask in
> > > > openat2(). If I want to send a O_PATH around I want it to not be
> > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > and part of the original patchset in [2]. O_PATH semantics don't need to
> > > > become weird.
> > > >
> > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@xxxxxxxxxx
> > >
> > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > will follow is *xattrat() syscalls.
> > >
> > > What will you be able to argue then?
> > >
> > > There are several *at() syscalls that modify metadata.
> > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > >
> > > Do you intend to try and block setxattrat()?
> > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > those *at() syscalls have real use cases to avoid TOCTOU races.
> > > Do you propose that applications will have to use fsetxattr() on an open
> > > file to avert races?
> > >
> > > I completely understand the idea behind upgrade masks
> > > for limiting f_mode, but I don't know if trying to retroactively
> > > change semantics of setxattr() in the move to setxattrat()
> > > is going to be a good idea.
> >
> > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > assume that keeping them makes sense (the most obvious example is being
> > able to do tricks to open /proc/$pid/exe as O_RDWR).
>
> Please make a note that I have applications relying on current magic symlink
> semantics w.r.t setxattr() and other metadata operations, and the libselinux
> commit linked from the patch commit message proves that magic symlink
> semantics are used in the wild, so it is not likely that those semantics could
> be changed, unless userspace breakage could be justified by fixing a serious
> security issue (i.e. open /proc/$pid/exe as O_RDWR).
>
> >
> > I suspect that the long-term solution would be to have more upgrade
> > masks so that userspace can opt-in to not allowing any kind of
> > (metadata) write access through a particular file descriptor. You're
> > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > so we can't retroactively block /everything/ but we should try to come
> > up with less leaky rules by default if it won't break userspace.
> >
>
> Ok, let me try to say this in my own words using an example to see that
> we are all on the same page:
>
> - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> - fsetxattr(fd,...) is not applicable for symbolic links

fsetxattr(2) works on symbolic links, e.g. for "security.selinux",
except for the user namespace:

https://github.com/torvalds/linux/blob/4b35035bcf80ddb47c0112c4fbd84a63a2836a18/fs/xattr.c#L124-L136
/*
* In the user.* namespace, only regular files and directories can have
* extended attributes. For sticky directories, only the owner and
* privileged users can write attributes.
*/
if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
    if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
        return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
    if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
       (mask & MAY_WRITE) &&
        !inode_owner_or_capable(mnt_userns, inode))
        return -EPERM;
}

Currently it just does not support O_PATH file descriptors.
And with O_RDONLY setting extended attributes is supported as well
(fsetxattr(2) does not require O_RDWR or O_WRONLY).

> - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
>   when setting xattr on symbolic links
> - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the
>   "new API" for setting xattr on symlinks (and special files)
> - The new API is going to be more strict than the old magic symlink API
> - *If* it turns out to not break user applications, old API can also become
>   more strict to align with new API (unlikely the case for setxattr())
> - This will allow sandboxed containers to opt-out of the "old API", by
>   restricting access to /proc/self/fd and to implement more fine grained
>   control over which metadata operations are allowed on an O_PATH fd
>
> Did I understand the plan correctly?
> Do you agree with me that the plan to keep AT_EMPTY_PATH and
> magic symlink semantics may not be realistic?
>
> Thanks,
> Amir.