Re: [RFC PATCH] f*xattr: allow O_PATH descriptors

Amir Goldstein <amir73il@xxxxxxxxx> · Sat, 18 Jun 2022 18:30:35 +0300

On Sat, Jun 18, 2022 at 2:19 PM Christian Göttsche
<cgzones@xxxxxxxxxxxxxx> wrote:
>
> On Sat, 18 Jun 2022 at 11:11, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> > >
> > > On 2022-06-08, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > > From: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > > >
> > > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > > operations.
> > > > > > > >
> > > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > > > which might not be mounted e.g. inside of chroots, see [2].
> > > > > > > >
> > > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > > >
> > > > > > > > Original patch by Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@xxxxxxxxxx/
> > > > > > > >
> > > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > > introducing another set of syscalls.
> > > > > > > > >
> > > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > > >
> > > > > > > > > > Current versions return EBADF, hence easy to detect the presence of
> > > > > > > > > this feature and fall back in case it's missing.
> > > > > > > >
> > > > > > > > CC: linux-api@xxxxxxxxxxxxxxx
> > > > > > > > CC: linux-man@xxxxxxxxxxxxxxx
> > > > > > > > Signed-off-by: Christian Göttsche <cgzones@xxxxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > >
> > > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > > equivalent to a write.
> > > > > >
> > > > > > > It is not really semantically equivalent to a write if it works on an
> > > > > > > O_RDONLY fd already.
> > > > >
> > > > > > The fact that it works on an O_RDONLY fd has always been weird. And is
> > > > > probably a bug. If you look at xattr_permission() you can see that it
> > > >
> > > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > > >
> > > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > > real filesystems. I don't know how much closer to a write this can get.
> > > > >
> > > > > In general, one semantic aberration doesn't justify piling another one
> > > > > on top.
> > > > >
> > > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > > > opens the file whereas O_PATH doesn't.)
> > > >
> > > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > > > rule that modifying f*() operations are not allowed to use O_PATH fd,
> > > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > > fine by me.
> > > > It's good to have consistent rules and it's good to add a new UAPI for
> > > > new behavior.
> > > >
> > > > However...
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > In sensitive environments such as service management/container runtimes
> > > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > > > they can be used for. I'd prefer not to pull at this string.
> > > > > >
> > > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > > are almost identical w.r.t permission checks and everything else.
> > > > > >
> > > > > > So this change introduces nothing new that a user in said environment
> > > > > > cannot already accomplish with setxattr().
> > > > > >
> > > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > > > fd is already possible with setxattr("/proc/self/fd/$fd"), so whatever security
> > > > > > hole you are trying to prevent is already wide open.
> > > > >
> > > > > > That is very much something that we're trying to restrict for this
> > > > > > exact reason and is one of the main motivators for the upgrade mask in
> > > > > > openat2(). If I want to send an O_PATH around I want it to not be
> > > > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > > > and part of the original patchset in [2]). O_PATH semantics don't need to
> > > > > > become weird.
> > > > >
> > > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@xxxxxxxxxx
> > > >
> > > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > > will follow is *xattrat() syscalls.
> > > >
> > > > What will you be able to argue then?
> > > >
> > > > There are several *at() syscalls that modify metadata.
> > > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > > >
> > > > Do you intend to try and block setxattrat()?
> > > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > > > Those *at() syscalls have real use cases to avoid TOCTOU races.
> > > > Do you propose that applications will have to use fsetxattr() on an open
> > > > file to avert races?
> > > >
> > > > I completely understand the idea behind upgrade masks
> > > > for limiting f_mode, but I don't know if trying to retroactively
> > > > change semantics of setxattr() in the move to setxattrat()
> > > > is going to be a good idea.
> > >
> > > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > > assume that keeping them makes sense (the most obvious example is being
> > > able to do tricks to open /proc/$pid/exe as O_RDWR).
> >
> > Please make a note that I have applications relying on current magic symlink
> > semantics w.r.t setxattr() and other metadata operations, and the libselinux
> > commit linked from the patch commit message proves that magic symlink
> > semantics are used in the wild, so it is not likely that those semantics could
> > be changed, unless userspace breakage could be justified by fixing a serious
> > security issue (i.e. open /proc/$pid/exe as O_RDWR).
> >
> > >
> > > I suspect that the long-term solution would be to have more upgrade
> > > masks so that userspace can opt-in to not allowing any kind of
> > > (metadata) write access through a particular file descriptor. You're
> > > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > > so we can't retroactively block /everything/ but we should try to come
> > > up with less leaky rules by default if it won't break userspace.
> > >
> >
> > Ok, let me try to say this in my own words using an example to see that
> > we are all on the same page:
> >
> > - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> > - fsetxattr(fd,...) is not applicable for symbolic links
>
> fsetxattr(2) works on symbolic links, e.g. for "security.selinux",
> except for the user namespace:
>
> https://github.com/torvalds/linux/blob/4b35035bcf80ddb47c0112c4fbd84a63a2836a18/fs/xattr.c#L124-L136
> /*
> * In the user.* namespace, only regular files and directories can have
> * extended attributes. For sticky directories, only the owner and
> * privileged users can write attributes.
> */
> if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
>     if (!S_ISREG(inode->i_mode) && !S_ISDIR(inode->i_mode))
>         return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
>     if (S_ISDIR(inode->i_mode) && (inode->i_mode & S_ISVTX) &&
>        (mask & MAY_WRITE) &&
>         !inode_owner_or_capable(mnt_userns, inode))
>         return -EPERM;
> }
>
> Currently it just does not support O_PATH file descriptors.
> And with O_RDONLY setting extended attributes is supported as well
> (fsetxattr(2) does not require O_RDWR or O_WRONLY).

But it is not possible to get an O_RDONLY fd for a symlink object, is it?
That's why I wrote "fsetxattr() is not applicable for symbolic links".
And then libselinux is left to choose between one API that is racy (lsetxattr)
and another API that does not work in containers (magic symlink).
They need to be given a proper API.
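
For illustration, a minimal userspace sketch of the situation described
above. The helper names pin_nofollow() and set_xattr_pinned() are
hypothetical and not taken from libselinux. The sketch assumes that a
symlink can only be pinned with O_PATH | O_NOFOLLOW (open() with
O_RDONLY | O_NOFOLLOW fails with ELOOP on a symlink), and that fsetxattr()
on such an fd returns EBADF on kernels without this patch, in which case
the magic symlink is the only fallback:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical helper: pin the object itself (even a symlink) without
 * following it. Returns an O_PATH fd, or -1 with errno set. */
static int pin_nofollow(const char *path)
{
    return open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
}

/* Hypothetical helper: set an xattr on the object pinned by pinned_fd.
 * Tries fsetxattr() first; if the kernel rejects the O_PATH fd with
 * EBADF, falls back to the /proc/self/fd/<fd> magic symlink, which
 * requires a mounted procfs. Returns 0, or -1 with errno set. */
static int set_xattr_pinned(int pinned_fd, const char *name,
                            const void *value, size_t size)
{
    char proc_path[64];

    if (fsetxattr(pinned_fd, name, value, size, 0) == 0)
        return 0;
    if (errno != EBADF)
        return -1;

    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", pinned_fd);
    return setxattr(proc_path, name, value, size, 0);
}
```

Without a mounted procfs the fallback fails with ENOENT, which is the
container problem mentioned above.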

>
> > - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
> >   when setting xattr on symbolic links
> > - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as the
> >   "new API" for setting xattr on symlinks (and special files)
> > - The new API is going to be more strict than the old magic symlink API
> > - *If* it turns out to not break user applications, old API can also become
> >   more strict to align with new API (unlikely the case for setxattr())
> > - This will allow sandboxed containers to opt-out of the "old API", by
> >   restricting access to /proc/self/fd and to implement more fine grained
> >   control over which metadata operations are allowed on an O_PATH fd
> >
> > Did I understand the plan correctly?
> > Do you agree with me that the plan to keep AT_EMPTY_PATH and
> > magic symlink semantics may not be realistic?

Sorry, this came out messy.
This was supposed to ask whether this part of the plan:
"semantics of fooat(<fd>, AT_EMPTY_PATH) and foo(/proc/self/fd/<fd>)
should always be identical" is realistic, given that applications
already depend on existing setxattr(MAGIC_SYMLINK) semantics.

IMO, it should be fine to keep the same semantics and allow
setxattrat() on O_PATH fd (subject to xattr_permission() of course),
as long as open masks could be added to further restrict the O_PATH
fd from being used for setxattr() in the future.

IOW, we don't have to think of O_PATH as the "least capable" fd mode.
It is just a mode that is not capable of data manipulations, but
there could be even less capable fd modes that are also not capable
of metadata manipulations such as setxattrat() or fchownat().
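
As a point of comparison, modifying metadata through an O_PATH fd is
already possible today for ownership. A minimal sketch, assuming
fchownat(2) with AT_EMPTY_PATH; the helper name chown_pinned() is
hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: change ownership of the object pinned by an
 * O_PATH descriptor. The empty path plus AT_EMPTY_PATH makes the call
 * operate on pinned_fd itself. Returns 0, or -1 with errno set. */
static int chown_pinned(int pinned_fd, uid_t uid, gid_t gid)
{
    return fchownat(pinned_fd, "", uid, gid, AT_EMPTY_PATH);
}
```

The usual ownership permission checks still apply; the O_PATH fd only
replaces the path lookup.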

Does that make sense?

Thanks,
Amir.