Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2

Aleksa Sarai <cyphar@xxxxxxxxxx> · Fri, 14 Jan 2022 15:28:17 +1100

On 2022-01-14, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
> 
> 
> On 1/13/22 10:52, Andrey Zhadchenko wrote:
> > 
> > 
> > On 1/13/22 09:46, Aleksa Sarai wrote:
> > > On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
> > > > On 1/12/22 17:51, Christian Brauner wrote:
> > > > > On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
> > > > > > On 2022-01-12, Andrey Zhadchenko
> > > > > > <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
> > > > > > > If you have an opened O_PATH file, currently there
> > > > > > > is no way to re-open
> > > > > > > it with other flags with openat/openat2. As a
> > > > > > > workaround it is possible
> > > > > > > to open it via /proc/self/fd/<X>, however
> > > > > > > 1) You need to ensure that /proc exists
> > > > > > > 2) You cannot use O_NOFOLLOW flag
> > > > > > 
> > > > > > There is also another issue -- you can mount on top of
> > > > > > magic-links so if
> > > > > > you're a container runtime that has been tricked into creating bad
> > > > > > mounts of top of /proc/ subdirectories there's no way of
> > > > > > detecting that
> > > > > > this has happened. (Though I think in the long-term we will need to
> > > > > > make it possible for unprivileged users to create a procfs mountfd if
> > > > > > they have hidepid=4,subset=pids set -- there are loads of things
> > > > > > containers need to touch in procfs which can be
> > > > > > overmounted in malicious
> > > > > > ways.)
> > > > > 
> > > > > Yeah, though I see this as a less pressing issue for now. I'd rather
> > > > > postpone this and make userspace work a bit more. There are ways to
> > > > > design programs so you know that the procfs instance you're interacting
> > > > > with is the one you want to interact with without requiring
> > > > > unprivileged
> > > > > mounting outside of a userns+pidns+mountns pair. ;)
> > > > > 
> > > > > > 
> > > > > > > Both problems may look insignificant, but they are
> > > > > > > sensitive for CRIU.
> > > > > > > First of all, procfs may not be mounted in the namespace where we are
> > > > > > > restoring the process. Secondly, if someone opens a
> > > > > > > file with O_NOFOLLOW
> > > > > > > flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU
> > > > > > > must also open the
> > > > > > > file with this flag during restore.
> > > > > > > 
> > > > > > > This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
> > > > > > > struct open_how and changes getname() call to
> > > > > > > getname_flags() to avoid
> > > > > > > ENOENT for empty filenames.
> > > > > > 
> > > > > > This is something I've wanted to implement for a while,
> > > > > > but from memory
> > > > > > we need to add some other protections in place before enabling this.
> > > > > > 
> > > > > > The main one is disallowing re-opening of a path when it
> > > > > > was originally
> > > > > > opened with a different set of modes. [1] is the patch I originally
> > > > I looked at this patch. However I am not able to reproduce the problem.
> > > > For example, I can't open /proc/self/exe as RDWR with the following:
> > > > fd1 = open(/proc/self/exe, O_PATH)
> > > > fd2 = open(/proc/self/fd/3, O_RDWR) <- error
> > > > or open file with incorrect flags via O_PATH to O_PATH fd from proc
> > > > This is fixed or did I understand this problem wrong?
> > > 
> > > You will get -ETXTBSY because the /proc/self/exe is still a current->mm
> > > of a process. What you need to do is have two processes (or fork+exec a
> > > process and do this):
> > > 
> > >   1. Grab the /proc/$pid/exe handle of the target process.
> > >   2. Wait for the target process to do an exec() of another program (or
> > >      exit).
> > >   3. *Then* re-open the fd with write permissions. This is allowed
> > >      because the file is no longer being used as the current->mm of a
> > >     process and thus is treated like a regular file handle even though
> > >     it was only ever resolveable through /proc/self/exe which should
> > >     (semantically) only ever be readable.
> > > 
> > > This attack was used against runc in 2016 and a similar attack was
> > > possible with some later CVEs (I think there was also one against LXC at
> > > some point but I might be mistaken). There were other bugs which lead to
> > > this vector being usable, but my view is that this shouldn't have been
> > > possible in the first place.
> > > 
> > > I can cook up a simple example if the above description isn't explaining
> > > the issue thoroughly enough.
> > > 
> > 
> > Thanks for the explanation! I get it now
> > 
> > > > > > wrote as part of the openat2(2) (but I dropped it since I wasn't sure
> > > > > > whether it might break some systems in subtle ways -- though according
> > > > > > to my testing there wasn't an issue on any of my machines).
> > > > > 
> > > > > Oh this is the discussion we had around turning an opath fd into a say
> > > > > O_RDWR fd, I think.
> > > > > So yes, I think restricting fd reopening makes sense. However, going
> > > > > from an O_PATH fd to e.g. an fd with O_RDWR does make sense
> > > > > and needs to
> > > > > be the default anyway. So we would need to implement this as a denylist
> > > > > anyway. The default is that opath fds can be reopened with whatever and
> > > > > only if the opath creator has restricted reopening will it fail, i.e.
> > > > > it's similar to a denylist.
> > > > > 
> > > > > But this patch wouldn't prevent that or hinder the upgrade mask
> > > > > restriction afaict.
> > > > 
> > > > This issue is actually more complicated than I thought.
> > > > 
> > > > What do you think of the following:
> > > > 1. Add new O_EMPTYPATH constant
> > > > 2. When we open something with O_PATH, remember access flags (currently
> > > > we drop all flags in do_dentry_open() for O_PATH fds). This is similar
> > > > to Aleksa Sarai idea, but I do not think we should add some new fields,
> > > > because CRIU needs to be able to see it. Just leave access flags
> > > > untouched.
> > > 
> > > There are two problems with this:
> > > 
> > >   * The problem with this is that O_PATH and O_PATH|O_RDONLY are
> > >     identical. O_RDONLY is defined as 0. I guess by new fields you're
> > 
> > Yes, I didn't thought about that.
> > 
> > >     referring to what you'd get from fcntl(F_GETFL)?
> > > 
> > >     What you're suggesting here is the openat2() O_PATH access mask
> > >     stuff. That is a feature I think would be useful, but it's not
> > >     necessary to get O_EMPTYPATH working.
> > > 
> > >     If you really need to be able to get the O_PATH re-opening mask of a
> > >     file descriptor (which you probably do for CRIU) we can add that
> > >     information to F_GETFL or some other such interface.
> > 
> > That would be cool. In the patch I saw new FMODE_PATH_READ and
> > MODE_PATH_WRITE but there was no option to dump it.
> > 
> > > 
> > >   * We need to make sure that the default access modes of O_PATH on
> > >     magic links are correct. We can't simply allow any access mode in
> > >     that case, because if we do then we haven't really solved the
> > >     /proc/self/exe issue.
> > > 
> > > > 3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
> > > > access flags against the ones we remembered for O_PATH fd
> > > 
> > >   * We also need to add the same restrictions for opening through
> > >     /proc/self/fd/$n.
> > > 
> > > > This won't solve magiclinks problems but there at least will be API to
> > > > avoid procfs and which allow to add some restrictions.
> > > 
> > > I think the magic link problems need to be solved if we're going to
> > > enshrine this fd reopening behaviour by adding an O_* flag for it.
> > > Though of course this is already an issue with /proc/self/fd/$n
> > > re-opening.
> > 
> > I think these issues are close but still different. Probably we can make
> > three ideas from this discussion.
> > 1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
> > really a new feature (since we can already do it via /proc for most
> > cases). And also this won't break anything.
> > 2. Add modes for magiclinks. This is more restrictive change. However I
> > don't think any non-malicious programs will do procfs shenanigans and
> > will be affected by this changes. This is the patch you sent some time
> > ago
> 
> Oops, I didn't notice third patch in you series "open: O_EMPTYPATH:
> procfs-less file descriptor re-opening". This is exactly what I tried to
> do.
> It will be very cool if you resurrect and re-send magic-links
> adjustments and O_EMPTYPATH.

I'll rebase it (adding a way to dump the reopening mask for O_PATH
descriptors) and send it next week (assuming it doesn't require too
much tweaking).

It should be noted that on paper you can get the reopening mask with the
current version of the patchset (look at the mode of the magic link in
/proc/self/fd/$n) but that's obviously not a reasonable solution.

> > 3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
> > make it apply if someone tries to do /proc workaround with this exact
> > O_PATH fd

I originally wanted to do this in openat2() since it feels analogous to
open modes for regular file descriptors (in fact I planned to make
how->mode a union with how->upgrade_mask) but I'll need to think about
how to expose that in fcntl().

> > > However since I already have a patch which solves this issue, I can work
> > > on reviving it and re-send it.
> > 
> > Why not if it only makes it better

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Attachment:
signature.asc

Description: PGP signature