Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2

Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> · Thu, 13 Jan 2022 10:52:17 +0300

On 1/13/22 09:46, Aleksa Sarai wrote:
On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
On 1/12/22 17:51, Christian Brauner wrote:
On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
If you have an opened O_PATH file, currently there is no way to re-open
it with other flags with openat/openat2. As a workaround it is possible
to open it via /proc/self/fd/<X>, however
1) You need to ensure that /proc exists
2) You cannot use O_NOFOLLOW flag

There is also another issue -- you can mount on top of magic-links so if
you're a container runtime that has been tricked into creating bad
mounts of top of /proc/ subdirectories there's no way of detecting that
this has happened. (Though I think in the long-term we will need to
make it possible for unprivileged users to create a procfs mountfd if
they have hidepid=4,subset=pids set -- there are loads of things
containers need to touch in procfs which can be overmounted in malicious
ways.)

Yeah, though I see this as a less pressing issue for now. I'd rather
postpone this and make userspace work a bit more. There are ways to
design programs so you know that the procfs instance you're interacting
with is the one you want to interact with without requiring unprivileged
mounting outside of a userns+pidns+mountns pair. ;)

Both problems may look insignificant, but they are sensitive for CRIU.
First of all, procfs may not be mounted in the namespace where we are
restoring the process. Secondly, if someone opens a file with O_NOFOLLOW
flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU must also open the
file with this flag during restore.

This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
struct open_how and changes getname() call to getname_flags() to avoid
ENOENT for empty filenames.

This is something I've wanted to implement for a while, but from memory
we need to add some other protections in place before enabling this.

The main one is disallowing re-opening of a path when it was originally
opened with a different set of modes. [1] is the patch I originally
I looked at this patch. However I am not able to reproduce the problem.
For example, I can't open /proc/self/exe as RDWR with the following:
fd1 = open(/proc/self/exe, O_PATH)
fd2 = open(/proc/self/fd/3, O_RDWR) <- error
or open file with incorrect flags via O_PATH to O_PATH fd from proc
This is fixed or did I understand this problem wrong?

You will get -ETXTBSY because the /proc/self/exe is still a current->mm
of a process. What you need to do is have two processes (or fork+exec a
process and do this):

  1. Grab the /proc/$pid/exe handle of the target process.
  2. Wait for the target process to do an exec() of another program (or
     exit).
  3. *Then* re-open the fd with write permissions. This is allowed
     because the file is no longer being used as the current->mm of a
	process and thus is treated like a regular file handle even though
	it was only ever resolveable through /proc/self/exe which should
	(semantically) only ever be readable.

This attack was used against runc in 2016 and a similar attack was
possible with some later CVEs (I think there was also one against LXC at
some point but I might be mistaken). There were other bugs which lead to
this vector being usable, but my view is that this shouldn't have been
possible in the first place.

I can cook up a simple example if the above description isn't explaining
the issue thoroughly enough.

Thanks for the explanation! I get it now

wrote as part of the openat2(2) (but I dropped it since I wasn't sure
whether it might break some systems in subtle ways -- though according
to my testing there wasn't an issue on any of my machines).

Oh this is the discussion we had around turning an opath fd into a say
O_RDWR fd, I think.
So yes, I think restricting fd reopening makes sense. However, going
from an O_PATH fd to e.g. an fd with O_RDWR does make sense and needs to
be the default anyway. So we would need to implement this as a denylist
anyway. The default is that opath fds can be reopened with whatever and
only if the opath creator has restricted reopening will it fail, i.e.
it's similar to a denylist.

But this patch wouldn't prevent that or hinder the upgrade mask
restriction afaict.

This issue is actually more complicated than I thought.

What do you think of the following:
1. Add new O_EMPTYPATH constant
2. When we open something with O_PATH, remember access flags (currently
we drop all flags in do_dentry_open() for O_PATH fds). This is similar
to Aleksa Sarai idea, but I do not think we should add some new fields,
because CRIU needs to be able to see it. Just leave access flags
untouched.

There are two problems with this:

  * The problem with this is that O_PATH and O_PATH|O_RDONLY are
    identical. O_RDONLY is defined as 0. I guess by new fields you're

Yes, I didn't thought about that.

    referring to what you'd get from fcntl(F_GETFL)?

    What you're suggesting here is the openat2() O_PATH access mask
    stuff. That is a feature I think would be useful, but it's not
    necessary to get O_EMPTYPATH working.

    If you really need to be able to get the O_PATH re-opening mask of a
    file descriptor (which you probably do for CRIU) we can add that
    information to F_GETFL or some other such interface.

That would be cool. In the patch I saw new FMODE_PATH_READ and
MODE_PATH_WRITE but there was no option to dump it.

  * We need to make sure that the default access modes of O_PATH on
    magic links are correct. We can't simply allow any access mode in
    that case, because if we do then we haven't really solved the
    /proc/self/exe issue.

3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
access flags against the ones we remembered for O_PATH fd

  * We also need to add the same restrictions for opening through
    /proc/self/fd/$n.

This won't solve magiclinks problems but there at least will be API to
avoid procfs and which allow to add some restrictions.

I think the magic link problems need to be solved if we're going to
enshrine this fd reopening behaviour by adding an O_* flag for it.
Though of course this is already an issue with /proc/self/fd/$n
re-opening.

I think these issues are close but still different. Probably we can make
three ideas from this discussion.
1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
really a new feature (since we can already do it via /proc for most
cases). And also this won't break anything.
2. Add modes for magiclinks. This is more restrictive change. However I
don't think any non-malicious programs will do procfs shenanigans and
will be affected by this changes. This is the patch you sent some time
ago
3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
make it apply if someone tries to do /proc workaround with this exact
O_PATH fd

However since I already have a patch which solves this issue, I can work
on reviving it and re-send it.

Why not if it only makes it better