Re: [PATCH] fs/open: add new RESOLVE_EMPTY_PATH flag for openat2

Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> · Mon, 17 Jan 2022 09:35:52 +0300

On 1/14/22 07:28, Aleksa Sarai wrote:
On 2022-01-14, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:

On 1/13/22 10:52, Andrey Zhadchenko wrote:

On 1/13/22 09:46, Aleksa Sarai wrote:
On 2022-01-12, Andrey Zhadchenko <andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
On 1/12/22 17:51, Christian Brauner wrote:
On Thu, Jan 13, 2022 at 01:34:19AM +1100, Aleksa Sarai wrote:
On 2022-01-12, Andrey Zhadchenko
<andrey.zhadchenko@xxxxxxxxxxxxx> wrote:
If you have an opened O_PATH file, currently there
is no way to re-open
it with other flags with openat/openat2. As a
workaround it is possible
to open it via /proc/self/fd/<X>, however
1) You need to ensure that /proc exists
2) You cannot use O_NOFOLLOW flag

There is also another issue -- you can mount on top of
magic-links so if
you're a container runtime that has been tricked into creating bad
mounts of top of /proc/ subdirectories there's no way of
detecting that
this has happened. (Though I think in the long-term we will need to
make it possible for unprivileged users to create a procfs mountfd if
they have hidepid=4,subset=pids set -- there are loads of things
containers need to touch in procfs which can be
overmounted in malicious
ways.)

Yeah, though I see this as a less pressing issue for now. I'd rather
postpone this and make userspace work a bit more. There are ways to
design programs so you know that the procfs instance you're interacting
with is the one you want to interact with without requiring
unprivileged
mounting outside of a userns+pidns+mountns pair. ;)

Both problems may look insignificant, but they are
sensitive for CRIU.
First of all, procfs may not be mounted in the namespace where we are
restoring the process. Secondly, if someone opens a
file with O_NOFOLLOW
flag, it is exposed in /proc/pid/fdinfo/<X>. So CRIU
must also open the
file with this flag during restore.

This patch adds new constant RESOLVE_EMPTY_PATH for resolve field of
struct open_how and changes getname() call to
getname_flags() to avoid
ENOENT for empty filenames.

This is something I've wanted to implement for a while,
but from memory
we need to add some other protections in place before enabling this.

The main one is disallowing re-opening of a path when it
was originally
opened with a different set of modes. [1] is the patch I originally
I looked at this patch. However I am not able to reproduce the problem.
For example, I can't open /proc/self/exe as RDWR with the following:
fd1 = open(/proc/self/exe, O_PATH)
fd2 = open(/proc/self/fd/3, O_RDWR) <- error
or open file with incorrect flags via O_PATH to O_PATH fd from proc
This is fixed or did I understand this problem wrong?

You will get -ETXTBSY because the /proc/self/exe is still a current->mm
of a process. What you need to do is have two processes (or fork+exec a
process and do this):

   1. Grab the /proc/$pid/exe handle of the target process.
   2. Wait for the target process to do an exec() of another program (or
      exit).
   3. *Then* re-open the fd with write permissions. This is allowed
      because the file is no longer being used as the current->mm of a
     process and thus is treated like a regular file handle even though
     it was only ever resolveable through /proc/self/exe which should
     (semantically) only ever be readable.

This attack was used against runc in 2016 and a similar attack was
possible with some later CVEs (I think there was also one against LXC at
some point but I might be mistaken). There were other bugs which lead to
this vector being usable, but my view is that this shouldn't have been
possible in the first place.

I can cook up a simple example if the above description isn't explaining
the issue thoroughly enough.

Thanks for the explanation! I get it now

wrote as part of the openat2(2) (but I dropped it since I wasn't sure
whether it might break some systems in subtle ways -- though according
to my testing there wasn't an issue on any of my machines).

Oh this is the discussion we had around turning an opath fd into a say
O_RDWR fd, I think.
So yes, I think restricting fd reopening makes sense. However, going
from an O_PATH fd to e.g. an fd with O_RDWR does make sense
and needs to
be the default anyway. So we would need to implement this as a denylist
anyway. The default is that opath fds can be reopened with whatever and
only if the opath creator has restricted reopening will it fail, i.e.
it's similar to a denylist.

But this patch wouldn't prevent that or hinder the upgrade mask
restriction afaict.

This issue is actually more complicated than I thought.

What do you think of the following:
1. Add new O_EMPTYPATH constant
2. When we open something with O_PATH, remember access flags (currently
we drop all flags in do_dentry_open() for O_PATH fds). This is similar
to Aleksa Sarai idea, but I do not think we should add some new fields,
because CRIU needs to be able to see it. Just leave access flags
untouched.

There are two problems with this:

   * The problem with this is that O_PATH and O_PATH|O_RDONLY are
     identical. O_RDONLY is defined as 0. I guess by new fields you're

Yes, I didn't thought about that.

     referring to what you'd get from fcntl(F_GETFL)?

     What you're suggesting here is the openat2() O_PATH access mask
     stuff. That is a feature I think would be useful, but it's not
     necessary to get O_EMPTYPATH working.

     If you really need to be able to get the O_PATH re-opening mask of a
     file descriptor (which you probably do for CRIU) we can add that
     information to F_GETFL or some other such interface.

That would be cool. In the patch I saw new FMODE_PATH_READ and
MODE_PATH_WRITE but there was no option to dump it.

   * We need to make sure that the default access modes of O_PATH on
     magic links are correct. We can't simply allow any access mode in
     that case, because if we do then we haven't really solved the
     /proc/self/exe issue.

3. for openat(fd, "", O_EMPTYPATH | <access flags>) additionally check
access flags against the ones we remembered for O_PATH fd

   * We also need to add the same restrictions for opening through
     /proc/self/fd/$n.

This won't solve magiclinks problems but there at least will be API to
avoid procfs and which allow to add some restrictions.

I think the magic link problems need to be solved if we're going to
enshrine this fd reopening behaviour by adding an O_* flag for it.
Though of course this is already an issue with /proc/self/fd/$n
re-opening.

I think these issues are close but still different. Probably we can make
three ideas from this discussion.
1. Add an O_EMPTYPATH flag to re-open O_PATH descriptor. This won't be
really a new feature (since we can already do it via /proc for most
cases). And also this won't break anything.
2. Add modes for magiclinks. This is more restrictive change. However I
don't think any non-malicious programs will do procfs shenanigans and
will be affected by this changes. This is the patch you sent some time
ago

Oops, I didn't notice third patch in you series "open: O_EMPTYPATH:
procfs-less file descriptor re-opening". This is exactly what I tried to
do.
It will be very cool if you resurrect and re-send magic-links
adjustments and O_EMPTYPATH.

I'll rebase it (adding a way to dump the reopening mask for O_PATH
descriptors) and send it next week (assuming it doesn't require too
much tweaking).

Thanks!

It should be noted that on paper you can get the reopening mask with the
current version of the patchset (look at the mode of the magic link in
/proc/self/fd/$n) but that's obviously not a reasonable solution.

3. Add an option to restrict O_PATH re-opening (maybe via fcntl?). And
make it apply if someone tries to do /proc workaround with this exact
O_PATH fd

I originally wanted to do this in openat2() since it feels analogous to
open modes for regular file descriptors (in fact I planned to make
how->mode a union with how->upgrade_mask) but I'll need to think about
how to expose that in fcntl().

Probably new openat2() field is even better. We do not really need new
F_SETFL for O_PATH descriptors, because you would always re-open it with
O_EMPTYPATH again with stricter flags. As for exposing this flags for
CRIU, new line in /proc/<PID>/fdinfo/<N> is fine (and preferable even
if you add new fcntl())

However since I already have a patch which solves this issue, I can work
on reviving it and re-send it.

Why not if it only makes it better