Re: [PATCH v12 05/12] namei: obey trailing magic-link DAC permissions

Aleksa Sarai <cyphar@xxxxxxxxxx> · Wed, 18 Sep 2019 15:51:00 +0200

On 2019-09-17, Jann Horn <jannh@xxxxxxxxxx> wrote:
On Wed, Sep 4, 2019 at 10:21 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
The ability for userspace to "re-open" file descriptors through
/proc/self/fd has been a very useful tool for all sorts of usecases
(container runtimes are one common example). However, the current
interface for doing this has resulted in some pretty subtle security
holes. Userspace can re-open a file descriptor with more permissions
than the original, which can result in cases such as /proc/$pid/exe
being re-opened O_RDWR at a later date even though (by definition)
/proc/$pid/exe cannot be opened for writing. When combined with O_PATH
the results can get even more confusing.
[...]
Instead we have to restrict it in such a way that it doesn't break
(good) users but does block potential attackers. The solution applied in
this patch is to restrict *re-opening* (not resolution through)
magic-links by requiring that mode of the link be obeyed. Normal
symlinks have modes of a+rwx but magic-links have other modes. These
magic-link modes were historically ignored during path resolution, but
they've now been re-purposed for more useful ends.

Thanks for dealing with this issue!

[...]

diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..54d57dad0f91 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -872,7 +872,7 @@ void nd_jump_link(struct path *path)

        nd->path = *path;
        nd->inode = nd->path.dentry->d_inode;
-       nd->flags |= LOOKUP_JUMPED;
+       nd->flags |= LOOKUP_JUMPED | LOOKUP_MAGICLINK_JUMPED;
 }
[...]
+static int trailing_magiclink(struct nameidata *nd, int acc_mode,
+                             fmode_t *opath_mask)
+{
+       struct inode *inode = nd->link_inode;
+       fmode_t upgrade_mask = 0;
+
+       /* Was the trailing_symlink() a magic-link? */
+       if (!(nd->flags & LOOKUP_MAGICLINK_JUMPED))
+               return 0;
+
+       /*
+        * Figure out the upgrade-mask of the link_inode. Since these aren't
+        * strictly POSIX semantics we don't do an acl_permission_check() here,
+        * so we only care that at least one bit is set for each upgrade-mode.
+        */
+       if (inode->i_mode & S_IRUGO)
+               upgrade_mask |= FMODE_PATH_READ;
+       if (inode->i_mode & S_IWUGO)
+               upgrade_mask |= FMODE_PATH_WRITE;
+       /* Restrict the O_PATH upgrade-mask of the caller. */
+       if (opath_mask)
+               *opath_mask &= upgrade_mask;
+       return may_open_magiclink(upgrade_mask, acc_mode);
 }

This looks racy because entries in the file descriptor table can be
switched out as long as task->files->file_lock isn't held. Unless I'm
missing something, something like the following (untested) would
bypass this restriction:

You're absolutely right -- good catch!

Perhaps you could change nd_jump_link() to "void nd_jump_link(struct
path *path, umode_t link_mode)", and let proc_pid_get_link() pass the
link_mode through from an out-argument of .proc_get_link()? Then
proc_fd_link() could grab the proper mode in a race-free manner. And
nd_jump_link() could stash the mode in the nameidata.

This indeed does appear to be the simplest solution -- I'm currently
testing a variation of the patch you proposed (with a few extra bits to
deal with nd_jump_link and proc_get_link being used elsewhere).

I'll include this change (assuming it fixes the flaw you found) in the
v13 series I'll send around next week. Thanks, Jann!

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Attachment:
signature.asc

Description: PGP signature