Re: overlayfs access checks on underlying layers

Paul Moore <paul@xxxxxxxxxxxxxx> · Mon, 3 Dec 2018 18:27:57 -0500

On Thu, Nov 29, 2018 at 5:22 PM Daniel Walsh <dwalsh@xxxxxxxxxx> wrote:
> On 11/29/18 2:47 PM, Miklos Szeredi wrote:
> > On Thu, Nov 29, 2018 at 5:14 PM Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> >
> >> Possibly I misunderstood you, but I don't think we want to copy-up on
> >> permission denial, as that would still allow the mounter to read/write
> >> special files or execute regular files to which it would normally be
> >> denied access, because the copy would inherit the context specified by
> >> the mounter in the context mount case.  It still represents an
> >> escalation of privilege for the mounter.  In contrast, the copy-up on
> >> write behavior does not allow the mounter to do anything it could not do
> >> already (i.e. read from the lower, write to the upper).
> > Let's get this straight:  when file is copied up, it inherits label
> > from context=, not from label of lower file?
>
> Yes, in the case of context mount, it will get the context mount directory.
>
> In the case of not context mount, it should maintain the label of  the
> lower.
>
> > Next question: permission to change metadata is tied to permission to
> > open?  Is it possible that open is denied, but metadata can be
> > changed?
>
> Yes, SElinux handles open differently then setattr.  Although I am not
> sure if any tools handle this.
>
> > DAC model allows this: metadata change is tied to ownership, not mode
> > bits.   And different capability flag.
> >
> > If the same is true for MAC, then the pre-v4.20-rc1 is already
> > susceptible to the privilege escalation you describe, right?
>
> After talking to Vivek, I am not sure their is a privilege escallation.

More on this below, but this thread doesn't have me convinced, and we
are at -rc5 right now.  We need to come to some decision on this soon
because we are running out of time before v4.20 is released with this
code.

> For device nodes, the mounter has to have the ability to create the
> devicenode with the context mount, if he can do this, then he can do it
> with or without Overlay.  This might lead to users making mistakes on
> security, but the model is sound. And I think this stands even in the
> case of the lower is mounted NODEV and the upper is not.  If the mounter
> can create a device on the upper with a particular label, then he does
> not need the lower.

The problem I have when looking at the current code is that permission
is given, regardless of what is requested, for any special_file() on
an overlayfs mount.

It also looks like the mounter's creds are used when checking
permissions regardless of the file has been copied up or not; I would
expect that the mounter's permissions would only used when checking
permissions against the lower inode, no?  I think there is also some
weird behavior if the underlying inode only allows the mounter to
write (no read) and a write is requested at the overlayfs layer.  I'm
sure I'm missing some subtle thing with overlayfs, but why aren't we
doing something like the following:

  int ovl_permission(...) {

    if (!realinode) {
      ...
    }

    err = generic_permission(inode, mask);
    if (err)
      return err;

    if (upperinode) {
      err = inode_permission(upperinode, mask);
    } else {
      // on the lower inode, always use the mounter's creds
      old_cred = ovl_override_creds(...);

      // check to see if we have the right perms first, if
      // that fails switch to a read/copy-up check if we
      // are doing a write (note: we are not bypassing the
      // exec check, the task can change the metadata like
      // every other fs)
      err = inode_permission(lowerinode, mask);
      if (err && (mask & (MAY_EXEC | MAY_APPEND))) {
        // PM: my guess is that we also need to add a
        // "&& !special_file(lowerinode)" to the conditional
        // above because you can't copy-up a dev node in the
        // normal sense, but we'll leave that as a discussion
        // point for now...
        // turn the write into a read (copy-up)
        mask &= ~(MAY_WRITE | MAY_APPEND);
        mask |= MAY_READ;
        err = inode_permission(lowerinode, mask);
      }

      // reset the creds
      revert_creds(old_cred);
    }

    return err;
  }

> For sockets, I see the case where a process is listening on the lower
> level socket, the mounter mounts the overlay over the directory with the
> socket.  Then the mounter changes the attributes of the socket,
> performing a copy up.  If the mounter can not talk to the socket and the
> other end is still listening, then this could be an issue.  If the
> socket is no longer connected to the listener on the lower, then this is
> not an issue.
>
> Similar for a FIFO.

See my comment "// PM: my guess ..." in the pseudo code above.  I
think the write->read permission mask conversion really should only
apply to normal files where you can do a copy-up.

> With SELinux we are also always checking not only the file access to the
> socker, but also checking whether the label of the client is able to
> talk to the label of the server daemon.  So we are protected by a
> secondary check.

That's making some assumptions on the LSM and the LSM's loaded policy
and is not something I would want to rely on.

-- 
paul moore
www.paul-moore.com