Re: shiftfs status and future development

Seth Forshee <seth.forshee@xxxxxxxxxxxxx> · Fri, 15 Jun 2018 15:47:56 -0500

On Fri, Jun 15, 2018 at 10:22:09AM -0700, James Bottomley wrote:
> On Sat, 2018-06-16 at 03:04 +1000, Aleksa Sarai wrote:
> > On 2018-06-15, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx
> > > wrote:
> > > > >  - Supports any id maps possible for a user namespace
> > > > 
> > > > Have we already ruled out storing the container's UID/GID/perms
> > > > in an extended attribute, and having all the files owned by the
> > > > owner of the container from the perspective of the unshifted
> > > > fs.  Then shiftfs reads the xattr and presents the files with the
> > > > container's idea of what the UID is?
> > > 
> > > I've got an experimental patch set that does the *mark* as an
> > > xattr. 
> > 
> > I forgot to ask you about this when we all met face-to-face -- can
> > you go over what the purpose of marking the mounts before being able
> > to shifts is? When I saw your demo at LPC I was quite confused about
> > what it was doing (I think you mentioned it was a security feature,
> > but I must admit I didn't follow the explanation).
> 
> OK, so the basic security problem is that an unprivileged tenant cannot
> be allowed arbitrary access to both the shifted and underlying
> unshifted locations because they can do writes to the shifted mount
> that appear at real uid/gid 0 in the underlying unshifted location,
> setting up all sorts of unpleasant threats of which suid execution is
> just the most obvious one.
> 
> My mount marking solution, which the v2 (and forthcoming v3) has is the
> idea that the admin buries the real underlying location deep in a path
> inaccessible (to the tenant) part of the filesystem and then exposes a
> marked mount point to the tenant by doing
> 
> mount -t shiftfs -o mark <underlying location> <tenant visible>
> 
> Then in the <tenant visible> location we can block the potential
> exploits.  When the tenant is building an unprivileged container, it
> can do
> 
> mount -t shiftfs <tenant visible> <container location>
> 
> And the <container location> will now have the shifting in place.

More generally, we can't allow an unprivileged user ns to mount any
subtree with an id shift unless the context that controls that subtree
(i.e. CAP_SYS_ADMIN in sb->s_user_ns) allows it. Otherwise it would be a
simple matter for any user to create a user ns and make an id shifted
mount of /. The marking in shiftfs is one way of solving this problem.

I don't know if you saw my comments about marking earlier in the thread.
Tl;dr, I think that the new mount apis in the filesystem context patches
could allow an alternative to marking. I think we should be able to
arrange it so that the "host" context sets up a mount fd for shiftfs
mounting a sepecific subtree then passes that fd into the container. The
container can then use the fd to attach the mount to its filesystem
tree. This will provide all the benefits of marking without that awkward
intermediate mount point.

Of course those patches haven't been merged yet, but based on the
discussion I've seen their prospects look good.

> This scheme is ephemeral (the marked mount has to be recreated on every
> boot) and rather complex, so the alternative is to add a permanent mark
> to <underlying location> so that regular tenant access can be secured
> (or even prohibited) but the tenant can still do
> 
> mount -t shiftfs <underlying location> <container location>

This of course would not be possible in my proposed mount fd scheme.

> To get the shifting properties in the container.  In this version of
> the scheme, the shift mountable directory is marked with a security
> xattr that is permanent (survives reboot) but requires that the
> filesystem support xattrs, of course.
> 
> The down side of the xattr scheme is that the securing against the
> tenant part becomes an xattr enforced thing rather than a shiftfs
> enforced thing, so it has to be an additional patch to the kernel
> itself rather than being inside a self contained module.

Would this work for nested containers? I guess it should be fine to
allow setting that xattr for CAP_SYS_ADMIN in sb->s_user_ns, so probably
so.

Thanks,
Seth