Re: [PATCH v3 0/3] introduce a uid/gid shifting bind mount

Christian Brauner <christian.brauner@xxxxxxxxxx> · Tue, 18 Feb 2020 18:26:06 +0100

On Tue, Feb 18, 2020 at 08:11:00AM -0800, James Bottomley wrote:
> On Tue, 2020-02-18 at 09:18 +0200, Amir Goldstein wrote:
> > On Mon, Feb 17, 2020 at 10:56 PM James Bottomley
> > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > 
> > > The object of this series is to replace shiftfs with a proper
> > > uid/gid shifting bind mount instead of the shiftfs hack of
> > > introducing something that looks similar to an overlay filesystem
> > > to do it.
> > > 
> > > The VFS still has the problem that in order to tell what vfsmount a
> > > dentry belongs to, struct path would have to be threaded everywhere
> > > struct dentry currently is.  However, this patch is structured only
> > > to require a rethreading of notify_change.  The rest of the
> > > knowledge that a shift is in operation is carried in the task
> > > structure by caching the unshifted credentials.
> > > 
> > > Note that although it is currently dependent on the new configfd
> > > interface for bind mounts, only patch 3/3 relies on this, and the
> > > whole thing could be redone as a syscall or any other mechanism
> > > (depending on how people eventually want to fix the problem with
> > > the new fsconfig mechanism being unable to reconfigure bind
> > > mounts).
> > > 
> > > The changes from v2 are I've added Amir's reviewed-by for the
> > > notify_change rethreading and I've implemented Serge's request for
> > > a base offset shift for the image.  It turned out to be much harder
> > > to implement a simple linear shift than simply to do it through a
> > > different userns, so that's how I've done it.  The userns you need
> > > to set up for the offset shifted image is one where the interior
> > > uid would see the shifted image as fake root.  I've introduced an
> > > additional "ns" config parameter, which must be specified when
> > > building the allow shift mount point (so it's done by the admin,
> > > not by the unprivileged user).  I've also taken care that the image
> > > shifted to zero (real root) is never visible in the
> > > filesystem.  Patch 3/3 explains how to use the additional "ns"
> > > parameter.
> > > 
> > > 
> > 
> > James,
> > 
> > To us common people who do not breath containers, your proposal seems
> > like a competing implementation to Christian's proposal [1].
> 
> I think we have three things that swirl around this space and aren't
> quite direct copies of each other's use cases but aren't entirely
> disjoint either: the superblock user namespace, this and the user
> namespace fsid mapping.
> 
> >  If it were a competing implementation, I think Christian's proposal
> > would have won by points for being less intrusive to VFS.
> 
> Heh, that one's subjective.  I think the current fsid code is missing

I honestly don't think it is subjective. Leaving aside that the patch
here is more invasive to the vfs just in how it needs to alter it, you
currently require a whole new set of syscalls for this feature. The
syscalls themselves even introduce a whole new API and complicated
semantics in the kernel and it's completely unclear whether they will
make it or not. Even if not, this feature will be tied to the new mount
api making it way harder and a longer process to adopt for userspace
given that not even all bits and pieces userspace needs are currently in
any released kernel.

But way more important: what Amir got right is that your approach and
fsid mappings don't stand in each others way at all. Shiftfed
bind-mounts can be implemented completely independent of fsid mappings
after the fact on top of it.

Your example, does this:

nsfd = open("/proc/567/ns/user", O_RDONLY);  /* note: not O_PATH */
configfd_action(fd, CONFIGFD_SET_FD, "ns", NULL, nsfd);

as the ultimate step. Essentially marking a mountpoint as shifted
relative to that user namespace. Once fsid mappings are in all that you
need to do is replace your make_kuid()/from_kuid()/from_kuid_munged()
calls and so on in your patchset with
make_kfsuid()/from_kfsuid()/from_kfsuid_munged() and you're done. So I
honestly don't currently see any need to block the patchsets on each
other. 

Christian