Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Tue, 07 Feb 2017 11:02:03 -0800

On Tue, 2017-02-07 at 10:10 -0800, Christoph Hellwig wrote:
> On Tue, Feb 07, 2017 at 07:59:00PM +0200, Amir Goldstein wrote:
> > I am not even sure that would be enough.
> > dentry does not contain information about the mount the user came
> > from, and sb contains only information about the user ns of the
> > mounter of the file system, not the mounter of the bind mount, right?
> > I think I am missing some big pieces of the big picture.
> > Would love to hear what Eric has to say.
> 
> IFF we want to do what shiftfs does properly we need vfsmount + 
> inode, no need for the dentry.

Yes, sorry ... I was thinking the dentry contained the mnt, but it
doesn't; that's the path (struct path pairs the vfsmount with the
dentry).  However, threading the mnt through looks substantially
harder.
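
For reference, a simplified userspace sketch of the relationship under
discussion.  The real definitions live in include/linux/path.h,
include/linux/dcache.h and include/linux/mount.h; the structs below are
stripped down to the relevant pointers, and path_mnt_flags() is a
hypothetical helper added only for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; most fields omitted. */
struct super_block { int s_dev; };
struct inode { unsigned int i_uid, i_gid; };

struct dentry {
	struct dentry *d_parent;
	struct inode *d_inode;
	struct super_block *d_sb;	/* superblock only: no mount pointer */
};

struct vfsmount {
	struct dentry *mnt_root;	/* root of the mounted (sub)tree */
	struct super_block *mnt_sb;
	int mnt_flags;
};

/* The mount is only reachable from a path, never from a bare dentry. */
struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
};

/* Hypothetical helper: per-mount state has to be looked up via the path. */
static int path_mnt_flags(const struct path *p)
{
	return p->mnt->mnt_flags;
}
```

The same dentry can be reached through two different bind mounts, so
any per-mount decision needs the path (or the vfsmount itself) passed
down to wherever the decision is made.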

> But maybe we need to go back and decide if we want to allow uid/gid
> remapping for arbitrary subtrees anyway.

So those were the original patches Djalal was referring to.  The
problem there is that a lot of orchestration systems don't store the
images they want to bind mount into containers on separately mounted
filesystems, which is what's needed to avoid this being per-subtree.
However, the clinching argument for me is that the canonical container
image *is* a subtree (unlike a VM image, which has to be mounted).  If
we don't make this work on subtrees, people will go back to daft stacks
for containers, like copying the image subtree into a loopback-mounted
filesystem just to make this all work (and then complain about
performance and caching and so on).

> Another option would be to require something like a project as used
> for project quotas as the root.  This would also be convenient as it
> could store the used remapping tables.

So this would be like the current project quota except set on a
subtree?  I could see it being done that way but I don't see what
advantage it has over using flags in the subtree itself (the mapping is
known based on the mount namespace, so there's really only a single bit
of information to store).
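
A minimal userspace sketch of that "single bit" idea, purely as an
illustration and not the shiftfs code: the subtree carries one flag,
and the id mapping itself comes from elsewhere.  The extent layout
mirrors the kernel's struct uid_gid_extent (first, lower_first, count),
but struct id_extent, INVALID_ID and shifted_id() are hypothetical
names invented for this example.

```c
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* One mapping extent: namespace ids [first, first + count) correspond
 * to kernel ids [lower_first, lower_first + count). */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Hypothetical: 'marked' is the single bit stored on the subtree root.
 * If it is clear, ids pass through untouched; if it is set, a kernel id
 * is translated through the extent, or rejected if it is unmapped. */
static uint32_t shifted_id(int marked, const struct id_extent *ext,
			   uint32_t kid)
{
	if (!marked)
		return kid;
	/* Unsigned subtraction wraps for kid < lower_first, so a single
	 * comparison covers both ends of the range. */
	if (kid - ext->lower_first < ext->count)
		return ext->first + (kid - ext->lower_first);
	return INVALID_ID;
}
```

With an extent of { 0, 100000, 65536 }, a marked subtree would see
kernel id 101000 as id 1000, while an unmarked subtree would see it
unchanged as 101000.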

James