On Tue, 2017-02-07 at 19:59 +0200, Amir Goldstein wrote:
> On Tue, Feb 7, 2017 at 6:37 PM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 2017-02-07 at 01:19 -0800, Christoph Hellwig wrote:
> > > On Sat, Feb 04, 2017 at 11:19:32AM -0800, James Bottomley wrote:
> > > > This allows any subtree to be uid/gid shifted and bound
> > > > elsewhere. It does this by operating similarly to overlayfs.
> > > > Its primary use is for shifting the underlying uids of
> > > > filesystems used to support unprivileged (uid shifted)
> > > > containers. The usual use case here is that the container is
> > > > operating with a uid shifted unprivileged root but sometimes
> > > > needs to make use of or work with a filesystem image that has
> > > > root at real uid 0.
> > > >
> > > > The mechanism is to allow any subordinate mount namespace to
> > > > mount a shiftfs filesystem (by marking it FS_USERNS_MOUNT) but
> > > > only allowing it to mount marked subtrees (using the -o mark
> > > > option as root). Once mounted, the subtree is mapped via the
> > > > super block user namespace so that the interior ids of the
> > > > mounting user namespace are the ids written to the filesystem.
> > >
> > > Please move this into VFS instead of a stackable fs. We might
> > > need additional parameters to getattr/setattr to specify the ID
> > > translation, but that's way better than a horrible hack like
> > > this.
> >
> > I would need a lot more than that: getattr controls the cosmetic
> > permission display to the user, but enforcement is done in the core
> > permission checks which are inode based. To make this a real bind
> > mount, the core permission checks will have to become subtree aware
> > because knowledge of whether we need a uid shift in the permission
> > check becomes a subtree property. Effectively inode_permission
> > would become dentry_permission and generic_permission would take a
> > dentry instead of an inode. This will be a huge amount of VFS and
> > underlying filesystem churn, since the permissions calls are
> > threaded through a huge chunk of code.
> >
>
> I am not even sure that would be enough.
> dentry does not contain information about the mount the user came
> from, and sb contains only information about the user ns of the
> mounter of the file system, not the mounter of the bind mount, right?
> I think I am missing some big pieces of the big picture.
> Would love to hear what Eric has to say.

I'm not really sure until it gets prototyped, but I think the
filesystem user namespace would also have to become a subtree property.
The whole reason for shiftfs being a properly mounted filesystem is
that it needs a super block to capture the namespace it's being mounted
in. However, when you have a container that you want remapping inside,
you must have a user namespace which owns a mount namespace, so we can
deduce the information from the mount namespace. All we probably need
the subtree to tell us is whether we're shifting or not.

James