On Sat, Feb 4, 2017 at 9:19 PM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> This allows any subtree to be uid/gid shifted and bound elsewhere. It
> does this by operating similarly to overlayfs. Its primary use is for
> shifting the underlying uids of filesystems used to support
> unprivileged (uid shifted) containers. The usual use case here is
> that the container is operating with an uid shifted unprivileged root
> but sometimes needs to make use of or work with a filesystem image
> that has root at real uid 0.
>
> The mechanism is to allow any subordinate mount namespace to mount a
> shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only allowing
> it to mount marked subtrees (using the -o mark option as root). Once
> mounted, the subtree is mapped via the super block user namespace so
> that the interior ids of the mounting user namespace are the ids
> written to the filesystem.
>
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>

James,

Allow me to point out some problems in this patch and offer a slightly
different approach.

First of all, the subject says "uid/gid shifting bind mount", but it's
not really a bind mount. What it is is a stackable mount, and 2 levels
of stack no less. So one thing that is missing is incrementing
sb->s_stack_depth, and that also means that shiftfs cannot be used to
recursively shift uids in a child userns, if that was ever the
intention.

The other problem is that by forking overlayfs functionality, shiftfs
is going to miss out on overlayfs bug fixes for cases where user
credentials differ from mounter credentials, like fd3220d ("ovl: update
S_ISGID when setting posix ACLs"). I am not sure that this specific
case is relevant to shiftfs, but there could be others.

So how about, instead of forking a new container-specialized stackable
fs, merging the needed functionality into the overlayfs code?
I think overlayfs container users may also benefit from shiftfs
functionality, no?
In any case, overlayfs has considerable mileage as a filesystem for
containers, so many issues related to running with different userns may
have already been addressed.

Overlayfs already stores the mounter's credentials and uses them to
perform most of the operations on upper.

I know it wasn't the original purpose of overlayfs to run as a single
layer, but there is nothing really preventing it from doing that. In
fact, I am doing just that with my snapshot mount patches, see:
https://github.com/amir73il/linux/commit/acc6c25eab03c176c9ef736544fab3fba663765d#diff-2b85a3c5bea4263d08a2bdff639192c3

I registered a new fs type ("snapshot"), which reuses most of the
existing overlayfs operations. With this patch it is possible to mount
an overlay with only an upper layer, so all the operations are pass
through except for the credentials, e.g.:

    mount -t snapshot -o upper=<origin> shiftfs_test <mark location>

If you think this concept is workable, then the functionality of
mounting overlayfs with only upper should be integrated into plain
overlayfs, and shiftfs could be a very thin variant of the overlayfs
mount using shiftfs_fs_type, just for the sake of having
FS_USERNS_MOUNT, e.g.:

+/*
+ * XXX: reusing ovl_mount()/ovl_fill_super(), but could also just reuse
+ * ovl_dentry_operations/ovl_super_operations/ovl_xattr_handlers/ovl_new_inode()
+ */
+static struct file_system_type shiftfs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "shiftfs",
+	.mount		= ovl_mount,
+	.kill_sb	= kill_anon_super,
+	.fs_flags	= FS_USERNS_MOUNT,
+};
+MODULE_ALIAS_FS("shiftfs");
+MODULE_ALIAS("shiftfs");
+#define IS_SHIFTFS_SB(sb) ((sb)->s_type == &shiftfs_type)

And instead of verifying that shiftfs is mounted inside the container
over shiftfs, verify that it is mounted over an overlayfs noexec mount,
e.g.:

+	if (IS_SHIFTFS_SB(sb)) {
+		/*
+		 * this leg executes if we're admin capable in
+		 * the namespace, so be very careful
+		 */
+		if (path.dentry->d_sb->s_magic != OVERLAYFS_MAGIC ||
+		    !(path.dentry->d_sb->s_iflags & SB_I_NOEXEC))
+			goto out_put;

From a user's manual POV:

in host:
    mount -t overlay -o noexec,upper=<origin> container_visible <mark location>

in container:
    mount -t shiftfs -o upper=<mark location> container_writable <somewhere in my local mount ns>

Thoughts?
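
For anyone following the id shifting part of the quoted description, here
is a minimal userspace sketch of extent-based id translation in the style
of a /proc/<pid>/uid_map table. It is not code from the shiftfs patch or
from overlayfs. The names id_extent, shift_id_down(), shift_id_up() and
ID_UNMAPPED are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one row of a uid_map/gid_map-style table. */
struct id_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* matching first id outside the namespace */
	uint32_t count;       /* number of consecutive ids in this extent */
};

/* Hypothetical sentinel for "no mapping exists for this id". */
#define ID_UNMAPPED ((uint32_t)-1)

/* Translate an inside-namespace id to the outside id, or ID_UNMAPPED. */
uint32_t shift_id_down(const struct id_extent *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* id - first < count also rejects ids below first. */
		if (id >= map[i].first && id - map[i].first < map[i].count)
			return map[i].lower_first + (id - map[i].first);
	}
	return ID_UNMAPPED;
}

/* Translate an outside id back to the inside-namespace id, or ID_UNMAPPED. */
uint32_t shift_id_up(const struct id_extent *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].lower_first &&
		    id - map[i].lower_first < map[i].count)
			return map[i].first + (id - map[i].lower_first);
	}
	return ID_UNMAPPED;
}
```

With a single extent { 0, 100000, 65536 } (an assumed, typical container
mapping), shift_id_up() turns outside uid 100000 into interior uid 0,
which is the id that would end up written to an image whose root is at
real uid 0.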