On Mon, 2018-06-18 at 08:40 -0500, Seth Forshee wrote:
> On Fri, Jun 15, 2018 at 08:03:05PM -0700, James Bottomley wrote:
> > On Fri, 2018-06-15 at 09:59 -0500, Seth Forshee wrote:
> > > On Fri, Jun 15, 2018 at 08:56:38AM -0500, Serge E. Hallyn wrote:
> > > > Quoting Seth Forshee (seth.forshee@xxxxxxxxxxxxx):
> > > > > I wanted to inquire about the current status of shiftfs and
> > > > > the plans for it moving forward. We'd like to have this
> > > > > functionality available for use in lxd, and I'm interested in
> > > > > helping with development (or picking up development if it's
> > > > > stalled).
> > > > >
> > > > > To start, is anyone still working on shiftfs or similar
> > > > > functionality? I haven't found it in any git tree on
> > > > > kernel.org, and as far as mailing list activity goes, the last
> > > > > submission I can find is [1]. Is there anything newer than
> > > > > this?
> > > > >
> > > > > Based on past mailing list discussions, it seems like there
> > > > > was still debate as to whether this feature should be an
> > > > > overlay filesystem or something supported at the vfs level.
> > > > > Was this ever resolved?
> > > > >
> > > > > Thanks,
> > > > > Seth
> > > > >
> > > > > [1]
> > > > > http://lkml.kernel.org/r/1487638025.2337.49.camel@HansenPartnership.com
> > > >
> > > > Hey Seth,
> > > >
> > > > I haven't heard anything in a long time. But if this is going
> > > > to pick back up, can we come up with a detailed set of goals
> > > > and requirements?
> >
> > That would actually help.
> >
> > > I was planning to follow up later with some discussion of
> > > requirements. Here are some of ours:
> > >
> > > - Supports any id maps possible for a user namespace
> >
> > Could you clarify: right at the moment, it basically reverses the
> > namespace ID mapping when it goes on to the filesystem using the
> > superblock user namespace, so in theory you can have an arbitrary
> > mapping simply by changing the s_userns.
> > The problem here is that you don't have a lot of tools for
> > manipulating the s_userns.
>
> For our purposes the way you're shifting with s_user_ns works fine. I
> know that Serge would prefer a more arbitrary shift so that an
> arbitrary, unprivileged range in the source fs could be used (e.g. use
> ids 100000 - 101000 in the source instead of 0 - 1000), and my
> thoughts on that are quoted below.

The original (v1) shiftfs did simply take a range of ids to shift as
an argument. However, that one could only be set up by root, and Eric
expressed a desire that it use the s_user_ns.

> > > - Does not break inotify
> >
> > I don't expect it does, but I haven't checked.
>
> I haven't checked either; I'm planning to do so soon. This is a
> concern that was expressed to me by others, I think because inotify
> doesn't work with overlayfs.

I think shiftfs does work, simply because it doesn't really do
overlays, so lots of stuff that doesn't work with overlays does work
with it.

> > > - Passes accurate disk usage and source information from the
> > > "underlay"
> >
> > mounts of this type don't currently show up in df
> >
> > > - Works with a variety of filesystems (ext4, xfs, btrfs, etc.)
> >
> > yes
> >
> > > - Works with nested containers
> >
> > yes
>
> I'd say not so much:
>
> 	/* to mark a mount point, must be real root */
> 	if (ssi->mark && !capable(CAP_SYS_ADMIN))
> 		goto out;
>
> So within a container I cannot mark a range to be shiftfs-mountable
> within a container I create. I'd argue that as long as a user has
> CAP_SYS_ADMIN towards sb->s_user_ns for the source filesystem it
> should be safe to allow this, as it implies privileges wrt all ids
> found in the source mount. This will likely lead to stacked shiftfs
> mounts; I'm not sure yet whether or not this works in the current code.

Um, I think we have different definitions of "works with nested
containers".
Recall that for a nested container the s_user_ns is also nested, so we
shift all the way back to the uid in the root. That means that if the
check for marking were anything weaker than capable(CAP_SYS_ADMIN), an
unprivileged user would be able to gain root write access by setting up
a nested shift. If your definition of nested means we only shift back
one level of user_ns nesting, then this could become ns_capable(), so I
think we need to add "what is the desired nesting behaviour?" to the
questions to be answered by the requirements.

James
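To make the "one level" versus "all the way back to the root" distinction concrete, here is a minimal userspace sketch in C of how per-namespace uid maps compose. It is not kernel or shiftfs code: the `struct extent` and `struct userns` types and the `map_one_level()` / `map_to_init()` helpers are hypothetical simplifications that assume each namespace has a single uid_map extent (the "inside outside length" triple from /proc/<pid>/uid_map). Only the id arithmetic is modelled, not shiftfs's marking or capability checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "no mapping" (hypothetical; loosely mirrors INVALID_UID). */
#define INVALID_ID UINT32_MAX

/* One uid_map line: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) in the parent. */
struct extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Hypothetical, simplified user namespace: one extent plus a parent.
 * A NULL parent marks the initial namespace; its map is never used. */
struct userns {
	const struct userns *parent;
	struct extent map;
};

/* Undo exactly one level of nesting: translate an id seen in ns into
 * the corresponding id in its immediate parent namespace. */
static uint32_t map_one_level(const struct userns *ns, uint32_t id)
{
	const struct extent *e = &ns->map;

	if (id >= e->first && id - e->first < e->count)
		return e->lower_first + (id - e->first);
	return INVALID_ID;
}

/* Undo every level: compose the maps along the parent chain until the
 * initial namespace is reached, yielding the id seen by the root. */
static uint32_t map_to_init(const struct userns *ns, uint32_t id)
{
	assert(ns != NULL);
	for (; ns->parent != NULL; ns = ns->parent) {
		id = map_one_level(ns, id);
		if (id == INVALID_ID)
			return INVALID_ID;
	}
	return id;
}
```

For example, take a container whose ids 0 - 65535 map to 100000 - 165535 in the initial namespace (similar to Serge's 100000 - 101000 example) and a nested container whose ids 0 - 999 map to 1000 - 1999 in its parent. Then map_to_init() turns nested uid 0 into 101000, the "shift all the way back" behaviour, while map_one_level() gives 1000, the "one level" behaviour. Which of these a nested shiftfs mount should produce is the nesting question raised above.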