On Mon, Feb 17, 2020 at 10:56 PM James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> The object of this series is to replace shiftfs with a proper uid/gid
> shifting bind mount instead of the shiftfs hack of introducing
> something that looks similar to an overlay filesystem to do it.
>
> The VFS still has the problem that in order to tell what vfsmount a
> dentry belongs to, struct path would have to be threaded everywhere
> struct dentry currently is. However, this patch is structured only to
> require a rethreading of notify_change. The rest of the knowledge
> that a shift is in operation is carried in the task structure by
> caching the unshifted credentials.
>
> Note that although it is currently dependent on the new configfd
> interface for bind mounts, only patch 3/3 relies on this, and the
> whole thing could be redone as a syscall or any other mechanism
> (depending on how people eventually want to fix the problem with the
> new fsconfig mechanism being unable to reconfigure bind mounts).
>
> The changes from v2 are I've added Amir's reviewed-by for the
> notify_change rethreading and I've implemented Serge's request for a
> base offset shift for the image. It turned out to be much harder to
> implement a simple linear shift than simply to do it through a
> different userns, so that's how I've done it. The userns you need to
> set up for the offset shifted image is one where the interior uid
> would see the shifted image as fake root. I've introduced an
> additional "ns" config parameter, which must be specified when
> building the allow shift mount point (so it's done by the admin, not
> by the unprivileged user). I've also taken care that the image
> shifted to zero (real root) is never visible in the filesystem. Patch
> 3/3 explains how to use the additional "ns" parameter.
>

James,

To us common people who do not breathe containers, your proposal seems
like a competing implementation to Christian's proposal [1].
If it were a competing implementation, I think Christian's proposal
would have won on points for being less intrusive to the VFS.

But it is not really a competing implementation, is it? Your proposals
meet two different, but heavily overlapping, sets of requirements.
IMHO, neither of you did a really good job of explaining that in the
cover letter, let alone referring to each other's proposals (I am
referring to your v3 posting, of course).

IIUC, Christian's proposal deals with a single shared image per
non-overlapping group of containers. And it deals with this use case
very elegantly IMO. From your comments on Christian's post, it does not
seem that you oppose his proposal, except that it does not meet the
requirements for all of your use cases.

IIUC, your proposal can deal with multiple shared images per
overlapping groups of containers, and it adds an element of
"auto-reverse-mapping", which reduces the administration overhead of
what would otherwise be an orchestration nightmare.

It seems to me that you should look into basing your patch set on top
of fsid mapping and try to make use of it as much as possible.

And to make things a bit clearer to the rest of us, you should
probably market your feature as "auto back shifting mount" or
something like that, and explain the added value of the feature on top
of plain fsid mapping.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20200214183554.1133805-1-christian.brauner@xxxxxxxxxx/