On Wed, 2016-07-06 at 16:22 +0200, Jan Kara wrote:
> On Wed 06-07-16 08:54:46, Seth Forshee wrote:
> > On Wed, Jul 06, 2016 at 10:54:40AM +0200, Jan Kara wrote:
> > > On Mon 04-07-16 11:27:46, Eric W. Biederman wrote:
> > > I don't remember the intended uses for user-ns mounts so I may be
> > > just wrong. But my experience tells me that external data (such
> > > as user namespace ID mappings in your case) that modify meaning
> > > of on-disk format tend to cause maintenance difficulties in the
> > > long run... Because someone *will* have the idea of migrating
> > > these fs images between containers / machines and then they have
> > > to make sure mappings get migrated as well and it all becomes
> > > cumbersome.
> > 
> > The intended use case for this is containers, with the idea being
> > that I as a user will get the same behavior in the container as I
> > would in init_user_ns without needing any userspace modifications
> > to achieve that.
> > 
> > So if I have a filesystem that contains uid 0 and I mount it in my
> > container, I should see uid 0. If I mount the same bits in another
> > container with a different uid mapping I should also see uid 0.
> > 
> > If I mkfs a new filesystem in my container then mount it, the root
> > directory of the fs is owned by uid 0 in my container without any
> > modifications to mkfs.
> > 
> > I'd argue that this makes it easier to migrate a disk between
> > containers because the ids in the disk show up the same within the
> > container regardless of the id mapping. If someone wants to mount a
> > filesystem in one container and also access it in another container
> > with a completely different id mapping, well I don't think that's
> > ever going to work well.
> 
> OK, I see how this is supposed to work. However you assume here that
> both containers have the same set of valid UIDs, don't you? If that
> is not the case, the mounted image will not be usable in the other
> container, right?

"You can always set it up wrongly" is the rule of containers. Because the
virtualizations are so granular, there are many possible configurations
which simply don't make sense in the real world.

The main use case for this is operating system images. For them we have a
set of known UIDs/GIDs in the image (usually 0-1000 plus the
nobody/nogroup ids). Using this scheme, we'd set up the container in a
userns that maps all of these ids to something unprivileged, and then set
s_user_ns to do the same for the mount of the image, so the unprivileged
container can manipulate the image directly.

There are several self-contained proposals on linux-fsdevel for doing
this, like shiftfs, which is what I'm currently using to manipulate
images, so for me this allows me to get rid of all the credential
shifting when performing operations on the underlying filesystem. In
fact, I think it pretty much allows me to get rid of a lot of the
upper/lower filesystem distinction in shiftfs, and I'd get quotas and
other things I ignored for free. However, any of the other uid/gid
shifting proposals can also use this as the engine.

The point here is that this patch set is simply mechanism; it requires a
glue layer (like shiftfs, FUSE or the vfs remapping proposal) to activate
it. The activation decides how much exposure to the underlying filesystem
there is, so with shiftfs there's none: it's a purely volatile system
crafted for chosen images.
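To make the mapping side of that concrete, here's a rough userspace
sketch (not part of the patch set; the host base id of 100000 and the
0-999 range are just illustrative assumptions) of a privileged launcher
installing that kind of userns mapping for a container. Multi-id maps
need CAP_SETUID/CAP_SETGID in the parent namespace, so this has to run as
root or go through the setuid newuidmap/newgidmap helpers:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

static void write_file(const char *path, const char *buf)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0 || write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
		perror(path);
		exit(1);
	}
	close(fd);
}

int main(void)
{
	int c2p[2], p2c[2];	/* child->parent and parent->child pipes */
	char path[64], c;
	pid_t child;

	if (pipe(c2p) < 0 || pipe(p2c) < 0) {
		perror("pipe");
		return 1;
	}

	child = fork();
	if (child < 0) {
		perror("fork");
		return 1;
	}

	if (child == 0) {
		/* Child: enter a new user namespace, tell the parent,
		 * then wait until our uid/gid maps have been written. */
		if (unshare(CLONE_NEWUSER) < 0) {
			perror("unshare");
			exit(1);
		}
		write(c2p[1], "x", 1);
		read(p2c[0], &c, 1);

		/* Show the installed mapping: container ids 0-999 and
		 * 65534 now correspond to host ids 100000-100999 and
		 * 165534.  A mount whose s_user_ns is this namespace
		 * would show the image's uid 0 as uid 0 in here. */
		execlp("cat", "cat", "/proc/self/uid_map", (char *)NULL);
		perror("execlp");
		exit(1);
	}

	/* Parent: still in the initial userns with CAP_SETUID/CAP_SETGID,
	 * so it may write a multi-id map for the child.  Each line is
	 * "inside-id outside-id count". */
	read(c2p[0], &c, 1);
	snprintf(path, sizeof(path), "/proc/%d/uid_map", child);
	write_file(path, "0 100000 1000\n65534 165534 1\n");
	snprintf(path, sizeof(path), "/proc/%d/gid_map", child);
	write_file(path, "0 100000 1000\n65534 165534 1\n");

	write(p2c[1], "x", 1);	/* let the child continue */
	waitpid(child, NULL, 0);
	return 0;
}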
However, it's entirely possible to come up with an activation where the
filesystem itself decides (through some on-disk format information) to
declare the image safe to remap into a given uid/gid range; we could then
allow it to be mounted unprivileged (without a capability check) into a
user_ns that matches that mapping. The latter is a bit of a fantasy,
since container images are currently little more than tar files and we
have no extant way to connect them to Linux fs formats, but once the
possibility exists, who's to say this won't change?

James
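Purely as an illustration of what such an activation check could look
like in the kernel (the declared-range arguments and the function are
invented for this sketch; nothing like it is in the patch set), the glue
layer might verify that every id the image declares is actually
representable in the prospective s_user_ns before allowing the
unprivileged mount:

#include <linux/uidgid.h>
#include <linux/user_namespace.h>

/*
 * Hypothetical helper: the image's on-disk metadata claims it only uses
 * ids in [first, first + count).  Refuse the mount unless every such id
 * has a mapping in the user namespace that would become sb->s_user_ns.
 */
static bool image_range_mapped(struct user_namespace *user_ns,
			       uid_t first, unsigned int count)
{
	uid_t id;

	for (id = first; id < first + count; id++) {
		/* make_kuid()/make_kgid() return INVALID_UID/INVALID_GID
		 * when the id has no mapping in user_ns. */
		kuid_t kuid = make_kuid(user_ns, id);
		kgid_t kgid = make_kgid(user_ns, (gid_t)id);

		if (!uid_valid(kuid) || !gid_valid(kgid))
			return false;
	}
	return true;
}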