Quoting Jann Horn (jann@xxxxxxxxx): > On Tue, May 03, 2016 at 12:54:40AM -0500, Eric W. Biederman wrote: > > "Serge E. Hallyn" <serge@xxxxxxxxxx> writes: > > > > > Quoting Andrew G. Morgan (morgan@xxxxxxxxxx): > > >> > > >> I guess I'm confused how we have strayed so far that this isn't an obvious > > >> requirement. Uid=0 as being the root of privilege was the basic problem > > >> that capabilities were designed to change. > > > > > > The task executing the file can be any uid mapped into the namespace. The > > > file only has to be owned by the root of the user_ns. Which I agree is > > > unfortunate. We can work around it by putting the root uid into the xattr > > > itself (which still isn't orthogonal but allows the file to at least by > > > owned by non-root), but the problem then is that a task needs to know its > > > global root k_uid just to write the xattr. > > > > The root kuid is just make_kuids(user_ns, 0) so it is easy to find. > > > > It might be a hair better to use the userns->owner instead of the root > > uid. That would allow user namespaces without a mapped root to still > > use file capabilities. > > Please don't do that. A user might want to create multiple containers with > isolated security properties, and in that case, it would be bad if binaries > that are capable in one container are also automatically valid in the user's > other containers. But no, if the namespaces both created by uid 1000 have disjoint uid mappings, say 100000-165535 and 200000-265536, then a file capability on a file owned by 200000 would not be active when the exec()ing task has only 100000-165535 mapped. If the uid mappings are not completely disjoint, then you cannot assume that the user wanted the mappings to be disjoint. In particular, while setting up a rootfs for a container I'll frequently use small ad-hoc namespaces to chown files. For instance, my intended final container mapping may be 65536 uids starting at 100000, but to chown a file to uid 5 in the container I may create a ns with kuid 1000 as uid 0 and 100005 as uid 1. Here the root uid doing the writing is not even mapped into the final container namespace. So a new approach might be to (a) note the kuid which created the namespace in the xattr (magically written by the kernel at xattr write time), then say that for the file capability to take effect, two things must hold: 1. the kuid noted in the xattr must match the kuid which created the calling task's user_ns (or any ancestor creator) 2. the file uid must map into the calling task's namespace To write the filecap, 1. the task must be privileged over the uid which owns the file (in the sense of capable_wrt_inode_uidgid) 2. the task must be privileged over his own user_ns As Eric said, this should address Andrew Morgan's concern about requiring that the file be owned by uid 0 in the namespace. There's a problem though. The above suffices to prevent an unprivileged user in a user_ns from unsharing a user_ns to write a file capability and exploit that capability in the ns where he is unprivileged. With one exception, which is the case where the unprivileged user is mapped to the same kuid which created the namespace. So if uid 1000 on the host creates a namespace where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace can create a new user_ns, write the xattr, and exploit it from the parent namespace. This is not an uncommon case. I'm not sure what to do about it. > Also, this would mean that in an owner!=root scenario, container root can't > setcap executables and needs to ask the administrator of the surrounding system > to do it. > (Of course, this could be worked around using a dummy userns layer between the > init ns and the container, but I don't like seeing new reasons for such a hack.) -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html