Andy Lutomirski <luto@xxxxxxx> writes:

> On 04/07/2012 10:10 PM, Eric W. Biederman wrote:
>>
>> This is a course correction for the user namespace, so that we can reach
>> an inexpensive, maintainable, and reasonably complete implementation.
>>
>> If anyone can think of a reason why the user namespace should not
>> evolve in the direction taken in this patchset please let me know.
>>
>> There is not an obvious maintainer for the scope of what this patchset
>> covers so I intend to host this tree myself and to place it in
>> linux-next after this round of review.
>>
>> Highlights.
>> - The kernel will now fail to build if you attempt to compile in
>>   code whose permission checks have not been updated to be user
>>   namespace safe.
>>
>> - All uids from child user namespaces are mapped into the initial user
>>   namespace before they are processed. Removing the need to add
>>   an additional check to see if the user namespace of the compared
>>   uids remains the same.
>
> [...]
>
> I haven't read enough of the details to figure out how the uid mapping
> works (do all the child namespace uids map to the same parent uid?), so
> I may be missing some details here.

You seem to be missing a detail or two.

What you want to look at are the functions make_kuid and from_kuid in
kernel/user_namespace.c. You might also look at the patches that talk
about uidgid.h and introducing a mapping layer.

The implementation creates an incomplete but 1-1 mapping to the uids in
the initial user namespace, so distinct child uids never share a parent
uid. That means that, except for the change in datatype (sigh), the
existing permission checks don't need to be changed.

I change the data type from uid_t to kuid_t for everything internal and
make the two assignment incompatible, to force the use of make_kuid and
from_kuid at the boundaries of user space and of the filesystems.
Unfortunately that means uid comparisons themselves must change a
little: uid_eq instead of ==. That grand search and replace is probably
the scariest bit of this patchset.

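To make the mapping layer described above a little more concrete, here
is a minimal userspace sketch. It is not the kernel code: it borrows the
names kuid_t, make_kuid, from_kuid and uid_eq from the message, but
toy_uid_t, struct toy_user_ns, the single-extent map, and the uid_valid
helper are assumptions made purely for illustration. It shows why
wrapping the value in a struct forces conversions at the boundaries and
why == has to become uid_eq.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace toy, not kernel code. Wrapping the value in a struct makes
 * kuid_t assignment-incompatible with a plain uid, so the compiler
 * rejects "kuid = uid" and "kuid == uid". */
typedef uint32_t toy_uid_t;               /* hypothetical stand-in for uid_t */
typedef struct { toy_uid_t val; } kuid_t;

#define TOY_INVALID_UID ((kuid_t){ (toy_uid_t)-1 })

/* Hypothetical single-extent map: namespace uids [first, first + count)
 * correspond to initial-namespace uids
 * [lower_first, lower_first + count). Anything else is unmapped, which
 * is what makes the mapping incomplete but still 1-1. */
struct toy_user_ns {
    toy_uid_t first;
    toy_uid_t lower_first;
    toy_uid_t count;
};

/* Comparison helper: structs cannot be compared with ==. */
static inline bool uid_eq(kuid_t a, kuid_t b)
{
    return a.val == b.val;
}

static inline bool uid_valid(kuid_t k)
{
    return k.val != (toy_uid_t)-1;
}

/* Boundary in: namespace-relative uid -> internal uid. */
static inline kuid_t make_kuid(const struct toy_user_ns *ns, toy_uid_t uid)
{
    if (uid >= ns->first && uid - ns->first < ns->count)
        return (kuid_t){ ns->lower_first + (uid - ns->first) };
    return TOY_INVALID_UID;               /* no mapping for this uid */
}

/* Boundary out: internal uid -> namespace-relative uid,
 * or (toy_uid_t)-1 when the uid has no mapping in this namespace. */
static inline toy_uid_t from_kuid(const struct toy_user_ns *ns, kuid_t k)
{
    if (uid_valid(k) && k.val >= ns->lower_first &&
        k.val - ns->lower_first < ns->count)
        return ns->first + (k.val - ns->lower_first);
    return (toy_uid_t)-1;
}
```

With a namespace that maps uids 0..999 onto 100000..100999, uid 5 in the
namespace becomes internal uid 100005 and converts back to 5, while uid
2000 has no mapping and yields the invalid value.
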
> As a bit of background, the no_new_privs mode introduced in the big
> seccomp patchset will add a flag that any task can set to prevent it or
> any of its children from gaining privileges by using execve.
>
> How should this interact with pid namespaces?

I assume you mean uid namespaces? I don't expect pid namespaces will
have any effect.

> As a first pass, I
> imagine that the main PR_SET_NO_NEW_PRIVS(1) mode will prevent setuid
> from working inside uid namespaces as well, but there may be interest in
> weaker variants that allow setuid inside namespaces.
>
> Any thoughts?

It looks like a big strength of seccomp is reducing the attack surface
of the kernel. The user namespace will actually increase that attack
surface by making much more of the functionality available. So on that
level they are very different mechanisms.

My understanding of no_new_privs is that current_cred(), including the
user, the user namespace and the security label, will never change, with
the goal of making the security analysis simple.

I don't recall how seccomp filters are dealt with if you don't have
no_new_privs enabled. If seccomp filters installed by root are dropped
when we change privilege levels, it might be worth looking at how to
keep a seccomp filter installed as long as you stay in a user namespace.

There are essentially two modes you can use the user namespace in: with
mappings set up (a privileged operation) and with no mappings. With no
mappings you cannot create a new user namespace or change your uids or
gids, and suid exec fails (or possibly ignores the uid/gid change, but I
am starting with suid exec fails). That makes user namespaces similar to
no_new_privs. The emphasis is a bit different from no_new_privs, as the
user namespace does not need to guarantee that the lsm will not change
security labels, etc.

At a basic level of interaction I expect no_new_privs will need to fail
any change of the user namespace.
That is because changing the user namespace changes current_cred() and
fundamentally allows more things to happen.

Overall I think the two mechanisms are complementary.

I actually don't expect any fundamental conflicts in the code between
user namespaces and no_new_privs. I do expect conflicts in the patches,
because some of the same code paths will be touched. Both approaches
change exec, and the user namespace implementation lightly touches every
permission check in the kernel.

Where seccomp focuses on making things secure, the user namespace
focuses on making things possible that were not before. In particular
the user namespace makes it easily possible and generally safe to allow
suid exec, creation of namespaces, and capable calls with respect to
namespaces we have created.

With both mechanisms in play I would expect the implementation of
no_new_privs to be able to focus on limiting the attack surface and not
have to worry about allowing more kernel functionality to be used, while
the user namespace can then focus on increasing the functionality
exported to unprivileged users.

Eric