2016-01-28 18:48 GMT+01:00 Eric W. Biederman <ebiederm@xxxxxxxxxxxx>:
> Kees Cook <keescook@xxxxxxxxxxxx> writes:
>
>> +       if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
>> +                                       capable(CAP_SETUID) &&
>> +                                       capable(CAP_SETGID)))
>> +               return -EPERM;
>> +
>
> I will also note that the way I have seen containers used this check
> adds no security and is not mentioned or justified in any way in your
> patch description.
>
> Furthermore this looks like blame shifting.  And quite frankly shifting
> the responsibility to users if they get hacked is not an acceptable
> attitude.

I think I am starting to understand your point. If I'm not mistaken,
it is that user namespaces themselves are not buggy; rather, the bugs
are in pieces of the kernel that would otherwise not be reachable at
the typical low privilege level of regular users (e.g. bugs in
SOCK_RAW sockets, iptables, or mounts)?

If so, I can agree with that wording, but the proposed sysctl might
still be needed in that case.

I guess the parts of the kernel that were historically not reachable
by unprivileged users got much less attention in terms of time spent
on security reviews and security fuzzing. As much as users of the
kernel would like to see those parts tested to the same level as the
attack surface that has always been reachable by unprivileged users,
it will not happen tomorrow. Our best option for now might be a
switchable setting that disables this attack surface for those users
who feel they need that.

In the meantime, we can concentrate on security review of those newly
reachable kernel APIs, so that some day we could remove this sysctl.
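
To make the behaviour of the quoted check concrete, here is a minimal
userspace sketch of its truth table. It is not kernel code: struct caps
and userns_create_check() are hypothetical names of mine that stand in
for the capable() calls and the sysctl_userns_restrict flag from the
patch, and the sketch only assumes the condition works exactly as quoted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the caller's capabilities; in the kernel
 * these answers would come from capable(). */
struct caps {
    bool sys_admin; /* CAP_SYS_ADMIN */
    bool setuid;    /* CAP_SETUID */
    bool setgid;    /* CAP_SETGID */
};

/* Mirrors the quoted condition: returns -EPERM when the restriction is
 * enabled and the caller lacks any one of the three capabilities,
 * otherwise 0 (creation is not blocked by this check). */
static int userns_create_check(bool userns_restrict, struct caps c)
{
    if (userns_restrict && !(c.sys_admin && c.setuid && c.setgid))
        return -EPERM;
    return 0;
}

/* Walks the cases discussed in the thread. */
static void userns_check_examples(void)
{
    struct caps none = { false, false, false };
    struct caps partial = { true, true, false };
    struct caps all = { true, true, true };

    /* Restriction off: unprivileged callers pass, as before the patch. */
    assert(userns_create_check(false, none) == 0);
    /* Restriction on: unprivileged callers are refused. */
    assert(userns_create_check(true, none) == -EPERM);
    /* Restriction on: missing even one capability is a refusal. */
    assert(userns_create_check(true, partial) == -EPERM);
    /* Restriction on: all three capabilities are required to pass. */
    assert(userns_create_check(true, all) == 0);
}
```

With the switch off nothing changes for unprivileged users; with it on,
only callers holding all three capabilities get past the check, which
is the "disable this attack surface" behaviour described above.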