Colin Walters <walters@xxxxxxxxxx> writes:

> On Thu, Jul 21, 2016, at 12:39 PM, Eric W. Biederman wrote:
>>
>> This patchset addresses two use cases:
>> - Implement a sane upper bound on the number of namespaces.
>> - Provide a way for sandboxes to limit the attack surface from
>>   namespaces.
>
> Perhaps this is obvious, but since you didn't quite explicitly state it;
> do you see this as obsoleting the existing downstream patches
> mentioned in:
> https://lwn.net/Articles/673597/
> It seems conceptually similar to Kees' original approach, right?

Similar, yes, and I expect it fills the need.

My primary difference is that I believe this approach makes sense from
the perspective of assuming that user namespaces, or other namespaces,
are not any buggier than any other piece of kernel code and that people
will use them. I don't see these limits making sense from the
perspective that user namespaces are flawed and distro kernels should
not have enabled them in the first place. That was my perception, right
or wrong, of Kees' patches and the related patches that landed in Ubuntu
and Debian.

With Kees' approach I could not see how to handle the case where some
applications on the system want user namespaces and others don't. That
made it very nasty for the future evolution and wider deployment of
user namespaces.

Being per user namespace, these limits can be used to sandbox
applications without affecting the rest of the system.

> The high level makes sense to me...most interesting is
> per-userns sysctls. I'll note most current container managers
> mount /proc/sys read-only, and Docker specifically drops
> CAP_SYS_RESOURCE by default, so they'd likely need to learn
> how to undo that if one wanted to support recursive container usage.
> We'd probably need to evaluate the safety of having /proc/sys
> writable generally. (Also it's rather common to filter out CLONE_NEWUSER
> via seccomp, but that's easy to undo)

Just using a user namespace replaces most of those precautions.

> But that's the flip side - if we're aiming primarily for an upstreamable
> way to *limit* namespace usage, it seems sane to me.

Yes. The primary target is to stop applications that have gone buggy
and allocated a crazy number of namespaces. The secondary target is to
allow sandboxes to disable creation of additional namespaces. Just set
the limit to 0 and drop caps, or similarly set the limit to 1 and create
another fresh set of nested namespaces.

Eric
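
To make the "set the limit to 0" step concrete, here is a rough sketch of what a sandbox setup helper might do. It is an illustration under assumptions, not something taken from the patchset: it assumes the limits are exposed as files named max_*_namespaces under /proc/sys/user, as in later mainline kernels, and the names used by the patches under discussion may differ. The helper names (limit_files, set_namespace_limits, demo) are hypothetical, and the demo runs against a scratch directory rather than the real /proc.

```python
import tempfile
from pathlib import Path

# Assumption: the directory and file names follow later mainline kernels
# (/proc/sys/user/max_*_namespaces); the patchset discussed may differ.
DEFAULT_SYSCTL_DIR = Path("/proc/sys/user")


def limit_files(sysctl_dir=DEFAULT_SYSCTL_DIR):
    """Return the per-user-namespace limit files in sysctl_dir, sorted."""
    return sorted(Path(sysctl_dir).glob("max_*_namespaces"))


def set_namespace_limits(value, sysctl_dir=DEFAULT_SYSCTL_DIR):
    """Write `value` to every limit file; return {file name: value read back}."""
    if value < 0:
        raise ValueError("limit must be non-negative")
    result = {}
    for path in limit_files(sysctl_dir):
        path.write_text(f"{value}\n")
        result[path.name] = int(path.read_text().strip())
    return result


def demo():
    """Exercise the helpers on a scratch directory instead of the real /proc."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical starting values standing in for the kernel defaults.
        for kind in ("user", "pid", "mnt"):
            Path(tmp, f"max_{kind}_namespaces").write_text("1024\n")
        return set_namespace_limits(0, tmp)


if __name__ == "__main__":
    print(demo())
```

In a real sandbox the write would presumably happen from inside the sandbox's own user namespace, followed by dropping capabilities so that the limit cannot be raised again. Writing 1 instead of 0 would correspond to the second variant above, leaving room for one more set of nested namespaces.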