Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@xxxxxxxxxx):
> On Sat, Nov 4, 2017 at 4:53 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> >
> > Quoting Mahesh Bandewar (mahesh@xxxxxxxxxxxx):
> > > Init-user-ns is always uncontrolled and a process that has SYS_ADMIN
> > > that belongs to uncontrolled user-ns can create another (child) user-
> > > namespace that is uncontrolled. Any other process (that either does
> > > not have SYS_ADMIN or belongs to a controlled user-ns) can only
> > > create a user-ns that is controlled.
> >
> > That's a huge change though. It means that any system that previously
> > used unprivileged containers will need new privileged code (which always
> > risks more privilege leaks through the new code) to re-enable what was
> > possible without privilege before. That's a regression.
> >
> I wouldn't call it a regression since the existing behavior is
> preserved as it is if the default-mask is not altered. i.e.
> uncontrolled process can create user-ns and have full control inside
> that user-ns. The only difference is - as an example if 'something'
> comes up which makes a specific capability expose ring-0, so admin can
> quickly remove the capability in question from the mask, so that no
> untrusted code can exploit that capability until either the kernel is

Oh, sorry, I misread then, and missed that step. I thought the default
with this patchset was that there were no capabilities exposed to user
namespaces.

> patched or workloads are sanitized keeping in mind what was
> discovered. (I have given some real life example vulnerabilities
> published recently about CAP_NET_RAW in the cover letter)
>
> > I'm very much interested in what you want to do, But it seems like
> > it would be worth starting with some automated code analysis that shows
> > exactly what code becomes accessible to unprivileged users with user
> > namespaces which was accessible to unprivileged users before. Then we
> > can reason about classifying that code and perhaps limiting access to
> > some of it.
> I would like to look at this as 'a tool' that is available to admins
> who can quickly take possible-compromise-situation under-control
> probably at the cost of some functionality-loss until kernel is
> patched and the mask is restored to default value.

The thing that makes me hesitate with this set is that it is a permanent
new feature to address what (I hope) is a temporary problem. What would
you think about doing this as a stackable (yama-style) LSM?

> I'm not sure if automated tools could discover anything since these
> changes should not alter behavior in any way.

Seems like there are two naive ways to do it, the first being to just
look at all code under ns_capable() plus code called from there. It
seems like looking at the result of that could be fruitful.

-serge
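
For readers following the thread, the controlled/uncontrolled rule quoted
above can be modelled in a few lines. This is only a toy sketch of the rule
as described in the quoted text, not the patchset's code: the names UserNS,
create_child_ns and usable_caps are hypothetical, the capability set is a
small stand-in, and the assumption that the mask simply intersects with the
capabilities available inside a controlled user-ns is mine.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Stand-in capability set for illustration; the real kernel has many more.
ALL_CAPS: FrozenSet[str] = frozenset(
    {"CAP_SYS_ADMIN", "CAP_NET_RAW", "CAP_NET_ADMIN"}
)


@dataclass
class UserNS:
    """Hypothetical model of a user namespace with a 'controlled' flag."""
    controlled: bool
    parent: Optional["UserNS"] = None


# Per the quoted text, init-user-ns is always uncontrolled.
INIT_USER_NS = UserNS(controlled=False)


def create_child_ns(creator_ns: UserNS, creator_has_sys_admin: bool) -> UserNS:
    """Only a SYS_ADMIN process in an uncontrolled user-ns can create an
    uncontrolled child; every other creator gets a controlled one."""
    uncontrolled = creator_has_sys_admin and not creator_ns.controlled
    return UserNS(controlled=not uncontrolled, parent=creator_ns)


def usable_caps(ns: UserNS, mask: FrozenSet[str]) -> FrozenSet[str]:
    """Assumption: the admin-set mask limits only controlled namespaces."""
    if not ns.controlled:
        return ALL_CAPS
    return ALL_CAPS & mask
```

With the default mask (all capabilities) a controlled user-ns behaves as
before; removing, say, CAP_NET_RAW from the mask takes it away from
controlled namespaces only, which is the "quickly remove the capability in
question" step described in the reply.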