On Fri, May 17, 2024 at 10:53:24AM GMT, Casey Schaufler wrote:
> Of course they do. I have been following the use of capabilities
> in Linux since before they were implemented. The uptake has been
> disappointing in all use cases.

Why "Of course"?
What if they should not get *all* privileges?

> Yes. The problems of a single, all powerful root privilege scheme are
> well documented.

That's my point, it doesn't have to be this way.

> Hardly.

Maybe I'm missing something, then.
How do I restrict my users from gaining, say, CAP_NET_ADMIN in their
userns today?

> If you're going to run userspace that *requires* privilege, you have
> to have a way to *allow* privilege. If the userspace insists on a root
> based privilege model, you're stuck supporting it. Regardless of your
> principles.

I want *some* privileges, not *all* of them.

> Which is a really, really bad idea. The equation for calculating effective
> privilege is already more complicated than userspace developers are generally
> willing to put up with.

This is generally true, but this set is way more straightforward than
the other sets, it's always:

pU = pP = pE = X

If you look at the patch, there is no transition logic or anything
complicated, it's just a set of caps being inherited.

> I would not expect container developers to be eager to learn how to use
> this facility.

And they probably wouldn't. For most use cases it's going to be enforced
through system policies (init, pam, etc).
Other than that, usage won't change, you will run your usual
`docker run --cap-add ...` to get caps, except now it works in userns.

> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
> a capability set explicitly for namespaces in order to make them less
> special? Maybe I'm just old and cranky.
>
> > They now work the same way as say a transition to root does with
> > inheritable caps.
>
> That needs some explanation.

From man capabilities(7):

  In order to mirror traditional UNIX semantics, the kernel performs
  special treatment of file capabilities when a process with UID 0
  (root) executes a program [...]

  Thus, when [...] a process whose real and effective UIDs are zero
  execve(2)s a program, the calculation of the process's new permitted
  capabilities simplifies to:

      P'(permitted)   = P(inheritable) | P(bounding)

      P'(effective)   = P'(permitted)

So, the same way a root process is bounded by its inheritable set when
it execs, a "rootless" process is bounded by its userns set when it
unshares.
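
To make the analogy concrete, here is a toy model of the two rules. It is
only a sketch with capability sets as bitmasks, not kernel code and not
taken from the patch. The function names (root_exec, unshare_userns) are
made up for illustration, and the userns rule is just my reading of
"pU = pP = pE = X" above: after unshare, the userns, permitted and
effective sets all equal the configured userns set.

```python
# Toy model only: capability sets are integer bitmasks.
# Capability numbers are taken from <linux/capability.h>.
CAP_NET_BIND_SERVICE = 10
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21


def mask(*caps):
    """Build a bitmask from capability numbers."""
    m = 0
    for cap in caps:
        m |= 1 << cap
    return m


def has_cap(capset, cap):
    """Return True if the capability bit is present in the set."""
    return bool(capset & (1 << cap))


def root_exec(inheritable, bounding):
    """Simplified UID 0 execve() rule as quoted from capabilities(7):
    P'(permitted) = P(inheritable) | P(bounding)
    P'(effective) = P'(permitted)
    """
    permitted = inheritable | bounding
    effective = permitted
    return permitted, effective


def unshare_userns(userns_set):
    """Hypothetical model of the proposed rule (assumption, see lead-in):
    pU = pP = pE = X, with no other transition logic.
    Returns (pU, pP, pE) as seen inside the new user namespace.
    """
    return userns_set, userns_set, userns_set


if __name__ == "__main__":
    # Policy (e.g. set by init or pam) leaves only CAP_NET_BIND_SERVICE.
    x = mask(CAP_NET_BIND_SERVICE)
    p_u, p_p, p_e = unshare_userns(x)
    print("CAP_NET_BIND_SERVICE:", has_cap(p_e, CAP_NET_BIND_SERVICE))
    print("CAP_NET_ADMIN:", has_cap(p_e, CAP_NET_ADMIN))
```

With the userns set restricted to CAP_NET_BIND_SERVICE, the process in the
new namespace never holds CAP_NET_ADMIN, which is the restriction I was
asking about earlier in this mail.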