On Sun, May 19, 2024 at 10:03:29AM GMT, Casey Schaufler wrote:
> I do understand that. My objection is not to the intent, but to the approach.
> Adding a capability set to the general mechanism in support of a limited, specific
> use case seems wrong to me. I would rather see a mechanism in userns to limit
> the capabilities in a user namespace than a mechanism in capabilities that is
> specific to user namespaces.
>
> An option to clone() then, to limit the capabilities available?
> I honestly can't recall if that has been suggested elsewhere, and
> apologize if it's already been dismissed as a stoopid idea.

No and you're right, this would also make sense. This was considered, as well
as things like ioctl_ns() (basically introducing the concept of capabilities in
the user_namespace struct). I also considered reusing the existing sets with
various schemes, to no avail.

The main issue with this approach is that you have to consider how this is
going to be used. This ties into the other thread we've had with John and Eric.
Basically, we're coming from a model where things are wide open and we're
trying to tighten things down.

Quoting John here:

> We are starting from a different posture here. Where applications have
> assumed that user namespaces where safe and no measures were needed.
> Tools like unshare and bwrap if set to allow user namespaces in their
> fcaps will allow exploits a trivial by-pass.

We can't really expect userspace to patch every single userns callsite and
opt in to this new security mechanism.

You said it well yourself:

> Capabilities are already more complicated than modern developers
> want to deal with.

Moreover, policies are not necessarily enforced at said callsites. Take for
example a service like systemd-machined, or a PAM session. Those need to be
able to place restrictions on any processes spawned under them.

If we do this in clone() (or similar), we'll also need to come up with
inheritance rules, a way to query capabilities, etc.
At this point we're just reinventing capability sets.

Finally, the nice thing about having it as a capability set is that we can
easily define rules between them. Patch 2 is a good example of this. It
constrains the userns set to the bounding set of a task. This requires
minimal/no change to userspace and helps with adoption.

> Yes, I understand. I would rather see a change to userns in support of a userns
> specific need than a change to capabilities for a userns specific need.

Valid point, but at the end of the day, those are really just tasks'
capabilities. The unshare() just happens to trigger specific rules when it
comes to the tasks' creds. This isn't so different from the other sets and
their specific rules for execve() or UID 0.

This could also be reframed as:

Why would setting capabilities on tasks in a userns be so different from
setting them on tasks outside of it?
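
To make the "rules between sets" point a bit more concrete, here is a rough
userspace sketch of the kind of relationship described for Patch 2 above (the
userns set being constrained to the bounding set). This is an illustration
only, not code from the series: cap_mask_t, struct task_caps and the three
helpers are hypothetical names invented for this example, and the exact rules
(rejecting a raise outside the bounding set, clearing on a bounding drop) are
assumptions about how such a constraint could be modeled. Only the
CAP_NET_ADMIN and CAP_SYS_ADMIN numbers are taken from <linux/capability.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's capability mask: one bit per cap. */
typedef uint64_t cap_mask_t;

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN 12
#define CAP_SYS_ADMIN 21
#define CAP_BIT(c) ((cap_mask_t)1 << (c))

/* Hypothetical, reduced view of a task's creds: only the two sets at play. */
struct task_caps {
	cap_mask_t bounding; /* existing bounding set */
	cap_mask_t userns;   /* userns set, as proposed in this thread */
};

/* True if every capability in 'a' is also present in 'b'. */
static bool cap_subset(cap_mask_t a, cap_mask_t b)
{
	return (a & ~b) == 0;
}

/* Assumed rule: the userns set may only be set to a subset of bounding. */
static bool set_userns_caps(struct task_caps *t, cap_mask_t wanted)
{
	if (!cap_subset(wanted, t->bounding))
		return false;
	t->userns = wanted;
	return true;
}

/* Assumed rule: dropping a bounding cap also removes it from the userns set. */
static void drop_bounding(struct task_caps *t, int cap)
{
	t->bounding &= ~CAP_BIT(cap);
	t->userns &= ~CAP_BIT(cap);
	assert(cap_subset(t->userns, t->bounding));
}

/* Capabilities the task could hold after creating a new user namespace. */
static cap_mask_t caps_in_new_userns(const struct task_caps *t)
{
	return t->userns & t->bounding;
}
```

In this model, existing code that already drops capabilities from its
bounding set would implicitly restrict what it gets in a new userns, without
having to know about the new set at all.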