Re: [PATCH 0/3] Introduce user namespace capabilities

Casey Schaufler <casey@xxxxxxxxxxxxxxxx> · Fri, 17 May 2024 10:53:24 -0700

On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>> 	- It is in no way obvious what problem it solves
>>>>> 	- It is not obvious how it solves any problem
>>>>> 	- The capability mechanism has not been popular, and relying on a
>>>>> 	  community (e.g. container developers) to embrace it based on this
>>>>> 	  enhancement is a recipe for failure
>>>>> 	- Capabilities are already more complicated than modern developers
>>>>> 	  want to deal with. Adding another, special purpose set, is going
>>>>> 	  to make them even more difficult to use.
> Sorry if the commit wasn't clear enough.

While, as others have pointed out, the commit description left
much to be desired, that isn't the biggest problem with the change
you're proposing.

>  Basically:
>
> - Today user namespaces grant full capabilities.

Of course they do. I have been following the use of capabilities
in Linux since before they were implemented. The uptake has been
disappointing in all use cases.

>   This behavior is often abused to attack various kernel subsystems.

Yes. The problems of a single, all powerful root privilege scheme are
well documented.

>   Only option

Hardly.

>  is to disable them altogether which breaks a lot of
>   userspace stuff.

Updating userspace components to behave properly in a capabilities
environment has never been a popular activity, but is the right way
to address this issue. And before you start on the "no one can do that,
it's too hard", I'll point out that multiple UNIX systems supported
rootless, entirely capabilities-based operation back in the day.

>   This goes against the least privilege principle.

If you're going to run userspace that *requires* privilege, you have
to have a way to *allow* privilege. If the userspace insists on a root
based privilege model, you're stuck supporting it. Regardless of your
principles.

>
> - It adds a new capability set.

Which is a really, really bad idea. The equation for calculating effective
privilege is already more complicated than userspace developers are generally
willing to put up with.
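
For reference, the equation in question can be sketched roughly as below.
This follows the execve() transformation rules documented in capabilities(7)
for a non-root process. The helper name exec_caps and its argument order are
made up for illustration, sets are plain integer bitmasks, and the sketch
assumes no set-user-ID/set-group-ID bits, no root special-casing and no
securebits, so it shows the shape of the calculation rather than the full
kernel logic.

```shell
#!/bin/sh
# Sketch of the execve() capability transformation for a non-root process,
# following capabilities(7). Sets are integer bitmasks.
# Hypothetical helper: exec_caps P_INH P_BND P_AMB F_INH F_PRM F_EFF
#   P_* = process inheritable, bounding and ambient sets
#   F_* = file inheritable and permitted sets, and the file effective bit (0/1)
exec_caps() {
    p_inh=$(( $1 )); p_bnd=$(( $2 )); p_amb=$(( $3 ))
    f_inh=$(( $4 )); f_prm=$(( $5 )); f_eff=$(( $6 ))

    # Simplification: treat the file as "privileged" only if it carries
    # file capabilities; set-user-ID and set-group-ID bits are ignored here.
    if [ $(( f_inh | f_prm | f_eff )) -ne 0 ]; then
        p_amb=0
    fi

    # P'(permitted) = (P(inh) & F(inh)) | (F(prm) & P(bnd)) | P'(amb)
    new_prm=$(( (p_inh & f_inh) | (f_prm & p_bnd) | p_amb ))

    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    if [ "$f_eff" -ne 0 ]; then new_eff=$new_prm; else new_eff=$p_amb; fi

    # Print the new permitted and effective sets in hex.
    printf '0x%x 0x%x\n' "$new_prm" "$new_eff"
}

# CAP_NET_ADMIN is capability number 12. A file with it in F(permitted) and
# the effective bit set, run with a full 41-bit bounding set:
exec_caps 0 0x1ffffffffff 0 0 $(( 1 << 12 )) 1
```

Even in this reduced form there are three process sets and three file inputs
feeding two outputs, before any further set is added to the mix.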

>   This set dictates what capabilities are granted in namespaces (instead
>   of always getting full caps).

I would not expect container developers to be eager to learn how to use
this facility.

>   This brings namespaces in line with the rest of the system, user
>   namespaces are no more "special".

I'm sorry, but this makes no sense to me whatsoever. You want to introduce
a capability set explicitly for namespaces in order to make them less
special? Maybe I'm just old and cranky.

>   They now work the same way as say a transition to root does with
>   inheritable caps.

That needs some explanation.

>
> - This isn't intended to be used by end users per se (although they could).
>   This would be used at the same places where existing capabilities are
>   used today (e.g. init system, pam, container runtime, browser
>   sandbox), or by system administrators.

I understand that. It is for containers. Containers are not kernel entities.

>
> To give you some ideas of things you could do:
>
> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
> echo "!cap_net_admin alice" >> /etc/security/capability.conf
>
> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>             -p SecureBits=userns-strict-caps \
>             /usr/bin/dockerd
>
> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
> # Prevent users from ever gaining it
> sysctl -w cap_bound_userns_mask=0x1fffffdffff
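
For anyone wondering where the mask in the last example comes from, a minimal
sketch of the arithmetic follows. It assumes a kernel where capabilities 0
through 40 are defined (CAP_LAST_CAP is 40, giving a 41-bit full mask), and
uses CAP_SYS_RAWIO's number, 17, from <linux/capability.h>. The helper name
clear_cap is made up for illustration and is not part of the patch set.

```shell
#!/bin/sh
# Hypothetical helper: clear_cap MASK CAP_NUMBER
# Prints MASK, in hex, with the bit for capability CAP_NUMBER cleared.
clear_cap() {
    printf '0x%x\n' $(( $1 & ~(1 << $2) ))
}

# Assumption: capabilities 0..40 exist, so the all-set mask is 41 bits wide.
full_mask=$(( (1 << 41) - 1 ))

# CAP_SYS_RAWIO is capability number 17; drop it from the full mask.
clear_cap "$full_mask" 17
```

On such a kernel this yields the same value as the one passed to sysctl above.
A kernel with a different CAP_LAST_CAP would need a different full mask.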