Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Wed, 08 Apr 2020 17:14:27 +0200

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:

> On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Andrii Nakryiko <andriin@xxxxxx> writes:
>>
>> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
>> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
>> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
>> > done as the very last step in finalizing bpf_link, together with installing
>> > FD. This approach allows users of bpf_link in kernel code to not worry about
>> > races between user-space and kernel code that hasn't finished attaching and
>> > initializing bpf_link.
>> >
>> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
>> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
>> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
>> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
>> > querying bpf_link information (implemented later in the series) will be
>> > allowed.
>>
>> I must admit I remain sceptical about this model of restricting access
>> without any of the regular override mechanisms (for instance, enforcing
>> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
>> keep saying there would be 'some' override mechanism, I think it would
>> be helpful if you could just include that so we can see the full
>> mechanism in context.
>
> I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.
>
> One way to go about this is to allow creating writable bpf_link for
> GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
> operation on writable links, same as we do with LINK_UPDATE here.
> LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
> cgroup dying: it will detach bpf_link, but will leave it alive until
> last FD is closed.

Yup, I think this would be a reasonable way to implement the override
mechanism - it would ensure 'full root' users (like a root shell) can
remove attachments, while still preventing applications from doing so by
limiting their capabilities.

Extending on the concept of RO/RW bpf_link attachments, maybe it should
even be possible for an application to choose which mode it wants to pin
its fd in? With the same capability being able to override it of
course...

> We need to consider, though, if CAP_DAC_OVERRIDE is something that can
> be disabled for majority of real-life applications to prevent them
> from doing this. If every realistic application has/needs
> CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
> get writable bpf_link and do anything with it.

I poked around a bit, and looking at the sandboxing configurations
shipped with various daemons in their systemd unit files, it appears
that the main case where daemons are granted CAP_DAC_OVERRIDE is if they
have to be able to read /etc/shadow (which is installed as chmod 0). If
this is really the case, that would indicate it's not a widely needed
capability; but I wouldn't exactly say that I've done a comprehensive
survey, so probably a good idea for you to check your users as well :)

-Toke