Re: [PATCH bpf-next 0/8] New BPF map and BTF security LSM hooks

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 17 Apr 2023 16:31:14 -0700

On Thu, Apr 13, 2023 at 9:27 AM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>
> On 4/12/2023 6:43 PM, Andrii Nakryiko wrote:
> > On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >>>>>>> Add new LSM hooks, bpf_map_create_security and bpf_btf_load_security, which
> >>>>>>> are meant to allow highly-granular LSM-based control over the usage of BPF
> >>>>>>> subsytem. Specifically, to control the creation of BPF maps and BTF data
> >>>>>>> objects, which are fundamental building blocks of any modern BPF application.
> >>>>>>>
> >>>>>>> These new hooks are able to override default kernel-side CAP_BPF-based (and
> >>>>>>> sometimes CAP_NET_ADMIN-based) permission checks. It is now possible to
> >>>>>>> implement LSM policies that could granularly enforce more restrictions on
> >>>>>>> a per-BPF map basis (beyond checking coarse CAP_BPF/CAP_NET_ADMIN
> >>>>>>> capabilities), but also, importantly, allow to *bypass kernel-side
> >>>>>>> enforcement* of CAP_BPF/CAP_NET_ADMIN checks for trusted applications and use
> >>>>>>> cases.
> >>>>>> One of the hallmarks of the LSM has always been that it is
> >>>>>> non-authoritative: it cannot unilaterally grant access, it can only
> >>>>>> restrict what would have been otherwise permitted on a traditional
> >>>>>> Linux system.  Put another way, a LSM should not undermine the Linux
> >>>>>> discretionary access controls, e.g. capabilities.
> >>>>>>
> >>>>>> If there is a problem with the eBPF capability-based access controls,
> >>>>>> that problem needs to be addressed in how the core eBPF code
> >>>>>> implements its capability checks, not by modifying the LSM mechanism
> >>>>>> to bypass these checks.
> >>>>> I think semantics matter here. I wouldn't view this as _bypassing_
> >>>>> capability enforcement: it's just more fine-grained access control.
> > Exactly. One of the motivations for this work was the need to move
> > some production use cases that are only needing extra privileges so
> > that they can use BPF into a more restrictive environment. Granting
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN to all such use cases that need them
> > for BPF usage is too coarse grained. These caps would allow those
> > applications way more than just BPF usage. So the idea here is more
> > finer-grained control of BPF-specific operations, granting *effective*
> > CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN caps dynamically based on custom
> > production logic that would validate the use case.
>
> That's an authoritative model which is in direct conflict with the
> design and implementation of both capabilities and LSM.
>
> >
> > This *is* an attempt to achieve a more secure production approach.
> >
> >>>>> For example, in many places we have things like:
> >>>>>
> >>>>>         if (!some_check(...) && !capable(...))
> >>>>>                 return -EPERM;
> >>>>>
> >>>>> I would expect this is a similar logic. An operation can succeed if the
> >>>>> access control requirement is met. The mismatch we have through-out the
> >>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>> the logic of the capability checks, not the LSM (i.e. there no LSM hooks
> >>>>> yet here).
> >>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>> when it returns a positive value "bypasses kernel checks".  The patch
> >>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>> based on a eBPF tree, so I can't say with 100% certainty that it is
> >>>> bypassing a capability check, but the description claims that to be
> >>>> the case.
> >>>>
> >>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>> hook which allows a LSM to bypass a capability check.  A LSM hook can
> >>>> be used to provide additional access control restrictions beyond a
> >>>> capability check, but a LSM hook should never be allowed to overrule
> >>>> an access denial due to a capability check.
> >>>>
> >>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>> would be fine-grained enough at the time.
> >>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>> argument that one of the reasons LSMs exist is to provide
> >>>> supplementary controls due to capability-based access controls being a
> >>>> poor fit for many modern use cases.
> >>> I generally agree with what you say, but we DO have this code pattern:
> >>>
> >>>          if (!some_check(...) && !capable(...))
> >>>                  return -EPERM;
> >> I think we need to make this more concrete; we don't have a pattern in
> >> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >> Simply because there is another kernel access control mechanism which
> >> allows a capability check to be skipped doesn't mean I want to allow a
> >> LSM hook to be used to skip a capability check.
> > This work is an attempt to tighten the security of production systems
> > by allowing to drop too coarse-grained and permissive capabilities
> > (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitable allow more
> > than production use cases are meant to be able to do)
>
> The BPF developers are in complete control of what CAP_BPF controls.
> You can easily address the granularity issue by adding addition restrictions
> on processes that have CAP_BPF. That is the intended use of LSM.
> The whole point of having multiple capabilities is so that you can
> grant just those that are required by the system security policy, and
> do so safely. That leads to differences of opinion regarding the definition
> of the system security policy. BPF chose to set itself up as an element
> of security policy (you need CAP_BPF) rather than define elements such that
> existing capabilities (CAP_FOWNER, CAP_KILL, CAP_MAC_OVERRIDE, ...) would
> control.

Please see my reply to Paul, where I explain CAP_BPF's system-wide
nature and problem with user namespaces. I don't think the problem is
in the granularity of CAP_BPF, it's more of a "non-namespaceable"
nature of the BPF subsystem in general.

>
> >  and then grant
> > specific BPF operations on specific BPF programs/maps based on custom
> > LSM security policy,
>
> This is backwards. The correct implementation is to require CAP_BPF and
> further restrict BPF operations based on a custom LSM security policy.
> That's how LSM is designed.

Please see my reply to Paul, we can't grant real CAP_BPF for
applications in user namespace (unless there is some trick that I
don't know, so please do point it out). Let's converge the discussion
in that email thread branch to not discuss the same topic multiple
times.

>
> >  which validates application trustworthiness using
> > custom production-specific logic.
> >
> > Isn't this goal in line with LSMs mission to enhance system security?
>
> We're not arguing the goal, we're discussing the implementation.
>
> >>> It looks to me like this series can be refactored to do the same. I
> >>> wouldn't consider that to be a "bypass", but I would agree the current
> >>> series looks too much like "bypass", and makes reasoning about the
> >>> effect of the LSM hooks too "special". :)
> > Sorry, I didn't realize that the current code layout is making things
> > more confusing. I'll address feedback to make the intent a bit
> > clearer.
> >
> >> --
> >> paul-moore.com