On Mon, Apr 17, 2023 at 5:48 PM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>
> On 4/17/2023 4:29 PM, Andrii Nakryiko wrote:
> > On Thu, Apr 13, 2023 at 8:11 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >> On Thu, Apr 13, 2023 at 1:16 AM Andrii Nakryiko
> >> <andrii.nakryiko@xxxxxxxxx> wrote:
> >>> On Wed, Apr 12, 2023 at 7:56 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >>>> On Wed, Apr 12, 2023 at 9:43 PM Andrii Nakryiko
> >>>> <andrii.nakryiko@xxxxxxxxx> wrote:
> >>>>> On Wed, Apr 12, 2023 at 12:07 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >>>>>> On Wed, Apr 12, 2023 at 2:28 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >>>>>>> On Wed, Apr 12, 2023 at 02:06:23PM -0400, Paul Moore wrote:
> >>>>>>>> On Wed, Apr 12, 2023 at 1:47 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> >>>>>>>>> On Wed, Apr 12, 2023 at 12:49:06PM -0400, Paul Moore wrote:
> >>>>>>>>>> On Wed, Apr 12, 2023 at 12:33 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >>>> ...
> >>>>
> >>>>>>>>> For example, in many places we have things like:
> >>>>>>>>>
> >>>>>>>>>     if (!some_check(...) && !capable(...))
> >>>>>>>>>         return -EPERM;
> >>>>>>>>>
> >>>>>>>>> I would expect this is similar logic. An operation can succeed if the
> >>>>>>>>> access control requirement is met. The mismatch we have throughout the
> >>>>>>>>> kernel is that capability checks aren't strictly done by LSM hooks. And
> >>>>>>>>> this series conceptually, I think, doesn't violate that -- it's changing
> >>>>>>>>> the logic of the capability checks, not the LSM (i.e. there are no LSM
> >>>>>>>>> hooks yet here).
> >>>>>>>> Patch 04/08 creates a new LSM hook, security_bpf_map_create(), which
> >>>>>>>> when it returns a positive value "bypasses kernel checks". The patch
> >>>>>>>> isn't based on either Linus' tree or the LSM tree, I'm guessing it is
> >>>>>>>> based on an eBPF tree, so I can't say with 100% certainty that it is
> >>>>>>>> bypassing a capability check, but the description claims that to be
> >>>>>>>> the case.
> >>>>>>>>
> >>>>>>>> Regardless of how you want to spin this, I'm not supportive of a LSM
> >>>>>>>> hook which allows a LSM to bypass a capability check. A LSM hook can
> >>>>>>>> be used to provide additional access control restrictions beyond a
> >>>>>>>> capability check, but a LSM hook should never be allowed to overrule
> >>>>>>>> an access denial due to a capability check.
> >>>>>>>>
> >>>>>>>>> The reason CAP_BPF was created was because there was nothing else that
> >>>>>>>>> would be fine-grained enough at the time.
> >>>>>>>> The LSM layer predates CAP_BPF, and one could make a very solid
> >>>>>>>> argument that one of the reasons LSMs exist is to provide
> >>>>>>>> supplementary controls due to capability-based access controls being a
> >>>>>>>> poor fit for many modern use cases.
> >>>>>>> I generally agree with what you say, but we DO have this code pattern:
> >>>>>>>
> >>>>>>>     if (!some_check(...) && !capable(...))
> >>>>>>>         return -EPERM;
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>>>>> Simply because there is another kernel access control mechanism which
> >>>>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>>>> LSM hook to be used to skip a capability check.
> >>>>> This work is an attempt to tighten the security of production systems
> >>>>> by allowing to drop too coarse-grained and permissive capabilities
> >>>>> (like CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, which inevitably allow more
> >>>>> than production use cases are meant to be able to do) and then grant
> >>>>> specific BPF operations on specific BPF programs/maps based on custom
> >>>>> LSM security policy, which validates application trustworthiness using
> >>>>> custom production-specific logic.
> >>>> There are ways to leverage the LSMs to apply finer grained access
> >>>> control on top of the relatively coarse capabilities that do not
> >>>> require circumventing those capability controls. One grants the
> >>>> capabilities, just as one would do today, and then leverages the
> >>>> security functionality of a LSM to further restrict specific users,
> >>>> applications, etc. with a level of granularity beyond that offered by
> >>>> the capability controls.
> >>> Please help me understand something. What you and Casey are proposing,
> >>> when taken to the logical extreme, is to grant to all processes root
> >>> permissions and then use LSM to restrict specific actions, do I
> >>> understand correctly? This strikes me as a less secure and more
> >>> error-prone way of doing things.
> >> When taken to the "logical extreme" most concepts end up sounding a
> >> bit absurd, but that was the point, wasn't it?
> > Wasn't my intent to make it sound absurd, sorry. The way I see it, for
> > the sake of example, let's say CAP_BPF allows 20 different operations
> > (each with its own security_xxx hook). And let's say in production I
> > want to only allow 3 of them. Sure, technically it should be possible
> > to deny access at 17 hooks and let it through in just those 3. But if
> > someone adds a 21st and I forget to add a 21st restriction, that would
> > be bad (but very probable with such an approach).
>
> That would be a flaw in the implementation of the 21st, not a problem
> with the capabilities or LSM model. For the LSM model to be sufficiently
> flexible it cannot be required to prevent or detect coding errors.
>
> > So my point is that for situations like this, dropping CAP_BPF, but
> > allowing only 3 hooks to proceed seems a safer approach, because if we
> > add a 21st hook, it will safely be denied without CAP_BPF *by default*.
> > That's what I tried to point out.
>
> When you're creating security relevant or enforcing mechanisms there has
> to be a level of expectation regarding the care with which they're
> developed. My expectation is that the 21st hook won't go in without
> adequate review.
>

That's not how it works with BPF LSM, but there is no point in arguing
about this. I agree that LSM shouldn't be prevented from adding new hooks
just because of some particular LSM implementation.

> > But even if we ignore this "safe by default when a new hook is added"
> > behavior, when taking user namespaces into account, the restrictive
> > LSM approach just doesn't seem to work at all for something like
> > CAP_BPF. CAP_BPF cannot be "namespaced", just like, say, CAP_SYS_TIME,
> > because we cannot ensure that a given BPF program won't access kernel
> > state "belonging" to another process (as one example).
>
> Time namespaces have been proposed. I would be surprised if there aren't
> people working on BPF namespaces somewhere. There's a difference between
> "can't" and "haven't been".
>

It really is "can't" for BPF, as it allows tracing of kernel internals.

> > Now, thanks to Jonathan, I get that there was a heated discussion 20
> > years ago about authoritative vs restrictive LSMs. But if I read a
> > summary at that time ([0]), authoritative hooks were not out of the
> > question *in principle*. Surely, "walk before we can run" makes sense,
> > but it's been a while ago.
>
> Certainly. The SGI comment was mine, by the way. I wanted authoritative
> hooks for cases like POSIX ACLs and systems without root. While I would
> have liked the decision to go the other way, there's no way I would endorse
> a hybrid, where some hooks are restrictive and others authoritative.
>

Yep, saw your comments as well. Can't say I get what would be wrong with
having authoritative hooks together with restrictive ones, but oh well.

> > [0] https://lwn.net/2001/1108/a/no-auth-hooks.php3
> >
> >
> >> Here is a fun story which seems relevant ...
> >> in the early days of SELinux, one of the community devs set up a
> >> system with a SELinux policy which restricted all privileged
> >> operations from the root user, put the system on a publicly
> >> accessible network, posted the root password for all to see, and
> >> invited the public to log in to the system and attempt to exercise
> >> root privilege (it's been well over 10 years at this point so the
> >> details are a bit fuzzy). Granted, there were some hiccups in the
> >> beginning, mostly due to the crude state of policy
> >> development/analysis at the time, but after a few policy revisions
> >> the system held up quite well.
> > Honest question out of curiosity: was the intent to demonstrate that
> > with LSM one can completely restrict root? Or that root was actually
> > allowed to do something useful? Because I can see how rejecting
> > everything would be rather simple, but actually pretty useless in
> > practice. Restricting only part of the power of the root, while still
> > allowing it to do something useful in production seems like a much
> > harder (but way more valuable) endeavor. Not saying it's impossible,
> > but see my example about missing 21st new CAP_BPF functionality.
>
> Capabilities are sufficient to implement a rootless system. It's been done.
> Someone will point out that CAP_SYS_ADMIN is effectively root, and there's
> some truth to that.
>
> >> On the more practical side of things, there are several use cases
> >> which require, by way of legal or contractual requirements, that full
> >> root/admin privileges are decomposed into separate roles: security
> >> admin, audit admin, backup admin, etc. These users satisfy these
> >> requirements by using LSMs, such as SELinux, to restrict the
> >> administrative capabilities based on the SELinux user/role/domain.
> >>
> >>> By the way, even the above proposal of yours doesn't work for
> >>> production use cases when user namespaces are involved, as far as I
> >>> understand.
> >>> We cannot grant CAP_BPF+CAP_PERFMON+CAP_NET_ADMIN for containers
> >>> running inside user namespaces, as CAP_BPF in non-init namespace is
> >>> not enough for bpf() syscall to allow loading BPF maps or
> >>> BPF program ...
> >> Once again, the LSM has always intended to be a restrictive mechanism,
> >> not a privilege granting mechanism. If an operation is not possible
> > Not according to [0] above:
> >
> > > It is our belief that these changes do not belong in the initial version of
> > > LSM (especially given our limited charter and original goals), and should
> > > be proposed as incremental refinements after LSM has been initially
> > > accepted.
> > > ...
> > > It is our belief that the current LSM
> > > will provide a meaningful improvement in the security infrastructure of the
> > > Linux kernel, and that there is plenty of room for future expansion of LSM
> > > in subsequent phases.
> >
> > I don't see "always intended to be a restrictive mechanism" there.
>
> Having been on the other side of the argument, the system that was accepted
> was in fact "always intended to be a restrictive mechanism". The quote above
> is a "never say never" statement.
>
> >> without the LSM layer enabled, it should not be possible with the LSM
> >> layer enabled. The LSM is not a mechanism to circumvent other access
> >> control mechanisms in the kernel.
> > I understand, but it's not like we are proposing to go and bypass all
> > kinds of random kernel security mechanisms. These are targeted hooks,
> > developed by the BPF community for the BPF subsystem to allow trusted
> > unprivileged production use cases. Yes, we currently rely on checking
> > CAP_BPF to grant more dangerous/advanced features, but it's because we
> > can't just allow any unprivileged process to do this.
> > But what we really want is to answer the question "can we trust this
> > process to use this advanced functionality", and if there is no
> > specific LSM policy that cares one way (allow) or the other
> > (disallow), fall back to CAP_BPF enforcement.
> >
> > So it's not bypassing kernel checks, but rather augmenting them with
> > more flexible and customizable mechanisms, while still falling back to
> > CAP_BPF if the user didn't install any custom LSM policy.
>
> That would make CAP_BPF behave differently from all other capabilities.
> Capabilities are hard enough to use correctly as it is. If each capability
> defined its own semantics they would be completely unusable.
>
> >>> Also, in a previous email you said:
> >>>
> >>>> Simply because there is another kernel access control mechanism which
> >>>> allows a capability check to be skipped doesn't mean I want to allow a
> >>>> LSM hook to be used to skip a capability check.
> >>> I understand your stated position, but can you please help me
> >>> understand the reasoning behind it?
> >> Keeping the LSM as a restrictive access control mechanism helps ensure
> >> some level of sanity and consistency across different Linux
> >> installations. If a certain operation requires CAP_SYS_ADMIN on one
> >> Linux system, it should require CAP_SYS_ADMIN on another Linux system.
> >> Granted, a LSM running on one system might impose additional
> >> constraints on that operation, but the CAP_SYS_ADMIN requirement still
> >> applies.
> >>
> >> There is also an issue of safety in knowing that enabling a LSM will
> >> not degrade the access controls on a system by potentially granting
> >> operations that were previously denied.
> >>
> >>> Does the above also mean that you'd be fine if we just don't plug into
> >>> the LSM subsystem at all and instead come up with some ad-hoc solution
> >>> to allow effectively the same policies?
> >>> This sounds detrimental both to LSM and BPF subsystems, so I hope we
> >>> can talk this through before finalizing decisions.
> >> Based on your patches and our discussion, it seems to me that the
> >> problem you are trying to resolve is related more to the
> >> capability-based access controls in the eBPF, and possibly other
> >> kernel subsystems, and not any LSM-based restrictions. I'm happy to
> >> work with you on a solution involving the LSM, but please understand
> >> that I'm not going to support a solution which changes a core
> >> philosophy of the LSM layer.
> > Great, I'd really appreciate help and suggestions on how to solve the
> > following problem.
> >
> > We have a BPF subsystem that allows loading BPF programs. Those BPF
> > programs cannot be contained within a particular namespace just by its
> > system-wide tracing nature (it can safely read kernel and user memory
> > and we can't restrict whether that memory belongs to a particular
> > namespace), so it's like CAP_SYS_TIME, just with much broader API
> > surface.
>
> This doesn't sound like a problem, it sounds like BPF is explicitly
> designed to prevent interference by namespaces. But in some cases you
> now want to limit it by namespaces.
>
> It appears that the desired uses of BPF are no longer compatible with
> its original security model. That's unfortunate, and likely to require
> a significant change to the implementation of BPF.
>

I have some new ideas, so hopefully not as significant. While I still
think that authoritative LSM hooks would be great, I'll stop arguing.
I'll get back with a different proposal that would allow BPF usage within
user namespaces. We still will want LSM hooks for fine-grained control,
but I think we'll be able to make them restrictive-only.

> >
> > The other piece of a puzzle is user namespaces. We do want to run
> > applications inside user namespaces, but allow them to use BPF
> > programs.
> > As far as I can tell, there is no way to grant real CAP_BPF that will
> > be recognized by capable(CAP_BPF) (not ns_capable, see above about
> > system-wide nature of BPF). If there is, please help me understand
> > how. All my local experiments failed, and looking at cap_capable()
> > implementation it is not intended to even check the initial
> > namespace's capability if the process is running in the user
> > namespace.
> >
> >
> > So, given that a) we can't make CAP_BPF namespace-aware and b) we
> > can't grant real CAP_BPF to processes in user namespace, how could we
> > allow user namespaced applications to do useful work with BPF?
> >
> >>> Lastly, you mentioned before:
> >>>
> >>>>>> I think we need to make this more concrete; we don't have a pattern in
> >>>>>> the upstream kernel where 'some_check(...)' is a LSM hook, right?
> >>> Unfortunately I don't have enough familiarity with all LSM hooks, so I
> >>> can't confirm or disprove the above statement. But earlier someone
> >>> brought to my attention the case of security_vm_enough_memory_mm(),
> >>> which seems to be granting effectively CAP_SYS_ADMIN for the purposes
> >>> of memory accounting. Am I missing something subtle there or does it
> >>> grant effective caps indeed?
> >> Some of the comments around that hook can be misleading, but if you
> >> look at the actual code it starts to make more sense.
> >>
> > [...]
> >
> >> I do agree that the security_vm_enough_memory() hook is structured a
> >> bit differently than most of the other LSM hooks, but it still
> >> operates with the same philosophy: a LSM should only be allowed to
> >> restrict access, a LSM should never be allowed to grant access that
> >> would otherwise be denied by the traditional Linux access controls.
> >>
> >> Hopefully that explanation makes sense, but if things are still a bit
> >> fuzzy I would encourage you to go look at the code, I'm sure it will
> >> make sense once you spend a few minutes figuring out how it works.
> >>
> > Yep, thanks a lot, it's much clearer after grokking the relevant pieces
> > of the LSM code you pointed out and LSM infrastructure in general.
> > "capabilities" LSM is non-negotiable, so it effectively always
> > restricts a small subset of hooks, including vm_enough_memory and
> > capable.
> >
> > Still, the problem stands. How do we marry BPF and user
> > namespaces? I'd really appreciate suggestions. Thank you!
> >
> >
> >> [1] There is a long and sorta bizarre history with the capability LSM,
> >> but just understand it is a bit "special" in many ways, and those
> >> "special" behaviors are intentional.
> >>
> >> --
> >> paul-moore.com
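
For concreteness, the two hook semantics argued over in this thread can be
modeled in a few lines of userspace C. This is only a rough sketch and not
kernel code: the verdict enum, check_restrictive(), check_authoritative(),
example_policy() and the KNOWN_OPS count are all made up for illustration.
The "restrictive" variant follows the rule that a hook may only add denials
on top of the capability check; the "authoritative" variant follows the
patch description quoted above, where a positive hook result bypasses the
capability check and "no opinion" falls back to it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical verdicts a policy hook can return in this model. */
enum hook_verdict {
    HOOK_DENY = -1,      /* policy explicitly rejects the operation */
    HOOK_NO_OPINION = 0, /* no policy installed, or policy does not care */
    HOOK_ALLOW = 1,      /* policy explicitly approves the operation */
};

/*
 * Restrictive semantics: the capability is always required and the hook
 * can only turn an allowed operation into a denied one.
 */
int check_restrictive(bool has_cap, enum hook_verdict v)
{
    if (!has_cap)
        return -EPERM;
    if (v == HOOK_DENY)
        return -EPERM;
    return 0;
}

/*
 * Authoritative semantics as described in the thread: an explicit allow
 * bypasses the capability check, an explicit deny rejects, and no opinion
 * falls back to the capability check.
 */
int check_authoritative(bool has_cap, enum hook_verdict v)
{
    if (v == HOOK_DENY)
        return -EPERM;
    if (v == HOOK_ALLOW)
        return 0;
    return has_cap ? 0 : -EPERM;
}

/* Number of operations the example policy was written against (assumed). */
#define KNOWN_OPS 20

/*
 * Hypothetical policy for the "20 operations, allow 3" example: it allows
 * operations 0..2, denies the other 17 it knows about, and has no opinion
 * on any operation added after it was written (the "21st hook").
 */
enum hook_verdict example_policy(int op)
{
    assert(op >= 0);
    if (op >= KNOWN_OPS)
        return HOOK_NO_OPINION;
    return op < 3 ? HOOK_ALLOW : HOOK_DENY;
}
```

In this model, a process that holds the capability under restrictive
semantics gets the 21st operation (index 20) allowed because the policy
never learned about it, while a process without the capability under
authoritative semantics gets it denied by default. That is the asymmetry
the "21st hook" example points at; it says nothing about which model is
preferable for the kernel as a whole.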