Re: [PATCH v13 3/5] security: Replace indirect LSM hook calls with static calls

KP Singh <kpsingh@xxxxxxxxxx> · Thu, 4 Jul 2024 02:24:05 +0200

On Thu, Jul 4, 2024 at 2:04 AM Casey Schaufler <casey@xxxxxxxxxxxxxxxx> wrote:
>
> On 7/3/2024 4:08 PM, KP Singh wrote:
> > On Thu, Jul 4, 2024 at 12:52 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >> On Wed, Jul 3, 2024 at 6:22 PM KP Singh <kpsingh@xxxxxxxxxx> wrote:
> >>> On Wed, Jul 3, 2024 at 10:56 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >>>> On Wed, Jul 3, 2024 at 12:55 PM KP Singh <kpsingh@xxxxxxxxxx> wrote:
> >>>>> On Wed, Jul 3, 2024 at 2:07 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >>>>>> On Jun 29, 2024 KP Singh <kpsingh@xxxxxxxxxx> wrote:
> >>>>>>> LSM hooks are currently invoked from a linked list as indirect calls
> >>>>>>> which are invoked using retpolines as a mitigation for speculative
> >>>>>>> attacks (Branch History / Target injection) and add extra overhead which
> >>>>>>> is especially bad in kernel hot paths:
> >>>>> [...]
> >>>>>
> >>>>>> should fix the more obvious problems.  I'd like to know if you are
> >>>>>> aware of any others?  If not, the text above should be adjusted and
> >>>>>> we should reconsider patch 5/5.  If there are other problems I'd
> >>>>>> like to better understand them as there may be an independent
> >>>>>> solution for those particular problems.
> >>>>> We did have problems with some other hooks but I was unable to dig up
> >>>>> specific examples though, it's been a while. More broadly speaking, a
> >>>>> default decision is still a decision. Whereas the intent from the BPF
> >>>>> LSM is not to make a default decision unless a BPF program is loaded.
> >>>>> I am quite worried about the holes this leaves open, subtle bugs
> >>>>> (security or crashes) we have not caught yet and PATCH 5/5 engineers away
> >>>>>  the problem of the "default decision".
> >>>> The inode/xattr problem you originally mentioned wasn't really rooted
> >>>> in a "bad" default return value, it was really an issue with how the
> >>>> LSM hook was structured due to some legacy design assumptions made
> >>>> well before the initial stacking patches were merged.  That should be
> >>>> fixed now[1] and given that the inode/xattr set/remove hooks were
> >>>> unique in this regard (individual LSMs were responsible for performing
> >>>> the capabilities checks) I don't expect this to be a general problem.
> >>>>
> >>>> There were also some issues caused by the fact that we were defining
> >>>> the default return value in multiple places and these values had gone
> >>>> out of sync in a number of hooks.  We've also fixed this problem by
> >>>> only defining the default return value once for each hook, solving all
> >>>> of those problems.
> >>> I don't see how this solves problems or prevents any future problems
> >>> with side-effects. I have always been uncomfortable with an extraneous
> >>> function being called with a side effect ever since we merged BPF LSM
> >>> with default callback. We have found one bug due to this, not all the
> >>> bugs.
> >> You've got to give me something more concrete than that.  If you can't
> >> provide any concrete examples, start with providing a basic concept
> >> with far more detail than just "side-effects".
> >>
> >>>> I'm not aware of any other existing problems relating to the LSM hook
> >>>> default values, if there are any, we need to fix them independent of
> >>>> this patchset.  The LSM framework should function properly if the
> >>>> "default" values are used.
> >>> Patch 5 eliminates the possibilities of errors and subtle bugs all
> >>> together. The problem with subtle bugs is, well, they are subtle, if
> >>> you and I knew of the bugs, we would fix all of them, but we don't. I
> >>> really feel we ought to eliminate the class of issues and not just
> >>> whack-a-mole when we see the bugs.
> >> Here's the thing, I don't really like patch 5/5.  To be honest, I
> >> don't really like a lot of this patchset.  From my perspective, the
> >> complexity of the code is likely going to mean more maintenance
> >> headaches down the road, but Linus hath spoken so we're doing this
> >> (although "this" is still a bit undefined as far as I'm concerned).
> >> If you want me to merge patch 5/5 you've got to give me something real
> >> and convincing that can't be fixed by any other means.  My current
> >> opinion is that you're trying to use a previously fixed bug to scare
> >> and/or coerce the merging of some changes I don't really want to
> >> merge.  If you want me to take patch 5/5, you've got to give me a
> >> reason that is far more compelling that what you've written thus far.
> > Paul, I am not scaring you, I am providing a solution that saves us
> > from headaches with side-effects and bugs in the future. It's safer by
> > design.
> >
> > You say you have not reviewed it carefully, but you did ask me to move
> > the function from the BPF LSM layer to an LSM API, and we had a bunch
> > of discussion around naming in the subsequent revisions.
> >
> > https://lore.kernel.org/bpf/f7e8a16b0815d9d901e019934d684c5f@xxxxxxxxxxxxxx/
> >
> > My reasons are:
> >
> > 1. It's safer, no side effects, guaranteed to be not buggy. Neither
> > you, nor me, can guarantee that a default value will be safe in the
> > LSM layer. I request others (Casey, Kees) for their opinion here too.
> > 2. Performance, no extra function call.
>
> I want to be very careful about the comments I make on this patch set.
> I can't say that I trust any fix for the BPF LSM layer. My natural
> inclination is to isolate the fix to the area that has the problem,
> that is, BPF. I have a hard time accepting the notion that the implementation
> will really fix all possible bugs in the future. The pace at which eBPF
> is evolving gives me the heebee geebees when I think of it as a mechanism
> for implementing security modules.
>
> My biggest concern is that we may be trying too hard for perfection.
> I see a situation where we're not moving forward because there are two
> reasonable solutions and rather than running with either we're desperately
> looking for a compelling reason to pick one over the other.

I am fine with either, if you folks prefer security_toggle_hook to be
in BPF only / limited to BPF. I can revert back to what we had in v9,
the changes to the LSM layer are then very minimal.

- KP

>
> >
> > If you still don't like it, it's your call, I would still like to keep
> > most of the logic local to the BPF LSM as proposed in
> > https://lore.kernel.org/bpf/f7e8a16b0815d9d901e019934d684c5f@xxxxxxxxxxxxxx/
> >
> > - KP
> >
> >> --
> >> paul-moore.com