Re: [PATCH v13 3/5] security: Replace indirect LSM hook calls with static calls

KP Singh <kpsingh@xxxxxxxxxx> · Thu, 4 Jul 2024 00:22:43 +0200

On Wed, Jul 3, 2024 at 10:56 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
>
> On Wed, Jul 3, 2024 at 12:55 PM KP Singh <kpsingh@xxxxxxxxxx> wrote:
> > On Wed, Jul 3, 2024 at 2:07 AM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Jun 29, 2024 KP Singh <kpsingh@xxxxxxxxxx> wrote:
> > > >
> > > > LSM hooks are currently invoked from a linked list as indirect calls
> > > > which are invoked using retpolines as a mitigation for speculative
> > > > attacks (Branch History / Target injection) and add extra overhead which
> > > > is especially bad in kernel hot paths:
> >
> > [...]
> >
> > > should fix the more obvious problems.  I'd like to know if you are
> > > aware of any others?  If not, the text above should be adjusted and
> > > we should reconsider patch 5/5.  If there are other problems I'd
> > > like to better understand them as there may be an independent
> > > solution for those particular problems.
> >
> > We did have problems with some other hooks but I was unable to dig up
> > specific examples though, it's been a while. More broadly speaking, a
> > default decision is still a decision. Whereas the intent from the BPF
> > LSM is not to make a default decision unless a BPF program is loaded.
> > I am quite worried about the holes this leaves open, subtle bugs
> > (security or crashes) we have not caught yet and PATCH 5/5 engineers away
> >  the problem of the "default decision".
>
> The inode/xattr problem you originally mentioned wasn't really rooted
> in a "bad" default return value, it was really an issue with how the
> LSM hook was structured due to some legacy design assumptions made
> well before the initial stacking patches were merged.  That should be
> fixed now[1] and given that the inode/xattr set/remove hooks were
> unique in this regard (individual LSMs were responsible for performing
> the capabilities checks) I don't expect this to be a general problem.
>
> There were also some issues caused by the fact that we were defining
> the default return value in multiple places and these values had gone
> out of sync in a number of hooks.  We've also fixed this problem by
> only defining the default return value once for each hook, solving all
> of those problems.

I don't see how this solves problems or prevents any future problems
with side-effects. I have always been uncomfortable with an extraneous
function being called with a side effect ever since we merged BPF LSM
with default callback. We have found one bug due to this, not all the
bugs.

>
> I'm not aware of any other existing problems relating to the LSM hook
> default values, if there are any, we need to fix them independent of
> this patchset.  The LSM framework should function properly if the
> "default" values are used.

Patch 5 eliminates the possibilities of errors and subtle bugs all
together. The problem with subtle bugs is, well, they are subtle, if
you and I knew of the bugs, we would fix all of them, but we don't. I
really feel we ought to eliminate the class of issues and not just
whack-a-mole when we see the bugs.

The LSM framework ought to function with default values is a nice
goal, but a weaker position than what we have where we make these
subtle bugs impossible. If you feel this should not be a part of the
framework, I would still like to revert back to implementation where
we kept the toggling logic contained to the BPF LSM.

- KP

>
> [1] I just realized that commit 61df7b828204 doesn't properly update
> the removexattr implementations for SELinux and Smack, expect a patch
> for that soon.  The current code is okay, it just does some
> unnecessary work (see the setxattr changes to get an idea of the
> changes needed).
>
> > > > +#define lsm_for_each_hook(scall, NAME)                                       \
> > > > +     for (scall = static_calls_table.NAME;                           \
> > > > +          scall - static_calls_table.NAME < MAX_LSM_COUNT; scall++)  \
> > > > +             if (static_key_enabled(&scall->active->key))
> > >
> > > This is probably a stupid question, but why use static_key_enabled()
> > > here instead of static_branch_unlikely() as in the call_XXX_macros?
> >
> > The static_key_enabled is a check for the key being enabled, whereas
> > the static_branch_likely is for laying out the right assembly code
> > (jump tables etc.) and based on the value of the static key, here we
> > are not using the static calls or even keys, rather we are following
> > back from direct calls to indirect calls and don't really need any
> > jump tables in the slow path.
>
> Gotcha, thanks for the explanation.
>
> --
> paul-moore.com