Re: "warning: context imbalance" in kernel/bpf/hashtab.c

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Fri, 6 Nov 2020 11:44:00 -0800

On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@xxxxxx> wrote:
>
> Thanks for your suggestion! This pattern didn't trigger the warning.
> We still need some work to apply this to actual code, because the lock
> part is in an helper:
>
> inline int htab_lock_bucket()
> {
>         if (some_condition)
>                 return -EBUSY;
>         spin_lock();
>         return 0;
> }

So inlining does end up being important, because then sparse can see
the pattern in the caller.

But yes, as-is you end up with the same issue that sparse then may not
be able to see the "return value goes together with the spinlock".

That said, in the form you have it now in, it might be easier for
sparse to see that, by just adding a branch following phase.

Branch following is much simpler than value analysis, but "much
simpler" is still not the same as "entirely trivial".

Here's a test-case for this behavior for sparse, in the hope that Luc
goes "that's easy enough to do":

    extern void spin_lock(void);
    extern void spin_unlock(void);

    static inline int errno(int arg)
    {
        if (arg)
                return -1;
        spin_lock();
        return 0;
    }

    int cmp(int i)
    {
        if (errno(i))
                return -1;
        spin_unlock();
        return 0;
    }

and right now sparse linearizes this to

    cmp:
    .L0:
        <entry-point>
        select.32   %r4 <- %arg1, $0xffffffff, $0
        cbr         %arg1, .L5, .L4

    .L4:
        call        spin_lock
        br          .L5

    .L5:
        # call      %r4 <- errno, %r1
        select.32   %r6 <- %r4, $0xffffffff, $0
        cbr         %arg1, .L6, .L2

    .L2:
        call        spin_unlock
        br          .L6

    .L6:
        ret.32      %r6

because sparse isn't smart enough to do a couple of simple optimizations:

 (a) the conditional following for the return value:

        select.32   %r4 <- %arg1, $0xffffffff, $0
        select.32   %r6 <- %r4, $0xffffffff, $0

Note how it doesn't trigger CSE, because it's not the same instruction
(%arg1 vs %r4), but sparse *could* see that a select that depends on a
previous select that has constant arguments can be simplified to be
conditional on the original select value instead, so it *could* be
CSE'd into one single thing.

So the second "select" could be optimized away by realizing that it
really gets the same value as the first one. That *should* be trivial
to do in sparse by just having a simple pattern for select
simplification.

And once that optimization is in place, the second optimization would be

 (b) trivial branch following: "conditional branch to conditional
branch with the same condition".

ie we'd have the pattern of

        cbr         %arg1, .L5, .L4
        ...
    .L5:
        cbr         %arg1, .L6, .L2

and now a trivial branch following optimization would see "oh, the
first target of the first cbr is to another cbr that has the same
condition, so we can rewrite the first cbr as

        cbr         %arg1, .L6, .L4

instead.

And once you've done that trivial branch following, the .L5 target has
only that fallthrough source from .L4, and the code becomes

    cmp:
    .L0:
        <entry-point>
        select.32   %r4 <- %arg1, $0xffffffff, $0
        cbr         %arg1, .L6, .L4

    .L4:
        call        spin_lock
        cbr         %arg1, .L6, .L2

    .L2:
        call        spin_unlock
        br          .L6

    .L6:
        ret.32      %r4

which still isn't good enough, because it still has that conditional
branch that is now pointless.

Removing the second conditional branch entirely would require having a
"look, it's now dominated by a previous conditional branch on the same
condition, so it can't trigger".

That's the only remaining optimization that isn't purely local.
Dominance analysis is always painful. sparse does do it (for memory
accesses), but it's a *lot* more complex than just some local
single-instruction pattern that looks at the source or the target of
that single instruction.

Anyway - this is all doable, but it's the end of the week so my pull
requests for the kernel are starting to pile up and I already spent
more time than I should have at looking at sparse.

Maybe this gives Luc some ideas, though.

Or maybe Luc will point out that even my "simple" optimizations are
slightly more painful than I think they are for some reason that I
haven't looked at.

              Linus