Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states

On Thu, Nov 9, 2023 at 2:57 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > > >
> > > > > >
> > > > > > If we ever break the DFS property, we can easily change this.
> > > > > > Or we can even have a hybrid: as long as the traversal preserves
> > > > > > the DFS property, we use the global shared history, but we can
> > > > > > also optionally clone it and have our own history if necessary.
> > > > > > It's a matter of adding an optional, potentially NULL pointer to
> > > > > > a "local history". All of this is very nicely hidden away from
> > > > > > the "normal" code.
> > > > >
> > > > > If we can "easily change this", then let's make it the last and
> > > > > optional patch, so we can revert it in the future when we need to
> > > > > take a non-DFS path.
> > > >
> > > > Ok, sounds good. I'll reorder and put it last; that way you can
> > > > decide whether or not to apply it.
> > > >
> > > > >
> > > > > > But again, let's look at data first. I'll get back with numbers soon.
> > > > >
> > > > > Sure. I think the memory increase due to more tracking is ok.
> > > > > I suspect it won't cause a 2x increase. Likely just a few %.
> > > > > The last time I checked, the main memory hog was states stashed
> > > > > for pruning.
> > > >
> > > > So I'm back with data. See the verifier.c changes I made at the
> > > > bottom, just to double-check I'm not missing something major. I
> > > > count the number of allocations (an underestimate that doesn't
> > > > take reallocs into account), the total number of instruction
> > > > history entries for the entire program verification, and also the
> > > > peak "depth" of the instruction history. Note that entry counts
> > > > should be multiplied by 8 to get the number of bytes (and that's
> > > > not counting per-allocation overhead).
> > > >
> > > > Here are the top 20 results, sorted by the number of allocs, for
> > > > Meta-internal, Cilium, and selftests. BEFORE is without the added
> > > > STACK_ACCESS tracking and the STACK_ZERO optimization. AFTER is
> > > > with all the patches of this patch set applied.
> > > >
> > > > It's a few megabytes of memory allocation, which in itself is
> > > > probably not a big deal. But it's also at least 2x the total
> > > > number of states in unnecessary memory allocations, all of which
> > > > we can save, replacing them with just a few reallocs that size the
> > > > global jump history to a peak that is orders of magnitude smaller.
> > > >
> > > > And if we ever decide to track more stuff similar to
> > > > INSN_F_STACK_ACCESS, we won't have to worry about more allocations
> > > > or more memory usage, because the absolute worst case is that our
> > > > global history grows to 1 million entries, tops. We can easily
> > > > track *code path dependent* per-instruction information for *each
> > > > simulated instruction* without having to think twice about it,
> > > > which I think is a liberating thought in itself, justifying this
> > > > change.
> > > >
> > > >
> > >
> > > Gmail butchered tables. See Github gist ([0]) for it properly formatted.
> > >
> > >   [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
> >
> > I think 'peak insn history' is the one to look at, since
> > it indicates the total peak memory consumption. Right?
>
> Hm... not really? Peak here is the longest sequence of recorded jumps
> from the root state to any "current" state. I calculated that to know
> how big the global history would need to be.
>
> But it's definitely not the total peak memory consumption, because
> there will be states enqueued on the stack still to be processed, and
> we keep their jmp_history around. See push_stack() and the
> copy_verifier_state() call we do there.
>
> > It seems the numbers point to a bug either in the stats collection
> > or in the implementation.
>
> Yeah, but in the accounting implementation, I suspect. I think I'm
> not handling failing states properly.
>
> I'll double-check and fix it up, but basically only failing BPF
> programs should have bad accounting.

Alexei, your intuition was right! There is indeed a bug in patch #2.
Funnily enough, it's conceptually the same bug I just fixed in the
backtracking logic ([0]). Basically, we cannot rely on checking
instruction indices for equality to make sure it's exactly the same
simulated instruction (it could be an instruction with the same index,
but from earlier in the verification history). I also have a bit of
accounting imprecision in the jmp_total stat for the last failed
state, but I didn't bother fixing it because it's only off by a few,
and only for failed validations.

The good news is that this bug actually doesn't affect the results at
all, except for that one verifier_loops1.c case (for the same reason
the bug in [0] wasn't reported earlier: it's a very rare situation in
real-world BPF programs). See the updated results in [1].

  [0] https://patchwork.kernel.org/project/netdevbpf/patch/20231110002638.4168352-3-andrii@xxxxxxxxxx/
  [1] https://gist.github.com/anakryiko/4e61d28f1a2caecea4315e50e4346120

Anyway, the fix is pretty straightforward, if not the most elegant.
I'll roll it into patch #2 for the next revision (which will become
patch #1, because I moved the common history refactoring to be the
last patch, as agreed). I still need to add the tests and other
changes Eduard requested.

commit f11d05fb037f6a69eff8c7d4eff4c422374af37f (HEAD -> bpf-verif-jmp-history)
Author: Andrii Nakryiko <andrii@xxxxxxxxxx>
Date:   Fri Nov 10 20:12:05 2023 -0800

    [FIX] remember current insn hist entry to reuse for flags

    Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 75d9507a8a9f..42a7619dcf80 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -693,6 +693,7 @@ struct bpf_verifier_env {
         */
        char tmp_str_buf[TMP_STR_BUF_LEN];
        struct bpf_insn_hist_entry *insn_hist;
+       struct bpf_insn_hist_entry *cur_hist_ent;
        u32 insn_hist_cap;
 };

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2878077e0a54..eebda0367dca 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3652,13 +3652,8 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
        size_t alloc_size;

        /* combine instruction flags if we already recorded this instruction */
-       if (cur->insn_hist_end > cur->insn_hist_start &&
-           (p = &env->insn_hist[cur->insn_hist_end - 1]) &&
-           p->idx == env->insn_idx &&
-           p->prev_idx == env->prev_insn_idx) {
-               WARN_ON_ONCE(p->flags & (INSN_F_STACK_ACCESS |
-                            INSN_F_FRAMENO_MASK | (INSN_F_SPI_MASK << INSN_F_SPI_SHIFT)));
-               p->flags |= insn_flags;
+       if (env->cur_hist_ent) {
+               env->cur_hist_ent->flags |= insn_flags;
                return 0;
        }

@@ -3676,6 +3671,7 @@ static int push_insn_history(struct bpf_verifier_env *env, struct bpf_verifier_s
        p->idx = env->insn_idx;
        p->prev_idx = env->prev_insn_idx;
        p->flags = insn_flags;
+       env->cur_hist_ent = p;
        cur->insn_hist_end++;

        env->jmp_hist_peak = max(env->jmp_hist_peak, cur->insn_hist_end);
@@ -17408,6 +17404,9 @@ static int do_check(struct bpf_verifier_env *env)
                u8 class;
                int err;

+               /* reset current history entry on each new instruction */
+               env->cur_hist_ent = NULL;
+
                env->prev_insn_idx = prev_insn_idx;
                if (env->insn_idx >= insn_cnt) {
                        verbose(env, "invalid insn idx %d insn_cnt %d\n",


>
> >
> > before:
> > verifier_loops1.bpf.linked3.o peak=499999
> > loop3.bpf.linked3.o peak=111111
> >
> > which makes sense, since both tests hit 1m insn.
> > I can see where 1/2 and 1/9 come from based on asm.
> >
> > after:
> > verifier_loops1.bpf.linked3.o peak=25002
> > loop3.bpf.linked3.o peak=333335
> >
> > So the 1st test got a 20x smaller memory footprint,
> > while the 2nd's was 3x higher.
> >
> > Both are similar infinite loops.
> >
> > The 1st one is:
> > l1_%=:  r0 += 1;                                        \
> >         goto l1_%=;                                     \
> >
> > My understanding is that all 500k jmps should be in the history with
> > or without these patches.
> >
> > So now I'm more worried about the correctness of the 1st patch.
>
> I'll look closer at what's going on and will report back.




