Re: [PATCH bpf-next 1/7] bpf: use common jump (instruction) history across all states

On Thu, Nov 9, 2023 at 2:06 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> > <andrii.nakryiko@xxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > If we ever break DFS property, we can easily change this. Or we can
> > > > > even have a hybrid: as long as traversal preserves DFS property, we
> > > > > use global shared history, but we can also optionally clone and have
> > > > > our own history if necessary. It's a matter of adding optional
> > > > > potentially NULL pointer to "local history". All this is very nicely
> > > > > hidden away from "normal" code.
> > > >
> > > > If we can "easily change this" then let's make it last and optional patch.
> > > > So we can revert in the future when we need to take non-DFS path.
> > >
> > > Ok, sounds good. I'll reorder and put it last, you can decide whether
> > > to apply it or not that way.
> > >
> > > >
> > > > > But again, let's look at data first. I'll get back with numbers soon.
> > > >
> > > > Sure. I think memory increase due to more tracking is ok.
> > > > I suspect it won't cause a 2x increase. Likely a few %.
> > > > The last time I checked the main memory hog is states stashed for pruning.
> > >
> > > So I'm back with data. See verifier.c changes I did at the bottom,
> > > just to double check I'm not missing something major. I count the
> > > number of allocations (but that's an underestimate that doesn't take
> > > into account realloc), total number of instruction history entries for
> > > entire program verification, and then also peak "depth" of instruction
> > > history. Note that entries should be multiplied by 8 to get the amount
> > > of bytes (and that's not counting per-allocation overhead).
> > >
> > > Here are top 20 results, sorted by number of allocs for Meta-internal,
> > > Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> > > and STACK_ZERO optimization. AFTER is with all the patches of this
> > > patch set applied.
> > >
> > > It's a few megabytes of memory allocations, which in itself is probably
> > > not a big deal. But it's unnecessary memory allocations numbering at
> > > least 2x the total number of states, all of which we can save, replacing
> > > them with just a few reallocs that size the global jump history to a
> > > peak entry count that is orders of magnitude smaller.
> > >
> > > And if we ever decide to track more stuff similar to
> > > INSNS_F_STACK_ACCESS, we won't have to worry about more allocations or
> > > more memory usage, because the absolute worst case is our global
> > > history will be up to 1 million entries tops. We can track some *code
> > > path dependent* per-instruction information for *each simulated
> > > instruction* easily without having to think twice about this. Which I
> > > think is a nice liberating thought in itself justifying this change.
> > >
> > >
> >
> > Gmail butchered tables. See Github gist ([0]) for it properly formatted.
> >
> >   [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5
>
> I think 'peak insn history' is the one to look for, since
> it indicates total peak memory consumption. Right?

Hm... not really? Peak here is the longest sequence of recorded jumps
from the root state to any "current" state. I calculated it to know how
big the global history would need to be.

But it's definitely not total peak memory consumption, because there
will still be states enqueued in the stack waiting to be processed, and
we keep their jmp_history around. See push_stack() and the
copy_verifier_state() we do in there.

> It seems the numbers point out a bug in number collection or
> a bug in implementation.

Yeah, but a bug in the accounting implementation, I suspect. I think
I'm not handling failing states properly.

I'll double check and fix it up, but basically only failing BPF
programs should have bad accounting.
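
(For reference, the kind of accounting I'm talking about is roughly the
sketch below; the counter names are hypothetical, not real
bpf_verifier_env fields, and it would be called wherever a history
entry gets recorded, e.g. from push_jmp_history():)

/* hypothetical accounting helper; counts allocations (underestimated,
 * since reallocs are ignored), total recorded entries, and the peak
 * history depth seen across the whole program verification
 */
static void account_jmp_history(struct bpf_verifier_env *env,
                                struct bpf_verifier_state *cur)
{
        if (cur->jmp_history_cnt == 1)
                env->hist_alloc_cnt++;      /* first entry for this state */
        env->hist_total_entries++;          /* every recorded entry */
        if (cur->jmp_history_cnt > env->hist_peak_depth)
                env->hist_peak_depth = cur->jmp_history_cnt;
}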

>
> before:
> verifier_loops1.bpf.linked3.o peak=499999
> loop3.bpf.linked3.o peak=111111
>
> which makes sense, since both tests hit 1m insn.
> I can see where 1/2 and 1/9 come from based on asm.
>
> after:
> verifier_loops1.bpf.linked3.o peak=25002
> loop3.bpf.linked3.o peak=333335
>
> So the 1st test got 20 times smaller memory footprint
> while 2nd was 3 times higher.
>
> Both are similar infinite loops.
>
> The 1st one is:
> l1_%=:  r0 += 1;                                        \
>         goto l1_%=;                                     \
>
> My understanding is that there should be all 500k jmps in history with
> or without these patches.
>
> So now I'm more worried about the correctness of the 1st patch.

I'll look closer at what's going on and will report back.




