On Thu, Nov 9, 2023 at 12:39 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Nov 9, 2023 at 11:49 AM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Thu, Nov 9, 2023 at 11:29 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Thu, Nov 9, 2023 at 9:28 AM Andrii Nakryiko
> > > <andrii.nakryiko@xxxxxxxxx> wrote:
> > > >
> > > >
> > > > If we ever break DFS property, we can easily change this. Or we can
> > > > even have a hybrid: as long as traversal preserves DFS property, we
> > > > use global shared history, but we can also optionally clone and have
> > > > our own history if necessary. It's a matter of adding optional
> > > > potentially NULL pointer to "local history". All this is very nicely
> > > > hidden away from "normal" code.
> > > If we can "easily change this" then let's make it last and optional patch.
> > > So we can revert in the future when we need to take non-DFS path.
> > Ok, sounds good. I'll reorder and put it last, you can decide whether
> > to apply it or not that way.
> >
> > > > But again, let's look at data first. I'll get back with numbers soon.
> > > Sure. I think memory increase due to more tracking is ok.
> > > I suspect it won't cause 2x increase. Likely few %.
> > > The last time I checked the main memory hog is states stashed for pruning.
> > So I'm back with data. See verifier.c changes I did at the bottom,
> > just to double check I'm not missing something major. I count the
> > number of allocations (but that's an underestimate that doesn't take
> > into account realloc), total number of instruction history entries for
> > entire program verification, and then also peak "depth" of instruction
> > history. Note that entries should be multiplied by 8 to get the amount
> > of bytes (and that's not counting per-allocation overhead).
> >
> > Here are top 20 results, sorted by number of allocs for Meta-internal,
> > Cilium, and selftests. BEFORE is without added STACK_ACCESS tracking
> > and STACK_ZERO optimization. AFTER is with all the patches of this
> > patch set applied.
> >
> > It's a few megabytes of memory allocation, which in itself is probably
> > not a big deal. But it's just an amount of unnecessary memory
> > allocations which is basically at least 2x of the total number of
> > states that we can save. And instead have just a few reallocs to size
> > global jump history to an order of magnitudes smaller peak entries.
> >
> > And if we ever decide to track more stuff similar to
> > INSNS_F_STACK_ACCESS, we won't have to worry about more allocations or
> > more memory usage, because the absolute worst case is our global
> > history will be up to 1 million entries tops. We can track some *code
> > path dependent* per-instruction information for *each simulated
> > instruction* easily without having to think twice about this. Which I
> > think is a nice liberating thought in itself justifying this change.
> >
> >
> > Gmail butchered tables. See Github gist ([0]) for it properly formatted.
>
> [0] https://gist.github.com/anakryiko/04c5a3a5ae4ee672bd11d4b7b3d832f5

I think 'peak insn history' is the one to look for,
since it indicates total peak memory consumption. Right?

It seems the numbers point out a bug in number collection
or a bug in implementation.

before:
verifier_loops1.bpf.linked3.o  peak=499999
loop3.bpf.linked3.o            peak=111111

which makes sense, since both tests hit 1m insn.
I can see where 1/2 and 1/9 come from based on asm.
after:
verifier_loops1.bpf.linked3.o  peak=25002
loop3.bpf.linked3.o            peak=333335

So the 1st test got a 20 times smaller memory footprint while
the 2nd was 3 times higher. Both are similar infinite loops.
The 1st one is:

l1_%=:  r0 += 1;        \
        goto l1_%=;     \

My understanding is that there should be all 500k jmps in history
with or without these patches.
So now I'm more worried about the correctness of the 1st patch.
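
To make sure we are reading the shared-history idea the same way, below
is a toy user-space model of it. The names (jmp_entry, push_jmp,
hist_start, etc.) are made up for illustration and this is not the
actual verifier.c code. The point is only that, as long as state
exploration is strict DFS, each state can remember just the history
length at the moment it was created, and backtracking truncates the one
shared array back to that length, so the peak depth is what bounds
memory, not the number of states:

#include <stdio.h>
#include <stdlib.h>

/* one history entry per taken jump; flags leaves room for
 * per-instruction bits like the INSNS_F_STACK_ACCESS idea above
 */
struct jmp_entry {
        int insn_idx;
        int flags;
};

/* single history shared by all states, plus peak-depth bookkeeping */
static struct jmp_entry *hist;
static int hist_len, hist_cap, hist_peak;

static void push_jmp(int insn_idx, int flags)
{
        if (hist_len == hist_cap) {
                int new_cap = hist_cap ? hist_cap * 2 : 64;
                struct jmp_entry *p = realloc(hist, new_cap * sizeof(*p));

                if (!p)
                        exit(1);
                hist = p;
                hist_cap = new_cap;
        }
        hist[hist_len].insn_idx = insn_idx;
        hist[hist_len].flags = flags;
        if (++hist_len > hist_peak)
                hist_peak = hist_len;
}

/* a "state" only records where its slice of the shared history starts */
struct state {
        int hist_start;
};

int main(void)
{
        struct state st = { .hist_start = 0 };
        int i;

        /* follow one path: every taken jump appends one entry */
        for (i = 0; i < 500000; i++)
                push_jmp(i, 0);

        /* backtracking to 'st' is just rolling the shared length back */
        hist_len = st.hist_start;

        printf("peak=%d len_after_backtrack=%d\n", hist_peak, hist_len);
        free(hist);
        return 0;
}

With a model like that, a single looping path should still drive the
peak to roughly 500k entries for verifier_loops1, with or without the
patches, which is why the peak=25002 number above looks suspicious to me.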