Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes: > On Wed, Apr 3, 2024 at 11:59 AM Brennan Vincent > <brennan.vincent@xxxxxxxxxxxxxxxx> wrote: >> >> Hello, >> >> I am working on Parca, a continuous CPU time profiler which works by >> doing DWARF-based stack unwinding in an eBPF program. In 6.8 series >> kernels, our unwinder can no longer be loaded, as it exceeds the >> verifier's limit of 1M instructions processed. >> >> I was eventually able to get it working, so this is not exactly a >> support request, but rather a report of an issue that might affect other >> people. >> >> The issue is 100% reproducible and bisects to >> 67420501e8681ae18f9f0ea0a69cd2f432100e70. Hacking the kernel a bit to >> increase the instructions processed limit, I found that it successfully >> verifies after ~3M instructions, whereas previously it took ~250k, so >> more than a 10x regression. >> >> In case it matters, I'm on arm64. >> >> I am not an expert on the internal details of the verifier, but I have a >> vague theory on what can be causing this behavior. Our unwinder has a >> few places where it needs to do binary search on an array, which look >> similar to this: >> >> #define MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH 19 >> #define MAX_UNWIND_TABLE_SIZE 250 * 1000 >> >> static u64 find_offset_for_pc(stack_unwind_table_t *table, u64 pc, u64 left, u64 right) { >> u64 found = BINARY_SEARCH_DEFAULT; >> >> for (int i = 0; i < MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH; i++) { >> if (left >= right) { >> LOG("\t.done"); >> return found; >> } >> >> u32 mid = (left + right) / 2; >> >> // Appease the verifier. >> if (mid < 0 || mid >= MAX_UNWIND_TABLE_SIZE) { >> bump_unwind_error_should_never_happen(); >> return BINARY_SEARCH_SHOULD_NEVER_HAPPEN; >> } >> >> if (table->rows[mid].pc <= pc) { >> found = mid; >> left = mid + 1; >> } else { >> right = mid; >> } >> } >> return BINARY_SEARCH_EXHAUSTED_ITERATIONS; >> } >> >> (For more examples see >> https://github.com/parca-dev/parca-agent/blob/main/bpf/unwinders/native.bpf.c >> .) >> >> Note that as I said, I'm not an expert on the verifier, so my theory >> could be totally wrong. Please assume I put qualifiers like "I >> think...", "I suspect...", etc. in the proper places in what >> follows. >> >> The commit in question made the verifier track the bounds of variables >> more precisely, which might allow it to verify more programs. This, >> however, has a downside which is that it causes an explosion in the >> state space. In previous versions of the kernel, we could just verify >> this snippet once because the verifier did not precisely track the >> bounds of `mid`. In the new version, we have different bounds for `mid` >> in the different branches of the main `if` statement, and the set of >> different bounds multiplies in size by 2 each time through the loop, >> causing this huge explosion. >> >> Now, if I intentionally confuse the verifier and prevent it from knowing >> the bounds of `mid`, e.g.: >> >> static u32 __attribute__((optnone)) scramble(u32 val) { >> return val ^ 0xFFFFFFFF; >> } >> >> then, in the main binary search loop: >> >> u32 mid = (left + right) / 2; >> mid = scramble(scramble(mid)); >> >> then the program verifies fine, as it did before the commit in question. > > Yep, I think your analysis is correct, and your obfuscation workaround > is "sane" :) > > Having said that, I think it would benefit your BPF programs from > thinking a bit more carefully about scaling your BPF verification > process by using various features and techniques that keep > verification complexity more bounded. > > Specifically, I think you can benefit from making find_offset_for_pc a > global function, in which case it will be validated just once, > separately, and then not re-validated. > > Also, if you can rely on that, using bpf_for() loops (based on > open-coded iterators) or bpf_loop() helper would also help minimize > verification complexity of your loops. Thanks. I'll look into all those suggestions. >> >> I am not sure what, if anything, should be done about this in the kernel, >> as it seems there's a fundamental tradeoff between precision of the >> verifier and performance in terms of instructions processed. >> >> -- >> Brennan (he/him) >> Staff Software Engineer | Polar Signals Inc. >> E: brennan.vincent@xxxxxxxxxxxxxxxx >> W: umanwizard.com >> TZ: America/New_York -- Brennan (he/him) Staff Software Engineer | Polar Signals Inc. E: brennan.vincent@xxxxxxxxxxxxxxxx W: www.umanwizard.com TZ: America/New_York