Hello,

I am working on Parca, a continuous CPU-time profiler that works by doing
DWARF-based stack unwinding in an eBPF program. On 6.8-series kernels, our
unwinder can no longer be loaded: it exceeds the verifier's limit of 1M
instructions processed.

I was eventually able to get it working, so this is not exactly a support
request, but rather a report of an issue that might affect other people. The
issue is 100% reproducible and bisects to commit
67420501e8681ae18f9f0ea0a69cd2f432100e70. After hacking the kernel a bit to
increase the instructions-processed limit, I found that the program
successfully verifies after ~3M instructions, whereas it previously took
~250k, so more than a 10x regression. In case it matters, I'm on arm64.

I am not an expert on the internal details of the verifier, but I have a
vague theory about what might be causing this behavior. Our unwinder has a
few places where it needs to do a binary search over an array, which look
similar to this:

#define MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH 19
#define MAX_UNWIND_TABLE_SIZE (250 * 1000)

static u64 find_offset_for_pc(stack_unwind_table_t *table, u64 pc, u64 left,
                              u64 right) {
  u64 found = BINARY_SEARCH_DEFAULT;

  for (int i = 0; i < MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH; i++) {
    if (left >= right) {
      LOG("\t.done");
      return found;
    }

    u32 mid = (left + right) / 2;

    // Appease the verifier.
    if (mid < 0 || mid >= MAX_UNWIND_TABLE_SIZE) {
      bump_unwind_error_should_never_happen();
      return BINARY_SEARCH_SHOULD_NEVER_HAPPEN;
    }

    if (table->rows[mid].pc <= pc) {
      found = mid;
      left = mid + 1;
    } else {
      right = mid;
    }
  }
  return BINARY_SEARCH_EXHAUSTED_ITERATIONS;
}

(For more examples, see
https://github.com/parca-dev/parca-agent/blob/main/bpf/unwinders/native.bpf.c .)

Note that, as I said, I'm not an expert on the verifier, so my theory could
be totally wrong. Please assume I put qualifiers like "I think...",
"I suspect...", etc. in the proper places in what follows.

The commit in question made the verifier track the bounds of variables more
precisely, which might allow it to verify more programs. This has a
downside, however: it can cause an explosion in the state space. On previous
kernels, the verifier only needed to explore this snippet once, because it
did not precisely track the bounds of `mid`. On the new version, `mid` ends
up with different tracked bounds in each branch of the main `if` statement,
and the set of distinct bounds doubles on every trip through the loop, so
the number of states the verifier must explore grows exponentially with the
search depth.

Now, if I intentionally confuse the verifier so that it cannot know the
bounds of `mid`, e.g.:

static u32 __attribute__((optnone)) scramble(u32 val) {
  return val ^ 0xFFFFFFFF;
}

and then, in the main binary search loop:

u32 mid = (left + right) / 2;
mid = scramble(scramble(mid));

then the program verifies fine, as it did before the commit in question.

I am not sure what, if anything, should be done about this in the kernel,
as there seems to be a fundamental tradeoff between the precision of the
verifier and its performance in terms of instructions processed.

--
Brennan (he/him)
Staff Software Engineer | Polar Signals Inc.
E: brennan.vincent@xxxxxxxxxxxxxxxx
W: umanwizard.com
TZ: America/New_York
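
P.S. For anyone who wants to poke at this without building all of
parca-agent: below is a minimal, self-contained sketch with the same shape
as our binary search. This is my own construction, not our actual unwinder
(the program name `binsearch_blowup` and the use of bpf_ktime_get_ns() as a
stand-in for an unknown pc are made up for illustration), but I believe it
should show the same verification-cost blowup on kernels containing the
commit above.

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define DEPTH 19
#define TABLE_SIZE (250 * 1000)

SEC("perf_event")
int binsearch_blowup(void *ctx)
{
  /* An unknown scalar, standing in for the pc we search for. */
  __u64 pc = bpf_ktime_get_ns();
  __u64 left = 0, right = TABLE_SIZE;
  __u64 found = 0;

  for (int i = 0; i < DEPTH; i++) {
    if (left >= right)
      break;

    __u32 mid = (left + right) / 2;
    if (mid >= TABLE_SIZE)
      return 0;

    /* Each branch leaves different tracked bounds on left/right/mid,
     * so (I believe) states from different iterations can no longer
     * be pruned against each other. */
    if (mid <= pc) {
      found = mid;
      left = (__u64)mid + 1;
    } else {
      right = mid;
    }
  }
  return found != 0;
}

char LICENSE[] SEC("license") = "GPL";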