Hello,

I am working on Parca, a continuous CPU-time profiler that works by doing
DWARF-based stack unwinding in an eBPF program. On 6.8-series kernels, our
unwinder can no longer be loaded: it exceeds the verifier's limit of 1M
instructions processed.

I was eventually able to get it working, so this is not exactly a support
request, but rather a report of an issue that might affect other people. The
issue is 100% reproducible and bisects to commit
67420501e8681ae18f9f0ea0a69cd2f432100e70. After hacking the kernel a bit to
increase the instructions-processed limit, I found that the program
successfully verifies after ~3M instructions, whereas it previously took
~250k, so more than a 10x regression. In case it matters, I'm on arm64.

I am not an expert on the internal details of the verifier, but I have a
vague theory about what might be causing this behavior. Our unwinder has a
few places where it needs to do a binary search over an array, which look
similar to this:

#define MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH 19
#define MAX_UNWIND_TABLE_SIZE (250 * 1000)

static u64 find_offset_for_pc(stack_unwind_table_t *table, u64 pc, u64 left,
                              u64 right) {
  u64 found = BINARY_SEARCH_DEFAULT;

  for (int i = 0; i < MAX_UNWIND_INFO_BINARY_SEARCH_DEPTH; i++) {
    if (left >= right) {
      LOG("\t.done");
      return found;
    }

    u32 mid = (left + right) / 2;

    // Appease the verifier.
    if (mid < 0 || mid >= MAX_UNWIND_TABLE_SIZE) {
      bump_unwind_error_should_never_happen();
      return BINARY_SEARCH_SHOULD_NEVER_HAPPEN;
    }

    if (table->rows[mid].pc <= pc) {
      found = mid;
      left = mid + 1;
    } else {
      right = mid;
    }
  }
  return BINARY_SEARCH_EXHAUSTED_ITERATIONS;
}

(For more examples, see
https://github.com/parca-dev/parca-agent/blob/main/bpf/unwinders/native.bpf.c .)

Note that, as I said, I'm not an expert on the verifier, so my theory could
be totally wrong. Please assume I put qualifiers like "I think...",
"I suspect...", etc. in the proper places in what follows.

The commit in question made the verifier track the bounds of variables more
precisely, which might allow it to verify more programs. This has a
downside, however: it can cause an explosion in the state space. On previous
kernels, the verifier only needed to explore this snippet once, because it
did not precisely track the bounds of `mid`. On the new version, `mid` ends
up with different tracked bounds in each branch of the main `if` statement,
and the set of distinct bounds doubles on every trip through the loop, so
the number of states the verifier must explore grows exponentially with the
search depth.

Now, if I intentionally confuse the verifier so that it cannot know the
bounds of `mid`, e.g.:

static u32 __attribute__((optnone)) scramble(u32 val) {
  return val ^ 0xFFFFFFFF;
}

and then, in the main binary search loop:

u32 mid = (left + right) / 2;
mid = scramble(scramble(mid));

then the program verifies fine, as it did before the commit in question.

I am not sure what, if anything, should be done about this in the kernel,
as there seems to be a fundamental tradeoff between the precision of the
verifier and its performance in terms of instructions processed.

--
Brennan (he/him)
Staff Software Engineer | Polar Signals Inc.
E: brennan.vincent@xxxxxxxxxxxxxxxx
W: umanwizard.com
TZ: America/New_York
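
P.S. For anyone who wants to poke at this without building all of
parca-agent: below is a minimal, self-contained sketch with the same shape
as our binary search. This is my own construction, not our actual unwinder
(the program name `binsearch_blowup` and the use of bpf_ktime_get_ns() as a
stand-in for an unknown pc are made up for illustration), but I believe it
should show the same verification-cost blowup on kernels containing the
commit above.

#include <linux/types.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define DEPTH 19
#define TABLE_SIZE (250 * 1000)

SEC("perf_event")
int binsearch_blowup(void *ctx)
{
  /* An unknown scalar, standing in for the pc we search for. */
  __u64 pc = bpf_ktime_get_ns();
  __u64 left = 0, right = TABLE_SIZE;
  __u64 found = 0;

  for (int i = 0; i < DEPTH; i++) {
    if (left >= right)
      break;

    __u32 mid = (left + right) / 2;
    if (mid >= TABLE_SIZE)
      return 0;

    /* Each branch leaves different tracked bounds on left/right/mid,
     * so (I believe) states from different iterations can no longer
     * be pruned against each other. */
    if (mid <= pc) {
      found = mid;
      left = (__u64)mid + 1;
    } else {
      right = mid;
    }
  }
  return found != 0;
}

char LICENSE[] SEC("license") = "GPL";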