Re: [PATCH v5 bpf-next 00/23] BPF register bounds logic and testing improvements

Harishankar Vishwanathan <harishankar.vishwanathan@xxxxxxxxxxx> · Tue, 7 Nov 2023 06:37:46 +0000

On Wed, Nov 1, 2023 1:13 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
> On Wed, Nov 1, 2023 at 5:37 AM Paul Chaignon <paul.chaignon@xxxxxxxxx> wrote:
> >
> > On Mon, Oct 30, 2023 at 10:19:01PM -0700, Andrii Nakryiko wrote:
> > > On Mon, Oct 30, 2023 at 10:55 AM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Fri, Oct 27, 2023 at 11:13:23AM -0700, Andrii Nakryiko wrote:
> > > > >
> > > > > Note, this is not unique to <range> vs <range> logic. Just recently ([0])
> > > > > a related issue was reported for existing verifier logic. This patch set does
> > > > > fix that issues as well, as pointed out on the mailing list.
> > > > >
> > > > >   [0] https://lore.kernel.org/bpf/CAEf4Bzbgf-WQSCz8D4Omh3zFdS4oWS6XELnE7VeoUWgKf3cpig@xxxxxxxxxxxxxx/
> > > >
> > > > Quick comment regarding shift out of bound issue.
> > > > I think this patch set makes Hao Sun's repro not working, but I don't think
> > > > the range vs range improvement fixes the underlying issue.
> > >
> > > Correct, yes, I think adjust_reg_min_max_vals() might still need some fixing.
> > >
> > > > Currently we do:
> > > > if (umax_val >= insn_bitness)
> > > >   mark_reg_unknown
> > > > else
> > > >   here were use src_reg->u32_max_value or src_reg->umax_value
> > > > I suspect the insn_bitness check is buggy and it's still possible to hit UBSAN splat with
> > > > out of bounds shift. Just need to try harder.
> > > > if w8 < 0xffffffff goto +2;
> > > > if r8 != r6 goto +1;
> > > > w0 >>= w8;
> > > > won't be enough anymore.
> > >
> > > Agreed, but I felt that fixing adjust_reg_min_max_vals() is out of
> > > scope for this already large patch set. If someone can take a deeper
> > > look into reg bounds for arithmetic operations, it would be great.
> > >
> > > On the other hand, one of those academic papers claimed to verify
> > > soundness of verifier's reg bounds, so I wonder why they missed this?
> >
> > AFAICS, it should have been able to detect this bug. Equation (3) from
> > [1, page 10] encodes the soundness condition for conditional jumps and
> > the implementation definitely covers BPF_JEQ/JNE and the logic in
> > check_cond_jmp_op. So either there's a bug in the implementation or I'm
> > missing something about how it works. Let me cc two of the paper's
> > authors :)
> >
> > Hari, Srinivas: Hao Sun recently discovered a bug in the range analysis
> > logic of the verifier, when comparing two unknown scalars with
> > non-overlapping ranges. See [2] for Eduard Zingerman's explanation. It
> > seems to have existed for a while. Any idea why Agni didn't uncover it?
> >
> > 1 - https://harishankarv.github.io/assets/files/agni-cav23.pdf
> > 2 - https://lore.kernel.org/bpf/8731196c9a847ff35073a2034662d3306cea805f.camel@xxxxxxxxx/
> >
> > > cc Paul, maybe he can clarify (and also, Paul, please try to run all
> > > that formal verification machinery against this patch set, thanks!)
> >

Thanks Paul for bringing this to our notice, and for the valuable clarifications
you provided. The bug discovered by Hao Sun occurs only during verificaiton,
when the verifier follows what is essentially dead code. An execution of the
example eBPF program cannot manifest a mismatch between the verifier's beliefs
about the values in registers and the actual values during execution. As such,
the example eBPF program cannot be used to achieve an actual verifier bypass.

As pointed out by Eduard Zingerman in the mailing list thread, the issue arises
when the verifier follows the false (similarly true) branch of a
jump-if-not-equal (similarly jump-if-equal) instruction, when it is never
possible that the jump condition is false (similarly true). While it is okay for
the verifier to follow dead code generally, it so happens that the logic it uses
to update the registers ranges does not work in this specific case, and ends up
violating one of the invariants it is supposed to maintain (a <= b for a range
[a, b]).

Agni's verification condition [1] is stricter. It follows the false (similarly
true) branch of a jump-if-not-equal (similarly jump-if-equal) instruction *only*
when it is possible that the registers are equal (similarly not equal). In
essence, Agni discards the reported verifier bug as a false positive.

We can easily weaken Agni's verification condition to detect such bugs. We
modified Agni's verification condition [2] to follow both the branches of a
jump-if-not-equal instruction, regardless of whether it is possible that the
registers can be equal. Indeed, the modified verification condition produced the
umin > umax verifier bug from Hao's example. The example produced by Agni, and
an extended discussion can be found at Agni's issue tracker [3].

[1] https://user-images.githubusercontent.com/8588645/280917882-dc97090d-040a-43b0-9bf8-806081992716.png
[2] https://user-images.githubusercontent.com/8588645/280925756-19336087-836f-45e5-87fb-c2453558df06.png
[3] https://github.com/bpfverif/ebpf-range-analysis-verification-cav23/issues/15#issuecomment-1797858245

> > I tried it yesterday but am running into what looks like a bug in the
> > LLVM IR to SMT conversion. Probably not something I can fix myself
> > quickly so I'll need help from Hari & co.
> >
> > That said, even without your patchset, I'm running into another issue
> > where the formal verification takes several times longer (up to weeks
> > /o\) since v6.4.
> >

I'm looking into this next, thanks for the heads up!

> That's unfortunate. If you figure this out, I'd still be interested in
> doing an extra check. Meanwhile I'm working on doing more sanity
> checks in the kernel (and inevitably having to debug and fix issues,
> still working on this).