Re: [PATCH v5 bpf-next 00/23] BPF register bounds logic and testing improvements

Paul Chaignon <paul.chaignon@xxxxxxxxx> · Tue, 7 Nov 2023 17:38:01 +0100

On Tue, Nov 07, 2023 at 06:37:46AM +0000, Harishankar Vishwanathan wrote:
> On Wed, Nov 1, 2023 1:13 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> > On Wed, Nov 1, 2023 at 5:37 AM Paul Chaignon <paul.chaignon@xxxxxxxxx> wrote:
> > >
> > > On Mon, Oct 30, 2023 at 10:19:01PM -0700, Andrii Nakryiko wrote:
> > > > On Mon, Oct 30, 2023 at 10:55 AM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Oct 27, 2023 at 11:13:23AM -0700, Andrii Nakryiko wrote:
> > > > > >
> > > > > > Note, this is not unique to <range> vs <range> logic. Just recently ([0])
> > > > > > a related issue was reported for existing verifier logic. This patch set does
> > > > > > fix that issues as well, as pointed out on the mailing list.
> > > > > >
> > > > > >   [0] https://lore.kernel.org/bpf/CAEf4Bzbgf-WQSCz8D4Omh3zFdS4oWS6XELnE7VeoUWgKf3cpig@xxxxxxxxxxxxxx/
> > > > >
> > > > > Quick comment regarding shift out of bound issue.
> > > > > I think this patch set makes Hao Sun's repro not working, but I don't think
> > > > > the range vs range improvement fixes the underlying issue.
> > > >
> > > > Correct, yes, I think adjust_reg_min_max_vals() might still need some fixing.
> > > >
> > > > > Currently we do:
> > > > > if (umax_val >= insn_bitness)
> > > > >   mark_reg_unknown
> > > > > else
> > > > >   here were use src_reg->u32_max_value or src_reg->umax_value
> > > > > I suspect the insn_bitness check is buggy and it's still possible to hit UBSAN splat with
> > > > > out of bounds shift. Just need to try harder.
> > > > > if w8 < 0xffffffff goto +2;
> > > > > if r8 != r6 goto +1;
> > > > > w0 >>= w8;
> > > > > won't be enough anymore.
> > > >
> > > > Agreed, but I felt that fixing adjust_reg_min_max_vals() is out of
> > > > scope for this already large patch set. If someone can take a deeper
> > > > look into reg bounds for arithmetic operations, it would be great.
> > > >
> > > > On the other hand, one of those academic papers claimed to verify
> > > > soundness of verifier's reg bounds, so I wonder why they missed this?
> > >
> > > AFAICS, it should have been able to detect this bug. Equation (3) from
> > > [1, page 10] encodes the soundness condition for conditional jumps and
> > > the implementation definitely covers BPF_JEQ/JNE and the logic in
> > > check_cond_jmp_op. So either there's a bug in the implementation or I'm
> > > missing something about how it works. Let me cc two of the paper's
> > > authors :)
> > >
> > > Hari, Srinivas: Hao Sun recently discovered a bug in the range analysis
> > > logic of the verifier, when comparing two unknown scalars with
> > > non-overlapping ranges. See [2] for Eduard Zingerman's explanation. It
> > > seems to have existed for a while. Any idea why Agni didn't uncover it?
> > >
> > > 1 - https://harishankarv.github.io/assets/files/agni-cav23.pdf
> > > 2 - https://lore.kernel.org/bpf/8731196c9a847ff35073a2034662d3306cea805f.camel@xxxxxxxxx/
> > >
> > > > cc Paul, maybe he can clarify (and also, Paul, please try to run all
> > > > that formal verification machinery against this patch set, thanks!)
> > >
> 
> Thanks Paul for bringing this to our notice, and for the valuable clarifications
> you provided. The bug discovered by Hao Sun occurs only during verificaiton,
> when the verifier follows what is essentially dead code. An execution of the
> example eBPF program cannot manifest a mismatch between the verifier's beliefs
> about the values in registers and the actual values during execution. As such,
> the example eBPF program cannot be used to achieve an actual verifier bypass.

There's one caveat here I wanted to double check: speculative execution.
I discussed this a bit with Daniel and it seems likely that the
false/"impossible" path could happen at runtime under speculative
execution. Daniel provided [1, slide 69] as an example of a similar
case. In the verifier, such cases are covered by
sanitize_speculative_path [2].

So this speculative execution case would be a good motivation to cover
both paths in Agni. At the same time, it may not be enough to claim
that Agni also verifies all the verifier's protections against
speculative execution. And I'm also a bit worried about the runtime
cost of weakening Agni's verification condition to detect such bugs.

1 - https://popl22.sigplan.org/details/prisc-2022-papers/11/BPF-and-Spectre-Mitigating-transient-execution-attacks
2 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9183671af6db

> 
> As pointed out by Eduard Zingerman in the mailing list thread, the issue arises
> when the verifier follows the false (similarly true) branch of a
> jump-if-not-equal (similarly jump-if-equal) instruction, when it is never
> possible that the jump condition is false (similarly true). While it is okay for
> the verifier to follow dead code generally, it so happens that the logic it uses
> to update the registers ranges does not work in this specific case, and ends up
> violating one of the invariants it is supposed to maintain (a <= b for a range
> [a, b]).
> 
> Agni's verification condition [1] is stricter. It follows the false (similarly
> true) branch of a jump-if-not-equal (similarly jump-if-equal) instruction *only*
> when it is possible that the registers are equal (similarly not equal). In
> essence, Agni discards the reported verifier bug as a false positive.
> 
> We can easily weaken Agni's verification condition to detect such bugs. We
> modified Agni's verification condition [2] to follow both the branches of a
> jump-if-not-equal instruction, regardless of whether it is possible that the
> registers can be equal. Indeed, the modified verification condition produced the
> umin > umax verifier bug from Hao's example. The example produced by Agni, and
> an extended discussion can be found at Agni's issue tracker [3].

Nice! Thanks for trying this out so quickly!
Did you notice any difference on the time it took to verify the
conditional jumps?

> 
> [1] https://user-images.githubusercontent.com/8588645/280917882-dc97090d-040a-43b0-9bf8-806081992716.png
> [2] https://user-images.githubusercontent.com/8588645/280925756-19336087-836f-45e5-87fb-c2453558df06.png
> [3] https://github.com/bpfverif/ebpf-range-analysis-verification-cav23/issues/15#issuecomment-1797858245
> 
> > > I tried it yesterday but am running into what looks like a bug in the
> > > LLVM IR to SMT conversion. Probably not something I can fix myself
> > > quickly so I'll need help from Hari & co.
> > >
> > > That said, even without your patchset, I'm running into another issue
> > > where the formal verification takes several times longer (up to weeks
> > > /o\) since v6.4.
> > >
> 
> I'm looking into this next, thanks for the heads up!
> 
> > That's unfortunate. If you figure this out, I'd still be interested in
> > doing an extra check. Meanwhile I'm working on doing more sanity
> > checks in the kernel (and inevitably having to debug and fix issues,
> > still working on this).