Re: [PATCH v5 bpf-next 00/23] BPF register bounds logic and testing improvements

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Wed, 1 Nov 2023 10:13:26 -0700

On Wed, Nov 1, 2023 at 5:37 AM Paul Chaignon <paul.chaignon@xxxxxxxxx> wrote:
>
> On Mon, Oct 30, 2023 at 10:19:01PM -0700, Andrii Nakryiko wrote:
> > On Mon, Oct 30, 2023 at 10:55 AM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Oct 27, 2023 at 11:13:23AM -0700, Andrii Nakryiko wrote:
> > > >
> > > > Note, this is not unique to <range> vs <range> logic. Just recently ([0])
> > > > a related issue was reported for existing verifier logic. This patch set does
> > > > fix that issues as well, as pointed out on the mailing list.
> > > >
> > > >   [0] https://lore.kernel.org/bpf/CAEf4Bzbgf-WQSCz8D4Omh3zFdS4oWS6XELnE7VeoUWgKf3cpig@xxxxxxxxxxxxxx/
> > >
> > > Quick comment regarding shift out of bound issue.
> > > I think this patch set makes Hao Sun's repro not working, but I don't think
> > > the range vs range improvement fixes the underlying issue.
> >
> > Correct, yes, I think adjust_reg_min_max_vals() might still need some fixing.
> >
> > > Currently we do:
> > > if (umax_val >= insn_bitness)
> > >   mark_reg_unknown
> > > else
> > >   here were use src_reg->u32_max_value or src_reg->umax_value
> > > I suspect the insn_bitness check is buggy and it's still possible to hit UBSAN splat with
> > > out of bounds shift. Just need to try harder.
> > > if w8 < 0xffffffff goto +2;
> > > if r8 != r6 goto +1;
> > > w0 >>= w8;
> > > won't be enough anymore.
> >
> > Agreed, but I felt that fixing adjust_reg_min_max_vals() is out of
> > scope for this already large patch set. If someone can take a deeper
> > look into reg bounds for arithmetic operations, it would be great.
> >
> > On the other hand, one of those academic papers claimed to verify
> > soundness of verifier's reg bounds, so I wonder why they missed this?
>
> AFAICS, it should have been able to detect this bug. Equation (3) from
> [1, page 10] encodes the soundness condition for conditional jumps and
> the implementation definitely covers BPF_JEQ/JNE and the logic in
> check_cond_jmp_op. So either there's a bug in the implementation or I'm
> missing something about how it works. Let me cc two of the paper's
> authors :)
>
> Hari, Srinivas: Hao Sun recently discovered a bug in the range analysis
> logic of the verifier, when comparing two unknown scalars with
> non-overlapping ranges. See [2] for Eduard Zingerman's explanation. It
> seems to have existed for a while. Any idea why Agni didn't uncover it?
>
> 1 - https://harishankarv.github.io/assets/files/agni-cav23.pdf
> 2 - https://lore.kernel.org/bpf/8731196c9a847ff35073a2034662d3306cea805f.camel@xxxxxxxxx/
>
> > cc Paul, maybe he can clarify (and also, Paul, please try to run all
> > that formal verification machinery against this patch set, thanks!)
>
> I tried it yesterday but am running into what looks like a bug in the
> LLVM IR to SMT conversion. Probably not something I can fix myself
> quickly so I'll need help from Hari & co.
>
> That said, even without your patchset, I'm running into another issue
> where the formal verification takes several times longer (up to weeks
> /o\) since v6.4.
>

That's unfortunate. If you figure this out, I'd still be interested in
doing an extra check. Meanwhile I'm working on doing more sanity
checks in the kernel (and inevitably having to debug and fix issues,
still working on this).