Re: Rethink how to deal with division/modulo-on-zero (was Re: FW: ebpf-docs: draft of ISA doc updates in progress)

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 23 Sep 2022 16:15:14 -0700

On Wed, Sep 21, 2022 at 10:00 PM Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> wrote:
>
> On Wed, Sep 21, 2022 at 06:50:54AM -0700, Alexei Starovoitov wrote:
> > On Wed, Sep 21, 2022 at 1:34 AM Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> wrote:
> > >
> > > Subject changed to reflect this reply is out of scope of the original topic
> > > (ISA doc).
> > >
> > > On Tue, Sep 20, 2022 at 04:39:52PM -0700, Alexei Starovoitov wrote:
> > > > On Tue, Sep 20, 2022 at 12:51 PM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > > > > -----Original Message-----
> > > > > > From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > > > > > Sent: Monday, September 19, 2022 10:04 AM
> > > > > > To: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx>
> > > > > > Cc: Dave Thaler <dthaler@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
> > > > > > Subject: Re: FW: ebpf-docs: draft of ISA doc updates in progress
> > > > > >
> > > > > > On Wed, Sep 14, 2022 at 02:22:51PM +0800, Shung-Hsi Yu wrote:
> > > > > > > As discussed in yesterday's session, there's no graceful abortion on
> > > > > > > division by zero, instead, the BPF verifier in Linux prevents division
> > > > > > > by zero from happening. Here a few additional notes:
> > > > > >
> > > > > > Hmm, I thought Alexei pointed out a while ago that divide by zero is now
> > > > > > defined to return 0 following.  Ok, reading further along I think that is what
> > > > > > you describe with the pseudo-code below.
> > > > >
> > > > > Based on the discussion at LPC, and the fact that older implementations,
> > > > > as well as uBPF and rbpf still terminate the program, I've added this text
> > > > > to permit both behaviors:
> > > >
> > > > That's not right. ubpf and rbpf are broken.
> > > > We shouldn't be adding descriptions of broken implementations
> > > > to the standard.
> > > > There is no way to 'gracefully abort' in eBPF.
> > > > There is a way to 'return 0' in cBPF, but that's different. See below.
> > > >
> > > > > > If eBPF program execution would result in division by zero,
> > > > > > the destination register SHOULD instead be set to zero, but
> > > > > > program execution MAY be gracefully aborted instead.
> > > > > > Similarly, if execution would result in modulo by zero,
> > > > > > the destination register SHOULD instead be set to the source value,
> > > > > > but program execution MAY be gracefully aborted instead.
> > >
> > > While this makes implementation a lot easier, come to think of it though,
> > > this behavior actually is more like a hack around having to deal with
> > > division/modulo-on-zero at runtime. User doing statistic calculations with
> > > BPF will get bit by this sooner or later, arriving at the wrong calculation
> > > (fast-math comes to mind).
> >
> > lol. If that was the case arm64 would be on fire long ago
> > and users would complain in masses.
> > Same with risc-v.
>
> whoa, I had no idea.
>
> And looking around I don't see complains. Taking what I said back and +1 for
> using the current division/modulo-by-zero behavior as standard.
>
> > > This seems to go against some general BPF principle[1] of preventing the
> > > users from shooting themselves in the foot.
> > >
> > > Just like how BPF verifier prevents a _possible_ out-of-bound memory access,
> > > e.g. arr[i] when `i` is not bound-checked. Ideally I'd expect a coherent
> > > approach toward division/modulo-on-zero as well; the verifier should prevent
> > > program that _might_ do division-on-zero from loading in the first place.
> > > (Maybe possible to achieve if we introduce something like SCALAR_OR_NULL
> > > composed type, but that's definitely not easy)
> > >
> > > Admittedly even if achievable, this is a radical approach that is not
> > > backward compatible. If such check is implemented, programs that used to
> > > load may now be rejected. (Usually new BPF verifier feature allows more BPF
> > > program to pass the verifier, rather then the other way around)
> > >
> > > So, I don't have a good proposal at the moment. The purpose to this email is
> > > to point what I see as an issue out and hope to start a discussion.
> > >
> > > 1: Okay, I'm making this up a bit, strictly speaking the BPF foundation is
> > > safe program (one of Alexei's talk) rather than preventing users from
> > > shooting themselves in the foot.
> >
> > Safe != invalid.
> > BPF doesn't prevent invalid programs.
> > BPF ensures safety only, not crashing the kernel, not leaking data, etc.
> > For example: under speculation arr[i] can go out of bounds
> > and to prevent data leaks we insert masking operations.
>
> Point taken, thanks for clarifying the difference between safe and invalid.
>
> > Similar with div_by_0. If the verifier can detect that it will reject
> > the prog, otherwise it will insert if(==0) xor, because not
> > all architectures behave like arm64.
>
> Speaking of which, should the "if(==0) xor" patching in do_misc_fixups() be
> moved into JIT implementations and the interpretor?
>
> Given that the standard now mandates BPF_DIV with divisor of zero to return
> zero, it would be rather confusing to see the output of `bpftool prog dump
> xlated` contains the extra "if(==0) xor" instruction that's inserted by the
> verifier.
>
> Also, maybe there'll be performance benefit for arm64 and riscv where
> "if(==0) xor" is not needed.

Tiny perf gain is not worth extra complexity in JITs.
Integer divide is the slowest operation in a CPU.
We're trying to do everything in the verifier and as little
as possible in JITs.