Re: Rethink how to deal with division/modulo-on-zero (was Re: FW: ebpf-docs: draft of ISA doc updates in progress)

Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> · Thu, 22 Sep 2022 13:00:14 +0800

On Wed, Sep 21, 2022 at 06:50:54AM -0700, Alexei Starovoitov wrote:
> On Wed, Sep 21, 2022 at 1:34 AM Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> wrote:
> >
> > Subject changed to reflect this reply is out of scope of the original topic
> > (ISA doc).
> >
> > On Tue, Sep 20, 2022 at 04:39:52PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Sep 20, 2022 at 12:51 PM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > > > -----Original Message-----
> > > > > From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > > > > Sent: Monday, September 19, 2022 10:04 AM
> > > > > To: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx>
> > > > > Cc: Dave Thaler <dthaler@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
> > > > > Subject: Re: FW: ebpf-docs: draft of ISA doc updates in progress
> > > > >
> > > > > On Wed, Sep 14, 2022 at 02:22:51PM +0800, Shung-Hsi Yu wrote:
> > > > > > As discussed in yesterday's session, there's no graceful abortion on
> > > > > > division by zero, instead, the BPF verifier in Linux prevents division
> > > > > > by zero from happening. Here a few additional notes:
> > > > >
> > > > > Hmm, I thought Alexei pointed out a while ago that divide by zero is now
> > > > > defined to return 0 following.  Ok, reading further along I think that is what
> > > > > you describe with the pseudo-code below.
> > > >
> > > > Based on the discussion at LPC, and the fact that older implementations,
> > > > as well as uBPF and rbpf still terminate the program, I've added this text
> > > > to permit both behaviors:
> > >
> > > That's not right. ubpf and rbpf are broken.
> > > We shouldn't be adding descriptions of broken implementations
> > > to the standard.
> > > There is no way to 'gracefully abort' in eBPF.
> > > There is a way to 'return 0' in cBPF, but that's different. See below.
> > >
> > > > > If eBPF program execution would result in division by zero,
> > > > > the destination register SHOULD instead be set to zero, but
> > > > > program execution MAY be gracefully aborted instead.
> > > > > Similarly, if execution would result in modulo by zero,
> > > > > the destination register SHOULD instead be set to the source value,
> > > > > but program execution MAY be gracefully aborted instead.
> >
> > While this makes implementation a lot easier, come to think of it though,
> > this behavior actually is more like a hack around having to deal with
> > division/modulo-on-zero at runtime. User doing statistic calculations with
> > BPF will get bit by this sooner or later, arriving at the wrong calculation
> > (fast-math comes to mind).
> 
> lol. If that was the case arm64 would be on fire long ago
> and users would complain in masses.
> Same with risc-v.

whoa, I had no idea.

And looking around I don't see complains. Taking what I said back and +1 for
using the current division/modulo-by-zero behavior as standard.

> > This seems to go against some general BPF principle[1] of preventing the
> > users from shooting themselves in the foot.
> >
> > Just like how BPF verifier prevents a _possible_ out-of-bound memory access,
> > e.g. arr[i] when `i` is not bound-checked. Ideally I'd expect a coherent
> > approach toward division/modulo-on-zero as well; the verifier should prevent
> > program that _might_ do division-on-zero from loading in the first place.
> > (Maybe possible to achieve if we introduce something like SCALAR_OR_NULL
> > composed type, but that's definitely not easy)
> >
> > Admittedly even if achievable, this is a radical approach that is not
> > backward compatible. If such check is implemented, programs that used to
> > load may now be rejected. (Usually new BPF verifier feature allows more BPF
> > program to pass the verifier, rather then the other way around)
> >
> > So, I don't have a good proposal at the moment. The purpose to this email is
> > to point what I see as an issue out and hope to start a discussion.
> >
> > 1: Okay, I'm making this up a bit, strictly speaking the BPF foundation is
> > safe program (one of Alexei's talk) rather than preventing users from
> > shooting themselves in the foot.
> 
> Safe != invalid.
> BPF doesn't prevent invalid programs.
> BPF ensures safety only, not crashing the kernel, not leaking data, etc.
> For example: under speculation arr[i] can go out of bounds
> and to prevent data leaks we insert masking operations.

Point taken, thanks for clarifying the difference between safe and invalid.

> Similar with div_by_0. If the verifier can detect that it will reject
> the prog, otherwise it will insert if(==0) xor, because not
> all architectures behave like arm64.

Speaking of which, should the "if(==0) xor" patching in do_misc_fixups() be
moved into JIT implementations and the interpretor? 

Given that the standard now mandates BPF_DIV with divisor of zero to return
zero, it would be rather confusing to see the output of `bpftool prog dump
xlated` contains the extra "if(==0) xor" instruction that's inserted by the
verifier.

Also, maybe there'll be performance benefit for arm64 and riscv where
"if(==0) xor" is not needed.