Re: Rethink how to deal with division/modulo-on-zero (was Re: FW: ebpf-docs: draft of ISA doc updates in progress)

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 21 Sep 2022 06:50:54 -0700

On Wed, Sep 21, 2022 at 1:34 AM Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> wrote:
>
> Subject changed to reflect this reply is out of scope of the original topic
> (ISA doc).
>
> On Tue, Sep 20, 2022 at 04:39:52PM -0700, Alexei Starovoitov wrote:
> > On Tue, Sep 20, 2022 at 12:51 PM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > > -----Original Message-----
> > > > From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > > > Sent: Monday, September 19, 2022 10:04 AM
> > > > To: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx>
> > > > Cc: Dave Thaler <dthaler@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
> > > > Subject: Re: FW: ebpf-docs: draft of ISA doc updates in progress
> > > >
> > > > On Wed, Sep 14, 2022 at 02:22:51PM +0800, Shung-Hsi Yu wrote:
> > > > > As discussed in yesterday's session, there's no graceful abortion on
> > > > > division by zero, instead, the BPF verifier in Linux prevents division
> > > > > by zero from happening. Here a few additional notes:
> > > >
> > > > Hmm, I thought Alexei pointed out a while ago that divide by zero is now
> > > > defined to return 0 following.  Ok, reading further along I think that is what
> > > > you describe with the pseudo-code below.
> > >
> > > Based on the discussion at LPC, and the fact that older implementations,
> > > as well as uBPF and rbpf still terminate the program, I've added this text
> > > to permit both behaviors:
> >
> > That's not right. ubpf and rbpf are broken.
> > We shouldn't be adding descriptions of broken implementations
> > to the standard.
> > There is no way to 'gracefully abort' in eBPF.
> > There is a way to 'return 0' in cBPF, but that's different. See below.
> >
> > > > If eBPF program execution would result in division by zero,
> > > > the destination register SHOULD instead be set to zero, but
> > > > program execution MAY be gracefully aborted instead.
> > > > Similarly, if execution would result in modulo by zero,
> > > > the destination register SHOULD instead be set to the source value,
> > > > but program execution MAY be gracefully aborted instead.
>
> While this makes implementation a lot easier, come to think of it though,
> this behavior actually is more like a hack around having to deal with
> division/modulo-on-zero at runtime. User doing statistic calculations with
> BPF will get bit by this sooner or later, arriving at the wrong calculation
> (fast-math comes to mind).

lol. If that was the case arm64 would be on fire long ago
and users would complain in masses.
Same with risc-v.

> This seems to go against some general BPF principle[1] of preventing the
> users from shooting themselves in the foot.
>
> Just like how BPF verifier prevents a _possible_ out-of-bound memory access,
> e.g. arr[i] when `i` is not bound-checked. Ideally I'd expect a coherent
> approach toward division/modulo-on-zero as well; the verifier should prevent
> program that _might_ do division-on-zero from loading in the first place.
> (Maybe possible to achieve if we introduce something like SCALAR_OR_NULL
> composed type, but that's definitely not easy)
>
> Admittedly even if achievable, this is a radical approach that is not
> backward compatible. If such check is implemented, programs that used to
> load may now be rejected. (Usually new BPF verifier feature allows more BPF
> program to pass the verifier, rather then the other way around)
>
> So, I don't have a good proposal at the moment. The purpose to this email is
> to point what I see as an issue out and hope to start a discussion.
>
> 1: Okay, I'm making this up a bit, strictly speaking the BPF foundation is
> safe program (one of Alexei's talk) rather than preventing users from
> shooting themselves in the foot.

Safe != invalid.
BPF doesn't prevent invalid programs.
BPF ensures safety only, not crashing the kernel, not leaking data, etc.
For example: under speculation arr[i] can go out of bounds
and to prevent data leaks we insert masking operations.
Similar with div_by_0. If the verifier can detect that it will reject
the prog, otherwise it will insert if(==0) xor, because not
all architectures behave like arm64.