Rethink how to deal with division/modulo-on-zero (was Re: FW: ebpf-docs: draft of ISA doc updates in progress)

Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx> · Wed, 21 Sep 2022 16:34:44 +0800

Subject changed to reflect this reply is out of scope of the original topic
(ISA doc).

On Tue, Sep 20, 2022 at 04:39:52PM -0700, Alexei Starovoitov wrote:
> On Tue, Sep 20, 2022 at 12:51 PM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > -----Original Message-----
> > > From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > > Sent: Monday, September 19, 2022 10:04 AM
> > > To: Shung-Hsi Yu <shung-hsi.yu@xxxxxxxx>
> > > Cc: Dave Thaler <dthaler@xxxxxxxxxxxxx>; bpf <bpf@xxxxxxxxxxxxxxx>
> > > Subject: Re: FW: ebpf-docs: draft of ISA doc updates in progress
> > >
> > > On Wed, Sep 14, 2022 at 02:22:51PM +0800, Shung-Hsi Yu wrote:
> > > > As discussed in yesterday's session, there's no graceful abortion on
> > > > division by zero, instead, the BPF verifier in Linux prevents division
> > > > by zero from happening. Here a few additional notes:
> > >
> > > Hmm, I thought Alexei pointed out a while ago that divide by zero is now
> > > defined to return 0 following.  Ok, reading further along I think that is what
> > > you describe with the pseudo-code below.
> >
> > Based on the discussion at LPC, and the fact that older implementations,
> > as well as uBPF and rbpf still terminate the program, I've added this text
> > to permit both behaviors:
> 
> That's not right. ubpf and rbpf are broken.
> We shouldn't be adding descriptions of broken implementations
> to the standard.
> There is no way to 'gracefully abort' in eBPF.
> There is a way to 'return 0' in cBPF, but that's different. See below.
> 
> > > If eBPF program execution would result in division by zero,
> > > the destination register SHOULD instead be set to zero, but
> > > program execution MAY be gracefully aborted instead.
> > > Similarly, if execution would result in modulo by zero,
> > > the destination register SHOULD instead be set to the source value,
> > > but program execution MAY be gracefully aborted instead.

While this makes implementation a lot easier, come to think of it though,
this behavior actually is more like a hack around having to deal with
division/modulo-on-zero at runtime. User doing statistic calculations with
BPF will get bit by this sooner or later, arriving at the wrong calculation
(fast-math comes to mind).

This seems to go against some general BPF principle[1] of preventing the
users from shooting themselves in the foot.

Just like how BPF verifier prevents a _possible_ out-of-bound memory access,
e.g. arr[i] when `i` is not bound-checked. Ideally I'd expect a coherent
approach toward division/modulo-on-zero as well; the verifier should prevent
program that _might_ do division-on-zero from loading in the first place.
(Maybe possible to achieve if we introduce something like SCALAR_OR_NULL
composed type, but that's definitely not easy)

Admittedly even if achievable, this is a radical approach that is not
backward compatible. If such check is implemented, programs that used to
load may now be rejected. (Usually new BPF verifier feature allows more BPF
program to pass the verifier, rather then the other way around)

So, I don't have a good proposal at the moment. The purpose to this email is
to point what I see as an issue out and hope to start a discussion.

1: Okay, I'm making this up a bit, strictly speaking the BPF foundation is
safe program (one of Alexei's talk) rather than preventing users from
shooting themselves in the foot.

[...]