Re: [Bpf] Review of draft-thaler-bpf-isa-01

Will Hawkins <hawkinsw@xxxxxx> · Fri, 28 Jul 2023 20:46:19 -0400

On Fri, Jul 28, 2023 at 8:35 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Jul 28, 2023 at 5:19 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
> >
> > On Fri, Jul 28, 2023 at 8:05 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Jul 28, 2023 at 4:32 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
> > > >
> > > > On Thu, Jul 27, 2023 at 9:05 PM Alexei Starovoitov
> > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Jul 26, 2023 at 12:16 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jul 25, 2023 at 2:37 PM Watson Ladd <watsonbladd@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 25, 2023 at 9:15 AM Alexei Starovoitov
> > > > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 25, 2023 at 7:03 AM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > I am forwarding the email below (after converting HTML to plain text)
> > > > > > > > > to the mailto:bpf@xxxxxxxxxxxxxxx list so replies can go to both lists.
> > > > > > > > >
> > > > > > > > > Please use this one for any replies.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Dave
> > > > > > > > >
> > > > > > > > > > From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Watson Ladd
> > > > > > > > > > Sent: Monday, July 24, 2023 10:05 PM
> > > > > > > > > > To: bpf@xxxxxxxx
> > > > > > > > > > Subject: [Bpf] Review of draft-thaler-bpf-isa-01
> > > > > > > > > >
> > > > > > > > > > Dear BPF wg,
> > > > > > > > > >
> > > > > > > > > > I took a look at the draft and think it has some issues, unsurprisingly at this stage. One is
> > > > > > > > > > the specification seems to use an underspecified C pseudo code for operations vs
> > > > > > > > > > defining them mathematically.
> > > > > > > >
> > > > > > > > Hi Watson,
> > > > > > > >
> > > > > > > > This is not "underspecified C" pseudo code.
> > > > > > > > This is assembly syntax parsed and emitted by GCC, LLVM, gas, Linux Kernel, etc.
> > > > > > >
> > > > > > > I don't see a reference to any description of that in section 4.1.
> > > > > > > It's possible I've overlooked this, and if people think this style of
> > > > > > > definition is good enough that works for me. But I found table 4
> > > > > > > pretty scanty on what exactly happens.
> > > > > >
> > > > > > Hello! Based on Watson's post, I have done some research and would
> > > > > > potentially like to offer a path forward. There are several different
> > > > > > ways that ISAs specify the semantics of their operations:
> > > > > >
> > > > > > 1. Intel has a section in their manual that describes the pseudocode
> > > > > > they use to specify their ISA: Section 3.1.1.9 of The Intel® 64 and
> > > > > > IA-32 Architectures Software Developer’s Manual at
> > > > > > https://cdrdv2.intel.com/v1/dl/getContent/671199
> > > > > > 2. ARM has an equivalent for their variety of pseudocode: Chapter J1
> > > > > > of Arm Architecture Reference Manual for A-profile architecture at
> > > > > > https://developer.arm.com/documentation/ddi0487/latest/
> > > > > > 3. Sail "is a language for describing the instruction-set architecture
> > > > > > (ISA) semantics of processors."
> > > > > > (https://www.cl.cam.ac.uk/~pes20/sail/)
> > > > > >
> > > > > > Given the commercial nature of (1) and (2), perhaps Sail is a way to
> > > > > > proceed. If people are interested, I would be happy to lead an effort
> > > > > > to encode the eBPF ISA semantics in Sail (or find someone who already
> > > > > > has) and incorporate them in the draft.
> > > > >
> > > > > imo Sail is too researchy to have practical use.
> > > > > Looking at arm64 or x86 Sail description I really don't see how
> > > > > it would map to an IETF standard.
> > > > > It's done in a "sail" language that people need to learn first to be
> > > > > able to read it.
> > > > > Say we had bpf.sail somewhere on github. What value does it bring to
> > > > > BPF ISA standard? I don't see an immediate benefit to standardization.
> > > > > There could be other use cases, no doubt, but standardization is our goal.
> > > > >
> > > > > As far as 1 and 2. Intel and Arm use their own pseudocode, so they had
> > > > > to add a paragraph to describe it. We are using C to describe BPF ISA
> > > >
> > > >
> > > > I cannot find a reference in the current version that specifies what
> > > > we are using to describe the operations. I'd like to add that, but
> > > > want to make sure that I clarify two statements that seem to be at
> > > > odds.
> > > >
> > > > Immediately above you say that we are using "C to describe the BPF
> > > > ISA" and further above you say "This is assembly syntax parsed and
> > > > emitted by GCC, LLVM, gas, Linux Kernel, etc."
> > > >
> > > > My own reading is that it is the former, and not the latter. But, I
> > > > want to double check before adding the appropriate statements to the
> > > > Convention section.
> > >
> > > It's both. I'm not sure where you see a contradiction.
> > > It's a normal C syntax and it's emitted by the kernel verifier,
> > > parsed by clang/gcc assemblers and emitted by compilers.
> >
> >
> > Okay. I apologize. I am sincerely confused. For instance,
> >
> > if (u32)dst >= (u32)src goto +offset
> >
> > Looks like nothing that I have ever seen in "normal C syntax".
>
> I thought we're talking about table 4 and ALU ops.
> Above is not a pure C, but it's obvious enough without explanation, no?

To "us", yes. Although I am not an expert, it seems like being
explicit is important when it comes to writing a spec. I suppose we
should leave that to Dave and the chairs.

> Also I don't see above anywhere in the doc.

That is from the Appendix. It is currently in Dave's tree and gets
amalgamated with other files to build the final draft.

https://datatracker.ietf.org/doc/draft-thaler-bpf-isa/

> We describe conditionals like:
> BPF_JGE   0x3    any  PC += offset if dst >= src
>
> > There also appear to be a few other places where things might be a bit wonky:
> >
> > 1. Address arithmetic in the description of the load/store
> > instructions will depend on the type of the target: E.g.,
> >
> > *(u64 *)(dst + offset) = imm
> >
> > The address to which the store is done will be offset*sizeof(X) bytes
> > from dst where X is the type of the target of dst. If we are assuming
> > that dst (or its equivalent in similar instructions) is being treated
> > simply as an unsigned integer, I believe that we will have to say that
> > explicitly, especially given that we describe offset as "signed
> > integer offset used with pointer arithmetic" in the Instruction
> > encoding section.
>
> It's not:
> *((u64 *)(dst) + offset) = imm
>
> The doc doesn't say that 'dst' is a pointer 'u64 *dst' type.
> Instead it says:
> --
> The 'code' field encodes the operation as below, where 'src' and 'dst' refer
> to the values of the source and destination registers, respectively.
> --
>
> so dst + offset is a plain addition of two values and then type cast.

Again I of course understand and "we" know what that means. However,
it seems to me that an earlier description of offset as "signed
integer offset used with pointer arithmetic" might signal something
else to an unfamiliar reader.

Will

>
> >
> > 2. hto[bl]eN functions are not specified by standard C and, while
> > "obvious" what they do, are not defined in the document anywhere.
>
> yeah. we can add a short sentence about htoln.