Re: [Bpf] Review of draft-thaler-bpf-isa-01

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 28 Jul 2023 17:35:05 -0700

On Fri, Jul 28, 2023 at 5:19 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
>
> On Fri, Jul 28, 2023 at 8:05 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Fri, Jul 28, 2023 at 4:32 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
> > >
> > > On Thu, Jul 27, 2023 at 9:05 PM Alexei Starovoitov
> > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jul 26, 2023 at 12:16 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
> > > > >
> > > > > On Tue, Jul 25, 2023 at 2:37 PM Watson Ladd <watsonbladd@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jul 25, 2023 at 9:15 AM Alexei Starovoitov
> > > > > > <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 25, 2023 at 7:03 AM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > I am forwarding the email below (after converting HTML to plain text)
> > > > > > > > to the bpf@xxxxxxxxxxxxxxx list so replies can go to both lists.
> > > > > > > >
> > > > > > > > Please use this one for any replies.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dave
> > > > > > > >
> > > > > > > > > From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Watson Ladd
> > > > > > > > > Sent: Monday, July 24, 2023 10:05 PM
> > > > > > > > > To: bpf@xxxxxxxx
> > > > > > > > > Subject: [Bpf] Review of draft-thaler-bpf-isa-01
> > > > > > > > >
> > > > > > > > > Dear BPF wg,
> > > > > > > > >
> > > > > > > > > I took a look at the draft and think it has some issues, unsurprisingly at this stage. One is
> > > > > > > > > the specification seems to use an underspecified C pseudo code for operations vs
> > > > > > > > > defining them mathematically.
> > > > > > >
> > > > > > > Hi Watson,
> > > > > > >
> > > > > > > This is not "underspecified C" pseudo code.
> > > > > > > This is assembly syntax parsed and emitted by GCC, LLVM, gas, Linux Kernel, etc.
> > > > > >
> > > > > > I don't see a reference to any description of that in section 4.1.
> > > > > > It's possible I've overlooked this, and if people think this style of
> > > > > > definition is good enough that works for me. But I found table 4
> > > > > > pretty scanty on what exactly happens.
> > > > >
> > > > > Hello! Based on Watson's post, I have done some research and would
> > > > > potentially like to offer a path forward. There are several different
> > > > > ways that ISAs specify the semantics of their operations:
> > > > >
> > > > > 1. Intel has a section in their manual that describes the pseudocode
> > > > > they use to specify their ISA: Section 3.1.1.9 of The Intel® 64 and
> > > > > IA-32 Architectures Software Developer’s Manual at
> > > > > https://cdrdv2.intel.com/v1/dl/getContent/671199
> > > > > 2. ARM has an equivalent for their variety of pseudocode: Chapter J1
> > > > > of Arm Architecture Reference Manual for A-profile architecture at
> > > > > https://developer.arm.com/documentation/ddi0487/latest/
> > > > > 3. Sail "is a language for describing the instruction-set architecture
> > > > > (ISA) semantics of processors."
> > > > > (https://www.cl.cam.ac.uk/~pes20/sail/)
> > > > >
> > > > > Given the commercial nature of (1) and (2), perhaps Sail is a way to
> > > > > proceed. If people are interested, I would be happy to lead an effort
> > > > > to encode the eBPF ISA semantics in Sail (or find someone who already
> > > > > has) and incorporate them in the draft.
> > > >
> > > > imo Sail is too researchy to have practical use.
> > > > Looking at arm64 or x86 Sail description I really don't see how
> > > > it would map to an IETF standard.
> > > > It's done in a "sail" language that people need to learn first to be
> > > > able to read it.
> > > > Say we had bpf.sail somewhere on github. What value does it bring to
> > > > BPF ISA standard? I don't see an immediate benefit to standardization.
> > > > There could be other use cases, no doubt, but standardization is our goal.
> > > >
> > > > As for 1 and 2: Intel and Arm use their own pseudocode, so they had
> > > > to add a paragraph to describe it. We are using C to describe BPF ISA
> > >
> > >
> > > I cannot find a reference in the current version that specifies what
> > > we are using to describe the operations. I'd like to add that, but
> > > want to make sure that I clarify two statements that seem to be at
> > > odds.
> > >
> > > Immediately above you say that we are using "C to describe the BPF
> > > ISA" and further above you say "This is assembly syntax parsed and
> > > emitted by GCC, LLVM, gas, Linux Kernel, etc."
> > >
> > > My own reading is that it is the former, and not the latter. But, I
> > > want to double check before adding the appropriate statements to the
> > > Convention section.
> >
> > It's both. I'm not sure where you see a contradiction.
> > It's a normal C syntax and it's emitted by the kernel verifier,
> > parsed by clang/gcc assemblers and emitted by compilers.
>
>
> Okay. I apologize. I am sincerely confused. For instance,
>
> if (u32)dst >= (u32)src goto +offset
>
> Looks like nothing that I have ever seen in "normal C syntax".

I thought we were talking about table 4 and ALU ops.
The above is not pure C, but it's obvious enough without explanation, no?
Also, I don't see the above anywhere in the doc.
We describe conditionals like:
BPF_JGE   0x3    any  PC += offset if dst >= src

> There also appear to be a few other places where things might be a bit wonky:
>
> 1. Address arithmetic in the description of the load/store
> instructions will depend on the type of the target: E.g.,
>
> *(u64 *)(dst + offset) = imm
>
> The address to which the store is done will be offset*sizeof(X) bytes
> from dst where X is the type of the target of dst. If we are assuming
> that dst (or its equivalent in similar instructions) is being treated
> simply as an unsigned integer, I believe that we will have to say that
> explicitly, especially given that we describe offset as "signed
> integer offset used with pointer arithmetic" in the Instruction
> encoding section.

It's not:
*((u64 *)(dst) + offset) = imm

The doc doesn't say that 'dst' is a pointer 'u64 *dst' type.
Instead it says:
--
The 'code' field encodes the operation as below, where 'src' and 'dst' refer
to the values of the source and destination registers, respectively.
--

so dst + offset is a plain addition of two values and then type cast.

>
> 2. hto[bl]eN functions are not specified by standard C and, while
> "obvious" what they do, are not defined in the document anywhere.

yeah. we can add a short sentence about hto[bl]eN.