On Fri, Jul 28, 2023 at 5:19 PM Will Hawkins <hawkinsw@xxxxxx> wrote: > > On Fri, Jul 28, 2023 at 8:05 PM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Fri, Jul 28, 2023 at 4:32 PM Will Hawkins <hawkinsw@xxxxxx> wrote: > > > > > > On Thu, Jul 27, 2023 at 9:05 PM Alexei Starovoitov > > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > > > On Wed, Jul 26, 2023 at 12:16 PM Will Hawkins <hawkinsw@xxxxxx> wrote: > > > > > > > > > > On Tue, Jul 25, 2023 at 2:37 PM Watson Ladd <watsonbladd@xxxxxxxxx> wrote: > > > > > > > > > > > > On Tue, Jul 25, 2023 at 9:15 AM Alexei Starovoitov > > > > > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > > > > > > > > > On Tue, Jul 25, 2023 at 7:03 AM Dave Thaler <dthaler@xxxxxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > I am forwarding the email below (after converting HTML to plain text) > > > > > > > > to the mailto:bpf@xxxxxxxxxxxxxxx list so replies can go to both lists. > > > > > > > > > > > > > > > > Please use this one for any replies. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > Dave > > > > > > > > > > > > > > > > > From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Watson Ladd > > > > > > > > > Sent: Monday, July 24, 2023 10:05 PM > > > > > > > > > To: bpf@xxxxxxxx > > > > > > > > > Subject: [Bpf] Review of draft-thaler-bpf-isa-01 > > > > > > > > > > > > > > > > > > Dear BPF wg, > > > > > > > > > > > > > > > > > > I took a look at the draft and think it has some issues, unsurprisingly at this stage. One is > > > > > > > > > the specification seems to use an underspecified C pseudo code for operations vs > > > > > > > > > defining them mathematically. > > > > > > > > > > > > > > Hi Watson, > > > > > > > > > > > > > > This is not "underspecified C" pseudo code. > > > > > > > This is assembly syntax parsed and emitted by GCC, LLVM, gas, Linux Kernel, etc. > > > > > > > > > > > > I don't see a reference to any description of that in section 4.1. > > > > > > It's possible I've overlooked this, and if people think this style of > > > > > > definition is good enough that works for me. But I found table 4 > > > > > > pretty scanty on what exactly happens. > > > > > > > > > > Hello! Based on Watson's post, I have done some research and would > > > > > potentially like to offer a path forward. There are several different > > > > > ways that ISAs specify the semantics of their operations: > > > > > > > > > > 1. Intel has a section in their manual that describes the pseudocode > > > > > they use to specify their ISA: Section 3.1.1.9 of The Intel® 64 and > > > > > IA-32 Architectures Software Developer’s Manual at > > > > > https://cdrdv2.intel.com/v1/dl/getContent/671199 > > > > > 2. ARM has an equivalent for their variety of pseudocode: Chapter J1 > > > > > of Arm Architecture Reference Manual for A-profile architecture at > > > > > https://developer.arm.com/documentation/ddi0487/latest/ > > > > > 3. Sail "is a language for describing the instruction-set architecture > > > > > (ISA) semantics of processors." > > > > > (https://www.cl.cam.ac.uk/~pes20/sail/) > > > > > > > > > > Given the commercial nature of (1) and (2), perhaps Sail is a way to > > > > > proceed. If people are interested, I would be happy to lead an effort > > > > > to encode the eBPF ISA semantics in Sail (or find someone who already > > > > > has) and incorporate them in the draft. > > > > > > > > imo Sail is too researchy to have practical use. > > > > Looking at arm64 or x86 Sail description I really don't see how > > > > it would map to an IETF standard. > > > > It's done in a "sail" language that people need to learn first to be > > > > able to read it. > > > > Say we had bpf.sail somewhere on github. What value does it bring to > > > > BPF ISA standard? I don't see an immediate benefit to standardization. > > > > There could be other use cases, no doubt, but standardization is our goal. > > > > > > > > As far as 1 and 2. Intel and Arm use their own pseudocode, so they had > > > > to add a paragraph to describe it. We are using C to describe BPF ISA > > > > > > > > > I cannot find a reference in the current version that specifies what > > > we are using to describe the operations. I'd like to add that, but > > > want to make sure that I clarify two statements that seem to be at > > > odds. > > > > > > Immediately above you say that we are using "C to describe the BPF > > > ISA" and further above you say "This is assembly syntax parsed and > > > emitted by GCC, LLVM, gas, Linux Kernel, etc." > > > > > > My own reading is that it is the former, and not the latter. But, I > > > want to double check before adding the appropriate statements to the > > > Convention section. > > > > It's both. I'm not sure where you see a contradiction. > > It's a normal C syntax and it's emitted by the kernel verifier, > > parsed by clang/gcc assemblers and emitted by compilers. > > > Okay. I apologize. I am sincerely confused. For instance, > > if (u32)dst >= (u32)src goto +offset > > Looks like nothing that I have ever seen in "normal C syntax". I thought we're talking about table 4 and ALU ops. Above is not a pure C, but it's obvious enough without explanation, no? Also I don't see above anywhere in the doc. We describe conditionals like: BPF_JGE 0x3 any PC += offset if dst >= src > There also appear to be a few other places where things might be a bit wonky: > > 1. Address arithmetic in the description of the load/store > instructions will depend on the type of the target: E.g., > > *(u64 *)(dst + offset) = imm > > The address to which the store is done will be offset*sizeof(X) bytes > from dst where X is the type of the target of dst. If we are assuming > that dst (or its equivalent in similar instructions) is being treated > simply as an unsigned integer, I believe that we will have to say that > explicitly, especially given that we describe offset as "signed > integer offset used with pointer arithmetic" in the Instruction > encoding section. It's not: *((u64 *)(dst) + offset) = imm The doc doesn't say that 'dst' is a pointer 'u64 *dst' type. Instead it says: -- The 'code' field encodes the operation as below, where 'src' and 'dst' refer to the values of the source and destination registers, respectively. -- so dst + offset is a plain addition of two values and then type cast. > > 2. hto[bl]eN functions are not specified by standard C and, while > "obvious" what they do, are not defined in the document anywhere. yeah. we can add a short sentence about htoln.