> -----Original Message-----
> From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Jose E. Marchesi
> Sent: Friday, February 24, 2023 12:04 PM
> To: bpf <bpf@xxxxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; bpf@xxxxxxxx
> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes
>
>
> This patch modifies instruction-set.rst so it documents the encoding of BPF
> instructions in terms of how the bytes are stored (be it in an ELF file or as
> bytes in a memory buffer to be loaded into the kernel or some other BPF
> consumer) as opposed to how the instruction looks like once loaded.
>
> This is hopefully easier to understand by implementors looking to generate
> and/or consume bytes conforming BPF instructions.
>
> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> cleared with zeros.
>
> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx>
> ---
>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..9b28c0e15bb6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
>
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order:
>
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

Personally I find this notation harder to understand in general.
For example, it encodes (without explanation) the C language assumption
that "//" is a comment, ":" indicates a bit width, and the fields are in
order from most significant byte to least significant byte. The text before
this change has no such unexplained assumptions.

[...]
> -Multi-byte fields ('imm' and 'offset') are similarly stored in
> -the byte order of the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big

Similar assumption without explanation of "//" meaning comment, and
some implied tabular formatting without being an actual table?

[...]
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure:
> +
> +  basic_instruction                 pseudo_instruction
> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32

And here the use of "|" above I find confusing.

What do others think?

Dave
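
For anyone comparing the two notations, the stored-byte example quoted
above ("r1 += 0x11223344") can be reproduced mechanically. What follows is
a minimal Python sketch and not part of the patch. The helper name
encode_basic_insn is hypothetical, and the nibble placement (src in the
high nibble for little-endian, dst in the high nibble for big-endian) is
an assumption read off the example bytes in the quoted hunk.

```python
import struct


def encode_basic_insn(opcode, dst, src, offset, imm, big_endian=False):
    """Return the 8 stored bytes of one basic BPF instruction.

    Hypothetical helper: the layout is inferred from the example bytes
    in the quoted patch, not taken from a normative specification.
    """
    if big_endian:
        # Assumption: dst occupies the high nibble of the register byte.
        regs = (dst << 4) | src
        fmt = ">BBhi"  # u8 opcode, u8 regs, s16 offset, s32 imm
    else:
        # Assumption: src occupies the high nibble of the register byte.
        regs = (src << 4) | dst
        fmt = "<BBhi"
    return struct.pack(fmt, opcode, regs, offset, imm)


# r1 += 0x11223344 (opcode 0x07, dst = r1, src = 0, offset = 0)
little = encode_basic_insn(0x07, dst=1, src=0, offset=0, imm=0x11223344)
big = encode_basic_insn(0x07, dst=1, src=0, offset=0, imm=0x11223344,
                        big_endian=True)
print(little.hex(" "))  # 07 01 00 00 44 33 22 11
print(big.hex(" "))     # 07 10 00 00 11 22 33 44
```

Both outputs match the byte rows in the quoted example, with the two
register nibbles packed into a single byte.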