On Fri, Feb 24, 2023 at 12:59 PM Jose E. Marchesi <jose.marchesi@xxxxxxxxxx> wrote: > > > >> -----Original Message----- > >> From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Jose E. Marchesi > >> Sent: Friday, February 24, 2023 12:04 PM > >> To: bpf <bpf@xxxxxxxxxxxxxxx> > >> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; bpf@xxxxxxxx > >> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of > >> stored bytes > >> > >> > >> This patch modifies instruction-set.rst so it documents the encoding of BPF > >> instructions in terms of how the bytes are stored (be it in an ELF file or as > >> bytes in a memory buffer to be loaded into the kernel or some other BPF > >> consumer) as opposed to how the instruction looks like once loaded. > >> > >> This is hopefully easier to understand by implementors looking to generate > >> and/or consume bytes conforming BPF instructions. > >> > >> The patch also clarifies that the unused bytes in a pseudo-instruction shall be > >> cleared with zeros. > >> > >> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx> > >> --- > >> Documentation/bpf/instruction-set.rst | 43 +++++++++++++-------------- > >> 1 file changed, 21 insertions(+), 22 deletions(-) > >> > >> diff --git a/Documentation/bpf/instruction-set.rst > >> b/Documentation/bpf/instruction-set.rst > >> index 01802ed9b29b..9b28c0e15bb6 100644 > >> --- a/Documentation/bpf/instruction-set.rst > >> +++ b/Documentation/bpf/instruction-set.rst > >> @@ -38,15 +38,13 @@ eBPF has two instruction encodings: > >> * the wide instruction encoding, which appends a second 64-bit immediate > >> (i.e., > >> constant) value after the basic instruction for a total of 128 bits. > >> > >> -The basic instruction encoding looks as follows for a little-endian processor, > >> -where MSB and LSB mean the most significant bits and least significant bits, > >> -respectively: > >> +The fields conforming an encoded basic instruction are stored in the > >> +following order: > >> > >> -============= ======= ======= ======= ============ > >> -32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) > >> -============= ======= ======= ======= ============ > >> -imm offset src_reg dst_reg opcode > >> -============= ======= ======= ======= ============ > >> + opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF. > >> + opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF. > > > > Personally I find this notation harder to understand in general. > > For example, it encodes (without explanation) the C language > > assumption that "//" is a comment, ":" indicates a bit width, > > and the fields are in order from most significate byte to least > > significant byte. The text before this change has no such > > unexplained assumptions. > > The fields are not ordered from "most significative byte" to "least > significative byte". The fields are ordered as they are stored. Thats > the whole point of the patch. > > As for //, :N and | below, I think these signs are obvious enough to not > require further explanation, but I wouldn't mind to use some other > better notation, if you can suggest one. I am not a very graphical > person myself. We're using plenty of C lingvo in this doc including: dst = dst ^ imm32 and dst = (u32) ((u32) dst + (u32) src) So I think '//' is fine to have without extra verbosity. imo the patch is a nice improvement in readability. The ldimm64 part is especially nice.