> On Wed, Feb 22, 2023 at 3:23 PM Jose E. Marchesi > <jose.marchesi@xxxxxxxxxx> wrote: >> >> >> > On Mon, Feb 20, 2023 at 2:37 PM Dave Thaler >> > <dthaler1968=40googlemail.com@xxxxxxxxxxxxxx> wrote: >> >> >> >> From: Dave Thaler <dthaler@xxxxxxxxxxxxx> >> >> >> >> Document the discussion from the email thread on the IETF bpf list, >> >> where it was explained that the raw format varies by endianness >> >> of the processor. >> >> >> >> Signed-off-by: Dave Thaler <dthaler@xxxxxxxxxxxxx> >> >> >> >> Acked-by: David Vernet <void@xxxxxxxxxxxxx> >> >> --- >> >> >> >> V1 -> V2: rebased on top of latest master >> >> --- >> >> Documentation/bpf/instruction-set.rst | 16 ++++++++++++++-- >> >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst >> >> index af515de5fc3..1d473f060fa 100644 >> >> --- a/Documentation/bpf/instruction-set.rst >> >> +++ b/Documentation/bpf/instruction-set.rst >> >> @@ -38,8 +38,9 @@ eBPF has two instruction encodings: >> >> * the wide instruction encoding, which appends a second 64-bit immediate (i.e., >> >> constant) value after the basic instruction for a total of 128 bits. >> >> >> >> -The basic instruction encoding is as follows, where MSB and LSB mean the most significant >> >> -bits and least significant bits, respectively: >> >> +The basic instruction encoding looks as follows for a little-endian processor, >> >> +where MSB and LSB mean the most significant bits and least significant bits, >> >> +respectively: >> >> >> >> ============= ======= ======= ======= ============ >> >> 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) >> >> @@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode >> >> **opcode** >> >> operation to perform >> >> >> >> +and as follows for a big-endian processor: >> >> + >> >> +============= ======= ==================== =============== ============ >> >> +32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) >> >> +============= ======= ==================== =============== ============ >> >> +immediate offset destination register source register opcode >> >> +============= ======= ==================== =============== ============ >> > >> > I've changed it to: >> > imm offset dst_reg src_reg opcode >> > >> > to match the little endian table, >> > but now one of the tables feels wrong. >> > The encoding is always done by applying C standard to the struct: >> > struct bpf_insn { >> > __u8 code; /* opcode */ >> > __u8 dst_reg:4; /* dest register */ >> > __u8 src_reg:4; /* source register */ >> > __s16 off; /* signed offset */ >> > __s32 imm; /* signed immediate constant */ >> > }; >> > I'm not sure how to express this clearly in the table. >> >> Perhaps it would be simpler to document how the instruction bytes are >> stored (be it in an ELF file or as bytes in a memory buffer to be loaded >> into the kernel or some other BPF consumer) as opposed to how the >> instructions look like once loaded (as a 64-bit word) by a little-endian >> or big-endian kernel? >> >> Stored little-endian BPF instructions: >> >> code src_reg dst_reg off imm >> >> foo-le.o: file format elf64-bpfle >> >> 0000000000000000 <.text>: >> 0: 07 01 00 00 ef be ad de r1 += 0xdeadbeef >> >> Stored big-endian BPF instructions: >> >> code dst_reg src_reg off imm >> >> foo-be.o: file format elf64-bpfbe >> >> 0000000000000000 <.text>: >> 0: 07 10 00 00 de ad be ef r1 += 0xdeadbeef >> >> i.e. in the stored bytes the code always comes first, then the >> registers, then the offset, then the immediate, regardless of >> endianness. >> >> This may be easier to understand by implementors looking to generate >> and/or consume bytes conforming BPF instructions. > > +1 > I like this format more as well. > Maybe we can drop the table and use a diagram of a kind ? > > opcode src dst offset imm assembly > 07 0 1 00 00 ef be ad de r1 += 0xdeadbeef // little > 07 1 0 00 00 de ad be ef r1 += 0xdeadbeef // big Good idea. What about something like this: opcode offset imm assembly src dst 07 0 1 00 00 44 33 22 11 r1 += 0x11223344 // little dst src 07 1 0 00 00 11 22 33 44 r1 += 0x11223344 // big I changed the immediate because 0xdeadbeef is negative and it may be confusing in the assembly part: strictly it would be r1 += -559038737.