On Wed, Feb 22, 2023 at 3:23 PM Jose E. Marchesi <jose.marchesi@xxxxxxxxxx> wrote: > > > > On Mon, Feb 20, 2023 at 2:37 PM Dave Thaler > > <dthaler1968=40googlemail.com@xxxxxxxxxxxxxx> wrote: > >> > >> From: Dave Thaler <dthaler@xxxxxxxxxxxxx> > >> > >> Document the discussion from the email thread on the IETF bpf list, > >> where it was explained that the raw format varies by endianness > >> of the processor. > >> > >> Signed-off-by: Dave Thaler <dthaler@xxxxxxxxxxxxx> > >> > >> Acked-by: David Vernet <void@xxxxxxxxxxxxx> > >> --- > >> > >> V1 -> V2: rebased on top of latest master > >> --- > >> Documentation/bpf/instruction-set.rst | 16 ++++++++++++++-- > >> 1 file changed, 14 insertions(+), 2 deletions(-) > >> > >> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst > >> index af515de5fc3..1d473f060fa 100644 > >> --- a/Documentation/bpf/instruction-set.rst > >> +++ b/Documentation/bpf/instruction-set.rst > >> @@ -38,8 +38,9 @@ eBPF has two instruction encodings: > >> * the wide instruction encoding, which appends a second 64-bit immediate (i.e., > >> constant) value after the basic instruction for a total of 128 bits. > >> > >> -The basic instruction encoding is as follows, where MSB and LSB mean the most significant > >> -bits and least significant bits, respectively: > >> +The basic instruction encoding looks as follows for a little-endian processor, > >> +where MSB and LSB mean the most significant bits and least significant bits, > >> +respectively: > >> > >> ============= ======= ======= ======= ============ > >> 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) > >> @@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode > >> **opcode** > >> operation to perform > >> > >> +and as follows for a big-endian processor: > >> + > >> +============= ======= ==================== =============== ============ > >> +32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) > >> +============= ======= ==================== =============== ============ > >> +immediate offset destination register source register opcode > >> +============= ======= ==================== =============== ============ > > > > I've changed it to: > > imm offset dst_reg src_reg opcode > > > > to match the little endian table, > > but now one of the tables feels wrong. > > The encoding is always done by applying C standard to the struct: > > struct bpf_insn { > > __u8 code; /* opcode */ > > __u8 dst_reg:4; /* dest register */ > > __u8 src_reg:4; /* source register */ > > __s16 off; /* signed offset */ > > __s32 imm; /* signed immediate constant */ > > }; > > I'm not sure how to express this clearly in the table. > > Perhaps it would be simpler to document how the instruction bytes are > stored (be it in an ELF file or as bytes in a memory buffer to be loaded > into the kernel or some other BPF consumer) as opposed to how the > instructions look like once loaded (as a 64-bit word) by a little-endian > or big-endian kernel? > > Stored little-endian BPF instructions: > > code src_reg dst_reg off imm > > foo-le.o: file format elf64-bpfle > > 0000000000000000 <.text>: > 0: 07 01 00 00 ef be ad de r1 += 0xdeadbeef > > Stored big-endian BPF instructions: > > code dst_reg src_reg off imm > > foo-be.o: file format elf64-bpfbe > > 0000000000000000 <.text>: > 0: 07 10 00 00 de ad be ef r1 += 0xdeadbeef > > i.e. in the stored bytes the code always comes first, then the > registers, then the offset, then the immediate, regardless of > endianness. > > This may be easier to understand by implementors looking to generate > and/or consume bytes conforming BPF instructions. +1 I like this format more as well. Maybe we can drop the table and use a diagram of a kind ? opcode src dst offset imm assembly 07 0 1 00 00 ef be ad de r1 += 0xdeadbeef // little 07 1 0 00 00 de ad be ef r1 += 0xdeadbeef // big