> On Thu, Feb 23, 2023 at 5:19 AM Jose E. Marchesi <jemarch@xxxxxxx> wrote: >> >> >> > On Wed, Feb 22, 2023 at 3:23 PM Jose E. Marchesi >> > <jose.marchesi@xxxxxxxxxx> wrote: >> >> >> >> >> >> > On Mon, Feb 20, 2023 at 2:37 PM Dave Thaler >> >> > <dthaler1968=40googlemail.com@xxxxxxxxxxxxxx> wrote: >> >> >> >> >> >> From: Dave Thaler <dthaler@xxxxxxxxxxxxx> >> >> >> >> >> >> Document the discussion from the email thread on the IETF bpf list, >> >> >> where it was explained that the raw format varies by endianness >> >> >> of the processor. >> >> >> >> >> >> Signed-off-by: Dave Thaler <dthaler@xxxxxxxxxxxxx> >> >> >> >> >> >> Acked-by: David Vernet <void@xxxxxxxxxxxxx> >> >> >> --- >> >> >> >> >> >> V1 -> V2: rebased on top of latest master >> >> >> --- >> >> >> Documentation/bpf/instruction-set.rst | 16 ++++++++++++++-- >> >> >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> >> >> >> >> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst >> >> >> index af515de5fc3..1d473f060fa 100644 >> >> >> --- a/Documentation/bpf/instruction-set.rst >> >> >> +++ b/Documentation/bpf/instruction-set.rst >> >> >> @@ -38,8 +38,9 @@ eBPF has two instruction encodings: >> >> >> * the wide instruction encoding, which appends a second 64-bit immediate (i.e., >> >> >> constant) value after the basic instruction for a total of 128 bits. >> >> >> >> >> >> -The basic instruction encoding is as follows, where MSB and LSB mean the most significant >> >> >> -bits and least significant bits, respectively: >> >> >> +The basic instruction encoding looks as follows for a little-endian processor, >> >> >> +where MSB and LSB mean the most significant bits and least significant bits, >> >> >> +respectively: >> >> >> >> >> >> ============= ======= ======= ======= ============ >> >> >> 32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) >> >> >> @@ -63,6 +64,17 @@ imm offset src_reg dst_reg opcode >> >> >> **opcode** >> >> >> operation to perform >> >> >> >> >> >> +and as follows for a big-endian processor: >> >> >> + >> >> >> +============= ======= ==================== =============== ============ >> >> >> +32 bits (MSB) 16 bits 4 bits 4 bits 8 bits (LSB) >> >> >> +============= ======= ==================== =============== ============ >> >> >> +immediate offset destination register source register opcode >> >> >> +============= ======= ==================== =============== ============ >> >> > >> >> > I've changed it to: >> >> > imm offset dst_reg src_reg opcode >> >> > >> >> > to match the little endian table, >> >> > but now one of the tables feels wrong. >> >> > The encoding is always done by applying C standard to the struct: >> >> > struct bpf_insn { >> >> > __u8 code; /* opcode */ >> >> > __u8 dst_reg:4; /* dest register */ >> >> > __u8 src_reg:4; /* source register */ >> >> > __s16 off; /* signed offset */ >> >> > __s32 imm; /* signed immediate constant */ >> >> > }; >> >> > I'm not sure how to express this clearly in the table. >> >> >> >> Perhaps it would be simpler to document how the instruction bytes are >> >> stored (be it in an ELF file or as bytes in a memory buffer to be loaded >> >> into the kernel or some other BPF consumer) as opposed to how the >> >> instructions look like once loaded (as a 64-bit word) by a little-endian >> >> or big-endian kernel? >> >> >> >> Stored little-endian BPF instructions: >> >> >> >> code src_reg dst_reg off imm >> >> >> >> foo-le.o: file format elf64-bpfle >> >> >> >> 0000000000000000 <.text>: >> >> 0: 07 01 00 00 ef be ad de r1 += 0xdeadbeef >> >> >> >> Stored big-endian BPF instructions: >> >> >> >> code dst_reg src_reg off imm >> >> >> >> foo-be.o: file format elf64-bpfbe >> >> >> >> 0000000000000000 <.text>: >> >> 0: 07 10 00 00 de ad be ef r1 += 0xdeadbeef >> >> >> >> i.e. in the stored bytes the code always comes first, then the >> >> registers, then the offset, then the immediate, regardless of >> >> endianness. >> >> >> >> This may be easier to understand by implementors looking to generate >> >> and/or consume bytes conforming BPF instructions. >> > >> > +1 >> > I like this format more as well. >> > Maybe we can drop the table and use a diagram of a kind ? >> > >> > opcode src dst offset imm assembly >> > 07 0 1 00 00 ef be ad de r1 += 0xdeadbeef // little >> > 07 1 0 00 00 de ad be ef r1 += 0xdeadbeef // big >> >> Good idea. What about something like this: >> >> opcode offset imm assembly >> src dst >> 07 0 1 00 00 44 33 22 11 r1 += 0x11223344 // little >> dst src >> 07 1 0 00 00 11 22 33 44 r1 += 0x11223344 // big >> >> I changed the immediate because 0xdeadbeef is negative and it may be >> confusing in the assembly part: strictly it would be r1 += -559038737. > > Looks great to me. Do you want to send your first kernel patch? :) Sure, will prepare it :)