RE: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes

Dave Thaler <dthaler@xxxxxxxxxxxxx> · Fri, 24 Feb 2023 20:44:09 +0000

> -----Original Message-----
> From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Jose E. Marchesi
> Sent: Friday, February 24, 2023 12:04 PM
> To: bpf <bpf@xxxxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; bpf@xxxxxxxx
> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of
> stored bytes
> 
> 
> This patch modifies instruction-set.rst so it documents the encoding of BPF
> instructions in terms of how the bytes are stored (be it in an ELF file or as
> bytes in a memory buffer to be loaded into the kernel or some other BPF
> consumer) as opposed to how the instruction looks like once loaded.
> 
> This is hopefully easier to understand by implementors looking to generate
> and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx>
> ---
>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
>  1 file changed, 21 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..9b28c0e15bb6 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
> 
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order:
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

Personally I find this notation harder to understand in general.
For example, it relies (without explanation) on the C language
conventions that "//" starts a comment and ":" indicates a bit
width, and on the assumption that the fields are listed in order
from most significant byte to least significant byte.  The text
before this change has no such unexplained assumptions.

[...]
> -Multi-byte fields ('imm' and 'offset') are similarly stored in
> -the byte order of the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big

Similar unexplained assumption that "//" means a comment, plus
some implied tabular formatting without this being an actual table?
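
To make the two example rows concrete, here is a minimal sketch of how I
read them in terms of stored bytes.  It is Python only for brevity, and
the helper name encode_basic is hypothetical, not something from the patch
or the kernel.  It assumes the register byte holds src in the high nibble
and dst in the low nibble for little-endian BPF, the reverse for
big-endian, and that 'offset' and 'imm' are signed fields.

```python
import struct

def encode_basic(opcode, dst, src, offset, imm, big_endian=False):
    """Return the 8 stored bytes of one basic BPF instruction (sketch)."""
    if big_endian:
        regs = (dst << 4) | src   # dst:4 src:4, as in the big-endian row
        fmt = ">BBhi"             # multi-byte fields stored big-endian
    else:
        regs = (src << 4) | dst   # src:4 dst:4, as in the little-endian row
        fmt = "<BBhi"             # multi-byte fields stored little-endian
    # B = opcode, B = register byte, h = 16-bit offset, i = 32-bit imm
    return struct.pack(fmt, opcode, regs, offset, imm)

# r1 += 0x11223344 (opcode 0x07), the example used in the patch
little = encode_basic(0x07, dst=1, src=0, offset=0, imm=0x11223344)
big = encode_basic(0x07, dst=1, src=0, offset=0, imm=0x11223344,
                   big_endian=True)

print(little.hex(" "))  # 07 01 00 00 44 33 22 11
print(big.hex(" "))     # 07 10 00 00 11 22 33 44
```

The printed bytes match the two rows above once the src/dst nibbles are
joined into a single byte.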

[...]
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure:
> +
> +  basic_instruction                 pseudo_instruction
> +  code:8 regs:16 offset:16 imm:32 | unused:32 imm:32

And here I find the use of "|" confusing.
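
For comparison, a minimal sketch of how I understand the 128-bit layout
as stored bytes, little-endian only.  The helper name encode_wide_le is
hypothetical, and the sketch assumes the low 32 bits of the 64-bit
immediate go in the basic instruction's 'imm' and the high 32 bits in the
pseudo-instruction's 'imm', with the unused bytes zeroed as the patch
description says.

```python
import struct

def encode_wide_le(opcode, dst, src, imm64):
    """Return the 16 stored bytes of one wide instruction (sketch)."""
    lo = imm64 & 0xFFFFFFFF           # assumed: low half in the basic insn
    hi = (imm64 >> 32) & 0xFFFFFFFF   # assumed: high half in the pseudo insn
    regs = (src << 4) | dst           # little-endian register byte
    basic = struct.pack("<BBhI", opcode, regs, 0, lo)
    pseudo = struct.pack("<II", 0, hi)  # unused:32 is all zeros, then imm:32
    return basic + pseudo

# Opcode 0x18 is the 64-bit immediate load; the value is arbitrary.
insn = encode_wide_le(0x18, dst=1, src=0, imm64=0x1122334455667788)

print(insn[:8].hex(" "))  # 18 01 00 00 88 77 66 55
print(insn[8:].hex(" "))  # 00 00 00 00 44 33 22 11
```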

What do others think?

Dave