RE: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of stored bytes

Dave Thaler <dthaler@xxxxxxxxxxxxx> · Mon, 27 Feb 2023 21:49:15 +0000

> -----Original Message-----
> From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Jose E. Marchesi
> Sent: Monday, February 27, 2023 1:06 PM
> To: bpf <bpf@xxxxxxxxxxxxxxx>
> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; bpf@xxxxxxxx; David
> Vernet <void@xxxxxxxxxxxxx>
> Subject: [Bpf] [PATCH V3] bpf, docs: Document BPF insn encoding in term of
> stored bytes
> 
> 
> [Changes from V2:
> - Use src and dst consistently in the document.

Since my earlier patch, src and dst refer to the values whereas
src_reg and dst_reg refer to register numbers.

> - Use a more graphical depiction of the 128-bit instruction.
> - Remove `Where:' fragment.
> - Clarify that unused bits are reserved and shall be zeroed.]
> 
> This patch modifies instruction-set.rst so it documents the encoding of BPF
> instructions in terms of how the bytes are stored (be it in an ELF file or as
> bytes in a memory buffer to be loaded into the kernel or some other BPF
> consumer) as opposed to how the instruction looks like once loaded.
> 
> This is hopefully easier to understand by implementors looking to generate
> and/or consume bytes conforming BPF instructions.
> 
> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> cleared with zeros.
> 
> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx>
> ---
>  Documentation/bpf/instruction-set.rst | 63 ++++++++++++++-------------
>  1 file changed, 33 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/bpf/instruction-set.rst
> b/Documentation/bpf/instruction-set.rst
> index 01802ed9b29b..fae2e48d6a0b 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,15 +38,11 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate
> (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
> 
> -The basic instruction encoding looks as follows for a little-endian processor,
> -where MSB and LSB mean the most significant bits and least significant bits,
> -respectively:
> +The fields conforming an encoded basic instruction are stored in the
> +following order::
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   src_reg  dst_reg  opcode
> -=============  =======  =======  =======  ============
> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.

I think those should be src_reg and dst_reg (as the register numbers)
not src and dst (which are the values) or this will be a documentation
regression.

Right now I think this is a regression since if I understand right, with this
patch, "src" and "dst" now refer to both in different places which is
confusing.

Dave

>  **imm**
>    signed integer immediate value
> @@ -54,48 +50,55 @@ imm            offset   src_reg  dst_reg  opcode
>  **offset**
>    signed integer offset used with pointer arithmetic
> 
> -**src_reg**
> +**src**
>    the source register number (0-10), except where otherwise specified
>    (`64-bit immediate instructions`_ reuse this field for other purposes)
> 
> -**dst_reg**
> +**dst**
>    destination register number (0-10)
> 
>  **opcode**
>    operation to perform
> 
> -and as follows for a big-endian processor:
> +Note that the contents of multi-byte fields ('imm' and 'offset') are
> +stored using big-endian byte ordering in big-endian BPF and
> +little-endian byte ordering in little-endian BPF.
> 
> -=============  =======  =======  =======  ============
> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> -=============  =======  =======  =======  ============
> -imm            offset   dst_reg  src_reg  opcode
> -=============  =======  =======  =======  ============
> +For example::
> 
> -Multi-byte fields ('imm' and 'offset') are similarly stored in -the byte order of
> the processor.
> +  opcode         offset imm          assembly
> +         src dst
> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
> +         dst src
> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
> 
>  Note that most instructions do not use all of the fields.
>  Unused fields shall be cleared to zero.
> 
> -As discussed below in `64-bit immediate instructions`_, a 64-bit immediate -
> instruction uses a 64-bit immediate value that is constructed as follows.
> -The 64 bits following the basic instruction contain a pseudo instruction -
> using the same format but with opcode, dst_reg, src_reg, and offset all set to
> zero, -and imm containing the high 32 bits of the immediate value.
> +As discussed below in `64-bit immediate instructions`_, a 64-bit
> +immediate instruction uses a 64-bit immediate value that is constructed
> +as follows.  The 64 bits following the basic instruction contain a
> +pseudo instruction using the same format but with opcode, dst, src, and
> +offset all set to zero, and imm containing the high 32 bits of the
> +immediate value.
> 
> -=================  ==================
> -64 bits (MSB)      64 bits (LSB)
> -=================  ==================
> -basic instruction  pseudo instruction
> -=================  ==================
> +This is depicted in the following figure::
> +
> +        basic_instruction
> +  .-----------------------------.
> +  |                             |
> +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
> +                                  |              |
> +                                  '--------------'
> +                                 pseudo instruction
> 
>  Thus the 64-bit immediate value is constructed as follows:
> 
>    imm64 = (next_imm << 32) | imm
> 
>  where 'next_imm' refers to the imm value of the pseudo instruction -
> following the basic instruction.
> +following the basic instruction.  The unused bytes in the pseudo
> +instruction are reserved and shall be cleared to zero.
> 
>  Instruction classes
>  -------------------
> @@ -137,7 +140,7 @@ code            source  instruction class
>    source  value  description
>    ======  =====  ==============================================
>    BPF_K   0x00   use 32-bit 'imm' value as source operand
> -  BPF_X   0x08   use 'src_reg' register value as source operand
> +  BPF_X   0x08   use 'src' register value as source operand
>    ======  =====  ==============================================
> 
>  **instruction class**
> --
> 2.30.2
> 
> --
> Bpf mailing list
> Bpf@xxxxxxxx
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .ietf.org%2Fmailman%2Flistinfo%2Fbpf&data=05%7C01%7Cdthaler%40micro
> soft.com%7C65d83bf2fe834f73f84908db19067400%7C72f988bf86f141af91ab
> 2d7cd011db47%7C1%7C0%7C638131287757978381%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> Mn0%3D%7C3000%7C%7C%7C&sdata=8il1%2B8I1T8GBqn3U%2B7YJehIKjS6s
> gvxTRWS2CTpg%2FZY%3D&reserved=0