Re: [Bpf] [PATCH V2] bpf, docs: Document BPF insn encoding in term of stored bytes

"Jose E. Marchesi" <jose.marchesi@xxxxxxxxxx> · Mon, 27 Feb 2023 20:31:33 +0100

> On Mon, Feb 27, 2023 at 07:21:25PM +0100, Jose E. Marchesi wrote:
>> 
>> [Differences from V1:
>> - Use rst literal blocks for figures.
>> - Avoid using | in the basic instruction/pseudo instruction figure.
>> - Rebased to today's bpf-next master branch.]
>> 
>> This patch modifies instruction-set.rst so it documents the encoding
>> of BPF instructions in terms of how the bytes are stored (be it in an
>> ELF file or as bytes in a memory buffer to be loaded into the kernel
>> or some other BPF consumer) as opposed to how the instruction looks
>> like once loaded.
>> 
>> This is hopefully easier to understand by implementors looking to
>> generate and/or consume bytes conforming BPF instructions.
>> 
>> The patch also clarifies that the unused bytes in a pseudo-instruction
>> shall be cleared with zeros.
>> 
>> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx>
>
> Thanks Jose, this looks a lot better. Just left a couple more small
> suggestions + questions below before stamping.
>
>> 
>> ---
>>  Documentation/bpf/instruction-set.rst | 44 +++++++++++++--------------
>>  1 file changed, 22 insertions(+), 22 deletions(-)
>> 
>> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
>> index 01802ed9b29b..3341bfe20e4d 100644
>> --- a/Documentation/bpf/instruction-set.rst
>> +++ b/Documentation/bpf/instruction-set.rst
>> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
>>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>>    constant) value after the basic instruction for a total of 128 bits.
>>  
>> -The basic instruction encoding looks as follows for a little-endian processor,
>> -where MSB and LSB mean the most significant bits and least significant bits,
>> -respectively:
>> +The fields conforming an encoded basic instruction are stored in the
>> +following order::
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   src_reg  dst_reg  opcode
>> -=============  =======  =======  =======  ============
>> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
>> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
>
> The terms below use src_reg and dst_reg. Can we either update these code
> blocks to match, or change the term definitions to "src" and "dst"? I'd
> vote for the latter, given that we explain that it's the source /
> destination register number where they're defined.

Will change.

>> +
>> +Where,
>
> IMO, we can probably remove this "Where,". I think it's pretty clear
> that the following terms are referring to the code block above. Wdyt?

I added the "Where," to make the reading a little bit more fluid, but I
don't have a strong opinion on that.

>
>>  
>>  **imm**
>>    signed integer immediate value
>> @@ -64,16 +62,17 @@ imm            offset   src_reg  dst_reg  opcode
>>  **opcode**
>>    operation to perform
>>  
>> -and as follows for a big-endian processor:
>> +Note that the contents of multi-byte fields ('imm' and 'offset') are
>> +stored using big-endian byte ordering in big-endian BPF and
>> +little-endian byte ordering in little-endian BPF.
>>  
>> -=============  =======  =======  =======  ============
>> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
>> -=============  =======  =======  =======  ============
>> -imm            offset   dst_reg  src_reg  opcode
>> -=============  =======  =======  =======  ============
>> +For example::
>>  
>> -Multi-byte fields ('imm' and 'offset') are similarly stored in
>> -the byte order of the processor.
>> +  opcode         offset imm          assembly
>> +         src dst
>> +  07     0   1   00 00  44 33 22 11  r1 += 0x11223344 // little
>> +         dst src
>> +  07     1   0   00 00  11 22 33 44  r1 += 0x11223344 // big
>>  
>>  Note that most instructions do not use all of the fields.
>>  Unused fields shall be cleared to zero.
>> @@ -84,18 +83,19 @@ The 64 bits following the basic instruction contain a pseudo instruction
>>  using the same format but with opcode, dst_reg, src_reg, and offset all set to zero,
>>  and imm containing the high 32 bits of the immediate value.
>>  
>> -=================  ==================
>> -64 bits (MSB)      64 bits (LSB)
>> -=================  ==================
>> -basic instruction  pseudo instruction
>> -=================  ==================
>> +This is depicted in the following figure::
>> +
>> +  basic_instruction               pseudo instruction
>> +  ------------------------------- ------------------
>> +  code:8 regs:16 offset:16 imm:32 unused:32 imm:32
>
> Don't want to bikeshed too much, but this seems a bit hard to read.  It
> kind of all looks like one line, though perhaps that was the intention.
> Wdyt about this?
>
> This is depicted in the following figure::
>
>   code:8 regs:16 offset:16 imm:32  // MSB: basic instruction
>   reserved:32              imm:32  // LSB: pseudo instruction

Hmm, yes, that was the intention, to depict the fact that a 128-bit
instruction is composed by the sequence of a basic instruction followed
by a pseudo instruction...

This would be the bombastic ASCII-art version of what I had in mind :)

         basic_instruction        
   .-----------------------------.
   |                             |
   code:8 regs:16 offset:16 imm:32 unused:32 imm:32
                                   |              |
                                   '--------------'
                                  pseudo instruction

I don't think MSB and LSB marks are meaningful in this context.  Bytes
in a file or in memory are no more or less significative: they just have
addresses (or file offsets) which can be lower or higher than the
addresses (or file offsets) of other bytes.

>>  
>>  Thus the 64-bit immediate value is constructed as follows:
>>  
>>    imm64 = (next_imm << 32) | imm
>>  
>>  where 'next_imm' refers to the imm value of the pseudo instruction
>> -following the basic instruction.
>> +following the basic instruction.  The unused bytes in the pseudo
>> +instruction shall be cleared to zero.
>
> Also apologies if this is also a bikeshed and has already been
> discussed, but should we say, "The unused bits in the pseudo instruction
> are reserved" rather than saying they should be cleared to zero?
> Implemenations should interpret "reserved" to mean that the bits should
> be zeroed, no? Or at least that seems like the norm in technical
> manuals.  For example, reserved bits in control registers on x86 should
> be cleared to avoid unexpected side effects if and when those bits are
> eventually actually used for something in future releases.

That is a good point.

I agree that "reserved" almost always means zero, but I think it is very
common to say so explicitly.  A couple of examples:

Example 1, from the x86 MBR spec:

   *  Offset  Size (bytes)  Description
   *
   *  0x000   440           MBR Bootstrap (flat binary executable code)
   *  0x1B8   4             Optional "Unique Disk ID / Signature"
   *  0x1BC   2             Optional, reserved 0x0000
   *  0x1BE   16            First partition table entry
   *  0x1CE   16            Second partition table entry
   *  0x1DE   16            Third partition table entry
   *  0x1EE   16            Fourth partition table entry
   *  0x1FE   2             (0x55, 0xAA) "Valid bootsector" signature bytes

Example 2, this is how the PE Object File specification defines a
reserved field:

   reserved,         A description of a field that indicates that the value
   must be 0         of the field must be zero for generators and consumers
                     must ignore the field.

What about this: "The unused bits in the pseudo instruction are reserved
and shall be cleared to zero."