Re: [Bpf] [PATCH bpf-next v2] bpf, docs: Add explanation of endianness

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 22 Feb 2023 14:10:49 -0800



On Mon, Feb 20, 2023 at 2:37 PM Dave Thaler
<dthaler1968=40googlemail.com@xxxxxxxxxxxxxx> wrote:
>
> From: Dave Thaler <dthaler@xxxxxxxxxxxxx>
>
> Document the discussion from the email thread on the IETF bpf list,
> where it was explained that the raw format varies by endianness
> of the processor.
>
> Signed-off-by: Dave Thaler <dthaler@xxxxxxxxxxxxx>
>
> Acked-by: David Vernet <void@xxxxxxxxxxxxx>
> ---
>
> V1 -> V2: rebased on top of latest master
> ---
>  Documentation/bpf/instruction-set.rst | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/bpf/instruction-set.rst b/Documentation/bpf/instruction-set.rst
> index af515de5fc3..1d473f060fa 100644
> --- a/Documentation/bpf/instruction-set.rst
> +++ b/Documentation/bpf/instruction-set.rst
> @@ -38,8 +38,9 @@ eBPF has two instruction encodings:
>  * the wide instruction encoding, which appends a second 64-bit immediate (i.e.,
>    constant) value after the basic instruction for a total of 128 bits.
>
> -The basic instruction encoding is as follows, where MSB and LSB mean the most significant
> -bits and least significant bits, respectively:
> +The basic instruction encoding looks as follows for a little-endian processor,
> +where MSB and LSB mean the most significant bits and least significant bits,
> +respectively:
>
>  =============  =======  =======  =======  ============
>  32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> @@ -63,6 +64,17 @@ imm            offset   src_reg  dst_reg  opcode
>  **opcode**
>    operation to perform
>
> +and as follows for a big-endian processor:
> +
> +=============  =======  ====================  ===============  ============
> +32 bits (MSB)  16 bits  4 bits                4 bits           8 bits (LSB)
> +=============  =======  ====================  ===============  ============
> +immediate      offset   destination register  source register  opcode
> +=============  =======  ====================  ===============  ============

I've changed it to:
imm            offset   dst_reg  src_reg  opcode

to match the little-endian table,
but now one of the tables feels wrong.
The encoding is always done by applying the C standard to this struct:
struct bpf_insn {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};
I'm not sure how to express this clearly in the table.
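
One way to see what the tables are trying to describe is to dump the bytes
the compiler actually produces for the struct. The sketch below is only an
illustration under a few assumptions: it uses a local userspace copy of the
struct with <stdint.h> types in place of __u8/__s16/__s32, the helper names
insn_bytes() and dst_in_low_nibble() are made up for this example, and it
relies on the compiler accepting uint8_t bit-fields (gcc and clang do).
Bit-field ordering is left to the implementation, so the second byte is
expected to differ between typical little-endian and big-endian ABIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace copy of the struct quoted above; the fixed-width types are an
 * assumption standing in for the kernel's __u8/__s16/__s32. */
struct bpf_insn {
        uint8_t code;           /* opcode */
        uint8_t dst_reg:4;      /* dest register */
        uint8_t src_reg:4;      /* source register */
        int16_t off;            /* signed offset */
        int32_t imm;            /* signed immediate constant */
};

/* The basic instruction encoding is 64 bits. */
static_assert(sizeof(struct bpf_insn) == 8, "basic instruction is 64 bits");

/* Hypothetical helper: copy the in-memory encoding of insn into out[8]. */
static void insn_bytes(const struct bpf_insn *insn, uint8_t out[8])
{
        memcpy(out, insn, sizeof(*insn));
}

/* Hypothetical helper: returns 1 if the compiler placed dst_reg in the low
 * nibble of byte 1 (the usual little-endian bit-field order), 0 if it
 * placed dst_reg in the high nibble (the usual big-endian order). */
static int dst_in_low_nibble(void)
{
        struct bpf_insn insn = { .dst_reg = 0x1, .src_reg = 0x2 };
        uint8_t raw[8];

        insn_bytes(&insn, raw);
        return raw[1] == 0x21;
}
```

Calling insn_bytes() on an instruction with dst_reg = 1 and src_reg = 2
should leave the opcode in byte 0 either way, while byte 1 should read 0x21
when dst_reg sits in the low nibble and 0x12 when it sits in the high nibble.
The off and imm fields then follow in the host's native byte order.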