Re: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of stored bytes

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 24 Feb 2023 16:20:27 -0800

On Fri, Feb 24, 2023 at 12:59 PM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:
>
>
> >> -----Original Message-----
> >> From: Bpf <bpf-bounces@xxxxxxxx> On Behalf Of Jose E. Marchesi
> >> Sent: Friday, February 24, 2023 12:04 PM
> >> To: bpf <bpf@xxxxxxxxxxxxxxx>
> >> Cc: Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx>; bpf@xxxxxxxx
> >> Subject: [Bpf] [PATCH] bpf, docs: Document BPF insn encoding in term of
> >> stored bytes
> >>
> >>
> >> This patch modifies instruction-set.rst so it documents the encoding of BPF
> >> instructions in terms of how the bytes are stored (be it in an ELF file or as
> >> bytes in a memory buffer to be loaded into the kernel or some other BPF
> >> consumer) as opposed to how the instruction looks like once loaded.
> >>
> >> This is hopefully easier to understand by implementors looking to generate
> >> and/or consume bytes conforming BPF instructions.
> >>
> >> The patch also clarifies that the unused bytes in a pseudo-instruction shall be
> >> cleared with zeros.
> >>
> >> Signed-off-by: Jose E. Marchesi <jose.marchesi@xxxxxxxxxx>
> >> ---
> >>  Documentation/bpf/instruction-set.rst | 43 +++++++++++++--------------
> >>  1 file changed, 21 insertions(+), 22 deletions(-)
> >>
> >> diff --git a/Documentation/bpf/instruction-set.rst
> >> b/Documentation/bpf/instruction-set.rst
> >> index 01802ed9b29b..9b28c0e15bb6 100644
> >> --- a/Documentation/bpf/instruction-set.rst
> >> +++ b/Documentation/bpf/instruction-set.rst
> >> @@ -38,15 +38,13 @@ eBPF has two instruction encodings:
> >>  * the wide instruction encoding, which appends a second 64-bit immediate
> >> (i.e.,
> >>    constant) value after the basic instruction for a total of 128 bits.
> >>
> >> -The basic instruction encoding looks as follows for a little-endian processor,
> >> -where MSB and LSB mean the most significant bits and least significant bits,
> >> -respectively:
> >> +The fields conforming an encoded basic instruction are stored in the
> >> +following order:
> >>
> >> -=============  =======  =======  =======  ============
> >> -32 bits (MSB)  16 bits  4 bits   4 bits   8 bits (LSB)
> >> -=============  =======  =======  =======  ============
> >> -imm            offset   src_reg  dst_reg  opcode
> >> -=============  =======  =======  =======  ============
> >> +  opcode:8 src:4 dst:4 offset:16 imm:32 // In little-endian BPF.
> >> +  opcode:8 dst:4 src:4 offset:16 imm:32 // In big-endian BPF.
> >
> > Personally I find this notation harder to understand in general.
> > For example, it encodes (without explanation) the C language
> > assumption that "//" is a comment, ":" indicates a bit width,
> > and the fields are in order from most significate byte to least
> > significant byte.  The text before this change has no such
> > unexplained assumptions.
>
> The fields are not ordered from "most significative byte" to "least
> significative byte".  The fields are ordered as they are stored.  Thats
> the whole point of the patch.
>
> As for //, :N and | below, I think these signs are obvious enough to not
> require further explanation, but I wouldn't mind to use some other
> better notation, if you can suggest one. I am not a very graphical
> person myself.

We're using plenty of C lingvo in this doc including:
dst = dst ^ imm32
and
dst = (u32) ((u32) dst + (u32) src)

So I think '//' is fine to have without extra verbosity.

imo the patch is a nice improvement in readability.
The ldimm64 part is especially nice.