Re: [Bpf] [PATCH v3 1/2] bpf, docs: Formalize type notation and function semantics in ISA standard

Will Hawkins <hawkinsw@xxxxxx> · Sat, 5 Aug 2023 23:01:57 -0400

On Sat, Aug 5, 2023 at 10:14 AM David Vernet <void@xxxxxxxxxxxxx> wrote:
>
> On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote:
> > Give a single place where the shorthand for types are defined, the
> > semantics of helper functions are described, and certain terms can
> > be defined.
> >
> > Signed-off-by: Will Hawkins <hawkinsw@xxxxxx>
>
> Hi Will,
>
> This is looking great. Left a couple more comments below, let me know
> what you think.

Thanks for the feedback!

>
> > ---
> >  .../bpf/standardization/instruction-set.rst   | 79 +++++++++++++++++--
> >  1 file changed, 71 insertions(+), 8 deletions(-)
> >
> >  Changelog:
> >    v1 -> v2:
> >          - Remove change to Sphinx version
> >                - Address David's comments
> >                - Address Dave's comments
> >    v2 -> v3:
> >          - Add information about sign extending
> >
> > diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> > index 655494ac7af6..fe296f35e5a7 100644
> > --- a/Documentation/bpf/standardization/instruction-set.rst
> > +++ b/Documentation/bpf/standardization/instruction-set.rst
> > @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set.
> >  Documentation conventions
> >  =========================
> >
> > -For brevity, this document uses the type notion "u64", "u32", etc.
> > -to mean an unsigned integer whose width is the specified number of bits,
> > -and "s32", etc. to mean a signed integer of the specified number of bits.
> > +For brevity and consistency, this document refers to families
> > +of types using a shorthand syntax and refers to several expository,
> > +mnemonic functions when describing the semantics of opcodes. The range
>
> Hmm, I wonder if it's slightly more accurate to say that those functions
> are describing the semantics of "instructions" rather than "opcodes",
> given that the value in the immediate for the byte swap instructions
> dictate the width. What do you think?

Great point!

>
> > +of valid values for those types and the semantics of those functions
> > +are defined in the following subsections.
> > +
> > +Types
> > +-----
> > +This document refers to integer types with a notation of the form `SN`
> > +that succinctly defines whether their values are signed or unsigned
>
> Suggestion: I don't think we need the word "succinctly" here. I'm also
> inclined to suggest that we avoid using the word "define" here, as the
> notation itself isn't really defining the values of the types, but is
> rather just a shorthand.

Yes!

>
> > +numbers and the bit widths:
>
> What do you think about this phrasing:
>
> This document refers to integer types with the notation `SN` to specify
> a type's signedness and bit width respectively.
>
> Feel free to disagree. My thinking here is that it might make more sense
> to explain the notation as an informal shorthand rather than as a formal
> defnition, as making it a formal definition might open its own can of
> worms (e.g. we would probably also have to define what signedness means,
> etc).

I think that you make an excellent point. We have already gone
back/forth about whether there is going to be a definition for the
"type" of signedness that we use in the representation (i.e., two's
complement, one's complement, sign magnitude, etc) and we definitely
don't want to rush in to that rabbit hole again. I will incorporate
your suggestion in the next revision.

>
> > +
> > +=== =======
> > +`S` Meaning
> > +=== =======
> > +`u` unsigned
> > +`s` signed
> > +=== =======
> > +
> > +===== =========
> > +`N`   Bit width
> > +===== =========
> > +`8`   8 bits
> > +`16`  16 bits
> > +`32`  32 bits
> > +`64`  64 bits
> > +`128` 128 bits
> > +===== =========
>
> Is it by any chance possible to put these two tables on the same row?
> Not sure if rst is up to that challenge, and if not feel free to ignore.

I would love to make that happen. My rst skills are developing and I
will see what I can do!

>
> > +
> > +For example, `u32` is a type whose valid values are all the 32-bit unsigned
> > +numbers and `s16` is a types whose valid values are all the 16-bit signed
> > +numbers.
> > +
> > +Functions
> > +---------
> > +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 16-bit number in big-endian
> > +  format.
> > +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 32-bit number in big-endian
> > +  format.
> > +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 64-bit number in big-endian
> > +  format.
> > +* `htole16`: Takes an unsigned 16-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 16-bit number in little-endian
> > +  format.
> > +* `htole32`: Takes an unsigned 32-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 32-bit number in little-endian
> > +  format.
> > +* `htole64`: Takes an unsigned 64-bit number in host-endian format and
> > +  returns the equivalent number as an unsigned 64-bit number in little-endian
> > +  format.
> > +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
> > +  format and returns the equivalent number with the same bit width but
> > +  opposite endianness.
> > +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
> > +  format and returns the equivalent number with the same bit width but
> > +  opposite endianness.
> > +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
> > +  format and returns the equivalent number with the same bit width but
> > +  opposite endianness.
>
> Upon further reflection, maybe this belongs in the Byte swap
> instructions section of the document? The types make sense to list above
> because they're ubiquitous throughout the doc, but these are really 100%
> specific to byte swap.

I am not opposed to that. The only reason that I put them here was to
make it fully centralized. They are also going to be used in the
Appendix and I was hoping not to have to reproduce them there. In v4
of the patch I will leave them here and then if we do feel like we
should move them I will happily make a v5!

Thank you again for the feedback! A new revision of the patch will be
out shortly!

Have a great rest of the weekend!
Will

>
> >
> >  Registers and calling convention
> >  ================================
> > @@ -252,19 +311,23 @@ are supported: 16, 32 and 64.
> >
> >  Examples:
> >
> > -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
> > +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
> >
> >    dst = htole16(dst)
> > +  dst = htole32(dst)
> > +  dst = htole64(dst)
> >
> > -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> > +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::
> >
> > +  dst = htobe16(dst)
> > +  dst = htobe32(dst)
> >    dst = htobe64(dst)
> >
> >  ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
> >
> > -  dst = bswap16 dst
> > -  dst = bswap32 dst
> > -  dst = bswap64 dst
> > +  dst = bswap16(dst)
> > +  dst = bswap32(dst)
> > +  dst = bswap64(dst)
> >
>
> [...]
>
> - David