Re: [Bpf] [PATCH v3 1/2] bpf, docs: Formalize type notation and function semantics in ISA standard

Will Hawkins <hawkinsw@xxxxxx> · Mon, 7 Aug 2023 10:01:44 -0400

On Sat, Aug 5, 2023 at 11:01 PM Will Hawkins <hawkinsw@xxxxxx> wrote:
>
> On Sat, Aug 5, 2023 at 10:14 AM David Vernet <void@xxxxxxxxxxxxx> wrote:
> >
> > On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote:
> > > Give a single place where the shorthand for types are defined, the
> > > semantics of helper functions are described, and certain terms can
> > > be defined.
> > >
> > > Signed-off-by: Will Hawkins <hawkinsw@xxxxxx>
> >
> > Hi Will,
> >
> > This is looking great. Left a couple more comments below, let me know
> > what you think.
>
> Thanks for the feedback!
>
> >
> > > ---
> > >  .../bpf/standardization/instruction-set.rst   | 79 +++++++++++++++++--
> > >  1 file changed, 71 insertions(+), 8 deletions(-)
> > >
> > >  Changelog:
> > >    v1 -> v2:
> > >          - Remove change to Sphinx version
> > >                - Address David's comments
> > >                - Address Dave's comments
> > >    v2 -> v3:
> > >          - Add information about sign extending
> > >
> > > diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> > > index 655494ac7af6..fe296f35e5a7 100644
> > > --- a/Documentation/bpf/standardization/instruction-set.rst
> > > +++ b/Documentation/bpf/standardization/instruction-set.rst
> > > @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set.
> > >  Documentation conventions
> > >  =========================
> > >
> > > -For brevity, this document uses the type notion "u64", "u32", etc.
> > > -to mean an unsigned integer whose width is the specified number of bits,
> > > -and "s32", etc. to mean a signed integer of the specified number of bits.
> > > +For brevity and consistency, this document refers to families
> > > +of types using a shorthand syntax and refers to several expository,
> > > +mnemonic functions when describing the semantics of opcodes. The range
> >
> > Hmm, I wonder if it's slightly more accurate to say that those functions
> > are describing the semantics of "instructions" rather than "opcodes",
> > given that the value in the immediate for the byte swap instructions
> > dictate the width. What do you think?
>
> Great point!
>
> >
> > > +of valid values for those types and the semantics of those functions
> > > +are defined in the following subsections.
> > > +
> > > +Types
> > > +-----
> > > +This document refers to integer types with a notation of the form `SN`
> > > +that succinctly defines whether their values are signed or unsigned
> >
> > Suggestion: I don't think we need the word "succinctly" here. I'm also
> > inclined to suggest that we avoid using the word "define" here, as the
> > notation itself isn't really defining the values of the types, but is
> > rather just a shorthand.
>
> Yes!
>
> >
> > > +numbers and the bit widths:
> >
> > What do you think about this phrasing:
> >
> > This document refers to integer types with the notation `SN` to specify
> > a type's signedness and bit width respectively.
> >
> > Feel free to disagree. My thinking here is that it might make more sense
> > to explain the notation as an informal shorthand rather than as a formal
> > defnition, as making it a formal definition might open its own can of
> > worms (e.g. we would probably also have to define what signedness means,
> > etc).
>
> I think that you make an excellent point. We have already gone
> back/forth about whether there is going to be a definition for the
> "type" of signedness that we use in the representation (i.e., two's
> complement, one's complement, sign magnitude, etc) and we definitely
> don't want to rush in to that rabbit hole again. I will incorporate
> your suggestion in the next revision.
>
> >
> > > +
> > > +=== =======
> > > +`S` Meaning
> > > +=== =======
> > > +`u` unsigned
> > > +`s` signed
> > > +=== =======
> > > +
> > > +===== =========
> > > +`N`   Bit width
> > > +===== =========
> > > +`8`   8 bits
> > > +`16`  16 bits
> > > +`32`  32 bits
> > > +`64`  64 bits
> > > +`128` 128 bits
> > > +===== =========
> >
> > Is it by any chance possible to put these two tables on the same row?
> > Not sure if rst is up to that challenge, and if not feel free to ignore.
>
> I would love to make that happen. My rst skills are developing and I
> will see what I can do!

FYI: In the upcoming version of this patch, you will see that I did
not do the side-by-side table. I want you to know that I tried very
hard and ultimately got it to work. However, it looked terrible and
the result was confusing (I thought). If you would strongly prefer a
side-by-side (ish) table, please let me know and I will send a patch
to show what it looked like.

I just wanted to note that I *tried* to incorporate your suggestion,
but could not ultimately bend RST to my will (yes, pun intended)!

Will

>
> >
> > > +
> > > +For example, `u32` is a type whose valid values are all the 32-bit unsigned
> > > +numbers and `s16` is a types whose valid values are all the 16-bit signed
> > > +numbers.
> > > +
> > > +Functions
> > > +---------
> > > +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 16-bit number in big-endian
> > > +  format.
> > > +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 32-bit number in big-endian
> > > +  format.
> > > +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 64-bit number in big-endian
> > > +  format.
> > > +* `htole16`: Takes an unsigned 16-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 16-bit number in little-endian
> > > +  format.
> > > +* `htole32`: Takes an unsigned 32-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 32-bit number in little-endian
> > > +  format.
> > > +* `htole64`: Takes an unsigned 64-bit number in host-endian format and
> > > +  returns the equivalent number as an unsigned 64-bit number in little-endian
> > > +  format.
> > > +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
> > > +  format and returns the equivalent number with the same bit width but
> > > +  opposite endianness.
> > > +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
> > > +  format and returns the equivalent number with the same bit width but
> > > +  opposite endianness.
> > > +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
> > > +  format and returns the equivalent number with the same bit width but
> > > +  opposite endianness.
> >
> > Upon further reflection, maybe this belongs in the Byte swap
> > instructions section of the document? The types make sense to list above
> > because they're ubiquitous throughout the doc, but these are really 100%
> > specific to byte swap.
>
> I am not opposed to that. The only reason that I put them here was to
> make it fully centralized. They are also going to be used in the
> Appendix and I was hoping not to have to reproduce them there. In v4
> of the patch I will leave them here and then if we do feel like we
> should move them I will happily make a v5!
>
> Thank you again for the feedback! A new revision of the patch will be
> out shortly!
>
> Have a great rest of the weekend!
> Will
>
> >
> > >
> > >  Registers and calling convention
> > >  ================================
> > > @@ -252,19 +311,23 @@ are supported: 16, 32 and 64.
> > >
> > >  Examples:
> > >
> > > -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
> > > +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
> > >
> > >    dst = htole16(dst)
> > > +  dst = htole32(dst)
> > > +  dst = htole64(dst)
> > >
> > > -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> > > +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::
> > >
> > > +  dst = htobe16(dst)
> > > +  dst = htobe32(dst)
> > >    dst = htobe64(dst)
> > >
> > >  ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
> > >
> > > -  dst = bswap16 dst
> > > -  dst = bswap32 dst
> > > -  dst = bswap64 dst
> > > +  dst = bswap16(dst)
> > > +  dst = bswap32(dst)
> > > +  dst = bswap64(dst)
> > >
> >
> > [...]
> >
> > - David