Re: [Bpf] [PATCH v3 1/2] bpf, docs: Formalize type notation and function semantics in ISA standard

David Vernet <void@xxxxxxxxxxxxx> · Sat, 5 Aug 2023 09:14:27 -0500

On Fri, Aug 04, 2023 at 11:09:18PM -0400, Will Hawkins wrote:
> Give a single place where the shorthand for types are defined, the
> semantics of helper functions are described, and certain terms can
> be defined.
> 
> Signed-off-by: Will Hawkins <hawkinsw@xxxxxx>

Hi Will,

This is looking great. Left a couple more comments below, let me know
what you think.

> ---
>  .../bpf/standardization/instruction-set.rst   | 79 +++++++++++++++++--
>  1 file changed, 71 insertions(+), 8 deletions(-)
> 
>  Changelog:
>    v1 -> v2:
> 	   - Remove change to Sphinx version
> 		 - Address David's comments
> 		 - Address Dave's comments
>    v2 -> v3:
> 	   - Add information about sign extending
> 
> diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst
> index 655494ac7af6..fe296f35e5a7 100644
> --- a/Documentation/bpf/standardization/instruction-set.rst
> +++ b/Documentation/bpf/standardization/instruction-set.rst
> @@ -10,9 +10,68 @@ This document specifies version 1.0 of the eBPF instruction set.
>  Documentation conventions
>  =========================
>  
> -For brevity, this document uses the type notion "u64", "u32", etc.
> -to mean an unsigned integer whose width is the specified number of bits,
> -and "s32", etc. to mean a signed integer of the specified number of bits.
> +For brevity and consistency, this document refers to families
> +of types using a shorthand syntax and refers to several expository,
> +mnemonic functions when describing the semantics of opcodes. The range

Hmm, I wonder if it's slightly more accurate to say that those functions
are describing the semantics of "instructions" rather than "opcodes",
given that the value in the immediate for the byte swap instructions
dictate the width. What do you think?

> +of valid values for those types and the semantics of those functions
> +are defined in the following subsections.
> +
> +Types
> +-----
> +This document refers to integer types with a notation of the form `SN`
> +that succinctly defines whether their values are signed or unsigned

Suggestion: I don't think we need the word "succinctly" here. I'm also
inclined to suggest that we avoid using the word "define" here, as the
notation itself isn't really defining the values of the types, but is
rather just a shorthand.

> +numbers and the bit widths:

What do you think about this phrasing:

This document refers to integer types with the notation `SN` to specify
a type's signedness and bit width respectively.

Feel free to disagree. My thinking here is that it might make more sense
to explain the notation as an informal shorthand rather than as a formal
defnition, as making it a formal definition might open its own can of
worms (e.g. we would probably also have to define what signedness means,
etc).

> +
> +=== =======
> +`S` Meaning
> +=== =======
> +`u` unsigned
> +`s` signed
> +=== =======
> +
> +===== =========
> +`N`   Bit width
> +===== =========
> +`8`   8 bits
> +`16`  16 bits
> +`32`  32 bits
> +`64`  64 bits
> +`128` 128 bits
> +===== =========

Is it by any chance possible to put these two tables on the same row?
Not sure if rst is up to that challenge, and if not feel free to ignore.

> +
> +For example, `u32` is a type whose valid values are all the 32-bit unsigned
> +numbers and `s16` is a types whose valid values are all the 16-bit signed
> +numbers.
> +
> +Functions
> +---------
> +* `htobe16`: Takes an unsigned 16-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 16-bit number in big-endian
> +  format.
> +* `htobe32`: Takes an unsigned 32-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 32-bit number in big-endian
> +  format.
> +* `htobe64`: Takes an unsigned 64-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 64-bit number in big-endian
> +  format.
> +* `htole16`: Takes an unsigned 16-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 16-bit number in little-endian
> +  format.
> +* `htole32`: Takes an unsigned 32-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 32-bit number in little-endian
> +  format.
> +* `htole64`: Takes an unsigned 64-bit number in host-endian format and
> +  returns the equivalent number as an unsigned 64-bit number in little-endian
> +  format.
> +* `bswap16`: Takes an unsigned 16-bit number in either big- or little-endian
> +  format and returns the equivalent number with the same bit width but
> +  opposite endianness.
> +* `bswap32`: Takes an unsigned 32-bit number in either big- or little-endian
> +  format and returns the equivalent number with the same bit width but
> +  opposite endianness.
> +* `bswap64`: Takes an unsigned 64-bit number in either big- or little-endian
> +  format and returns the equivalent number with the same bit width but
> +  opposite endianness.

Upon further reflection, maybe this belongs in the Byte swap
instructions section of the document? The types make sense to list above
because they're ubiquitous throughout the doc, but these are really 100%
specific to byte swap.

>  
>  Registers and calling convention
>  ================================
> @@ -252,19 +311,23 @@ are supported: 16, 32 and 64.
>  
>  Examples:
>  
> -``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16 means::
> +``BPF_ALU | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
>  
>    dst = htole16(dst)
> +  dst = htole32(dst)
> +  dst = htole64(dst)
>  
> -``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 64 means::
> +``BPF_ALU | BPF_TO_BE | BPF_END`` with imm = 16/32/64 means::
>  
> +  dst = htobe16(dst)
> +  dst = htobe32(dst)
>    dst = htobe64(dst)
>  
>  ``BPF_ALU64 | BPF_TO_LE | BPF_END`` with imm = 16/32/64 means::
>  
> -  dst = bswap16 dst
> -  dst = bswap32 dst
> -  dst = bswap64 dst
> +  dst = bswap16(dst)
> +  dst = bswap32(dst)
> +  dst = bswap64(dst)
>  

[...]

- David