RE: [PATCH bpf-next] The original document has some inconsistency.

dthaler1968@xxxxxxxxxxxxxx · Tue, 9 Jan 2024 10:06:22 -0800

David Vernet <void@xxxxxxxxxxxxx> writes: 
> Hi Aoyang,
> 
> Thanks a lot for your contribution. I agree that we need to fix the
document
> to be consistent, though I'm afraid that I think this patch goes in the
wrong
> direction by making everything match the jump instruction class. More
below.

I disagree, and I agree with Aoyang's direction.

> nit: Could you please update the patch subject to be more self-describing.
For
> example, something like:
> 
> Use consistent numerical widths in instructions.rst encodings

I agree with that subject.

> > For example:
> > 1. 1.3.1 Arithmetic instructions use '8 bits length' encoding to
> >    express the 'code' value, e.g., BPF_ADD=0x00, BPF_SUB=0x10,
> >    BPF_MUL=0x20. However the length of the 'code' is 4 bits. On the
> >    other hand, 1.3.3 Jump instructions use '4 bits length' encoding,
> >    e.g., BPF_JEQ=0x1 and BPF_JGT=0x2.
> > 2. There are also many places that use '8 bits length' encoding to
> >    express the corresponding contents, e.g., 1.4 Load and store
> >    instructions, BPF_ABS=0x20, BPF_IND=0x40. However, the length of
> >    'mode modifier' is 3 bits.
> >
> > To summarize, the only place that has inconsistent encoding is Jump
> > instructions. After discussing with Dave, dthaler1968@xxxxxxxxxxxxxx,
> > we agree that the document should be more clear.
> >
> > Signed-off-by: Aoyang Fang <aoyangfang@xxxxxxxxxxxxxxxx>
> >
> > ---
> >  .../bpf/standardization/instruction-set.rst   | 170 +++++++++---------
> >  1 file changed, 85 insertions(+), 85 deletions(-)
> >
> > diff --git a/Documentation/bpf/standardization/instruction-set.rst
> > b/Documentation/bpf/standardization/instruction-set.rst
> > index 245b6defc..57dd1fa00 100644
> > --- a/Documentation/bpf/standardization/instruction-set.rst
> > +++ b/Documentation/bpf/standardization/instruction-set.rst
> > @@ -172,18 +172,18 @@ Instruction classes
> >
> >  The three LSB bits of the 'opcode' field store the instruction class:
> >
> > -=========  =====  ===============================
> ===================================
> > -class      value  description                      reference
> > -=========  =====  ===============================
> ===================================
> > -BPF_LD     0x00   non-standard load operations     `Load and store
> instructions`_
> > -BPF_LDX    0x01   load into register operations    `Load and store
> instructions`_
> > -BPF_ST     0x02   store from immediate operations  `Load and store
> instructions`_
> > -BPF_STX    0x03   store from register operations   `Load and store
> instructions`_
> > -BPF_ALU    0x04   32-bit arithmetic operations     `Arithmetic and jump
> instructions`_
> > -BPF_JMP    0x05   64-bit jump operations           `Arithmetic and jump
> instructions`_
> > -BPF_JMP32  0x06   32-bit jump operations           `Arithmetic and jump
> instructions`_
> > -BPF_ALU64  0x07   64-bit arithmetic operations     `Arithmetic and jump
> instructions`_
> > -=========  =====  ===============================
> > ===================================
> > +=========  =============  ===============================
> ===================================
> > +class      value(3 bits)  description                      reference
> > +=========  =============  ===============================
> ===================================
> > +BPF_LD     0x0            non-standard load operations     `Load and
store
> instructions`_
> > +BPF_LDX    0x1            load into register operations    `Load and
store
> instructions`_
> > +BPF_ST     0x2            store from immediate operations  `Load and
store
> instructions`_
> > +BPF_STX    0x3            store from register operations   `Load and
store
> instructions`_
> > +BPF_ALU    0x4            32-bit arithmetic operations     `Arithmetic
and jump
> instructions`_
> > +BPF_JMP    0x5            64-bit jump operations           `Arithmetic
and jump
> instructions`_
> > +BPF_JMP32  0x6            32-bit jump operations           `Arithmetic
and jump
> instructions`_
> > +BPF_ALU64  0x7            64-bit arithmetic operations     `Arithmetic
and jump
> instructions`_
> > +=========  =============  ===============================
> > +===================================
> 
> Hmm, I presonally think this is more confusing. The opcode field is 8
bits. We
> already specify that the value is the three LSB of the opcode field. It's
> certainly subjective, but I think we should have the value reflect the
actual
> value in the field it's embedded in. In my opinion, changing the value to
not
> reflect its place in the actual opcode in my opinion imposes a burden on
the
> reader to go back and reference where the field actually belongs in the
full
> opcode. It's a tradeoff, but I think we're already on the winning end of
that
> tradeoff.

This document is an IETF standards specification so it's worth looking at
what
typical RFC conventions are.

* RFC 791 section 3.1 defines the IPv4 header, where the Version field is
the high
   4 bits of a byte.  It defines the value as 4, not 0x40.
   It also defines the Type of Service bits which are 1 bit fields with
value 0 or 1
   (not, say 0x40).
* RFC 8200 section 3 defines the IPv6 header, where the Version field is the
high
   4 bits of a byte.  It defines the value as 6, not 0x60.

Etc.  Offhand I am not aware of any RFC that uses the convention you
suggest,
though perhaps others are?

> >  Arithmetic and jump instructions
> >  ================================
> > @@ -203,12 +203,12 @@ code            source  instruction class
> >  **source**
> >    the source operand location, which unless otherwise specified is one
of:
> >
> > -  ======  =====  ==============================================
> > -  source  value  description
> > -  ======  =====  ==============================================
> > -  BPF_K   0x00   use 32-bit 'imm' value as source operand
> > -  BPF_X   0x08   use 'src_reg' register value as source operand
> > -  ======  =====  ==============================================
> > +  ======  ============
> > + ==============================================
> > +  source  value(1 bit)  description
> > +  ======  ============
> ==============================================
> > +  BPF_K   0x0           use 32-bit 'imm' value as source operand
> > +  BPF_X   0x1           use 'src_reg' register value as source operand
> > +  ======  ============
> > + ==============================================
> 
> Same here as well. The value isn't really 0x1, it's 0x8. And 0x08 is even
more
> clear yet, given that we're representing the value of the bit in the 8 bit
opcode
> field.

Its 1, in the same sense as the TOS bits in RFC 791 are 1.

> >  **instruction class**
> >    the instruction class (see `Instruction classes`_) @@ -221,27
> > +221,27 @@ otherwise identical operations.
> >  The 'code' field encodes the operation as below, where 'src' and
> > 'dst' refer  to the values of the source and destination registers,
respectively.
> >
> > -=========  =====  =======
> ==========================================================
> > -code       value  offset   description
> > -=========  =====  =======
> ==========================================================
> > -BPF_ADD    0x00   0        dst += src
> > -BPF_SUB    0x10   0        dst -= src
> > -BPF_MUL    0x20   0        dst \*= src
> > -BPF_DIV    0x30   0        dst = (src != 0) ? (dst / src) : 0
> > -BPF_SDIV   0x30   1        dst = (src != 0) ? (dst s/ src) : 0
> > -BPF_OR     0x40   0        dst \|= src
> > -BPF_AND    0x50   0        dst &= src
> > -BPF_LSH    0x60   0        dst <<= (src & mask)
> > -BPF_RSH    0x70   0        dst >>= (src & mask)
> > -BPF_NEG    0x80   0        dst = -dst
> > -BPF_MOD    0x90   0        dst = (src != 0) ? (dst % src) : dst
> > -BPF_SMOD   0x90   1        dst = (src != 0) ? (dst s% src) : dst
> > -BPF_XOR    0xa0   0        dst ^= src
> > -BPF_MOV    0xb0   0        dst = src
> > -BPF_MOVSX  0xb0   8/16/32  dst = (s8,s16,s32)src
> > -BPF_ARSH   0xc0   0        :term:`sign extending<Sign Extend>` dst >>=
(src &
> mask)
> > -BPF_END    0xd0   0        byte swap operations (see `Byte swap
instructions`_
> below)
> >
> > -=========  =====  =======
> > ==========================================================
> > +=========  =============  =======
> ==========================================================
> > +code       value(4 bits)  offset   description
> > +=========  =============  =======
> ==========================================================
> > +BPF_ADD    0x0            0        dst += src
> > +BPF_SUB    0x1            0        dst -= src
> > +BPF_MUL    0x2            0        dst \*= src
> > +BPF_DIV    0x3            0        dst = (src != 0) ? (dst / src) : 0
> > +BPF_SDIV   0x3            1        dst = (src != 0) ? (dst s/ src) : 0
> > +BPF_OR     0x4            0        dst \|= src
> > +BPF_AND    0x5            0        dst &= src
> > +BPF_LSH    0x6            0        dst <<= (src & mask)
> > +BPF_RSH    0x7            0        dst >>= (src & mask)
> > +BPF_NEG    0x8            0        dst = -dst
> > +BPF_MOD    0x9            0        dst = (src != 0) ? (dst % src) : dst
> > +BPF_SMOD   0x9            1        dst = (src != 0) ? (dst s% src) :
dst
> > +BPF_XOR    0xa            0        dst ^= src
> > +BPF_MOV    0xb            0        dst = src
> > +BPF_MOVSX  0xb            8/16/32  dst = (s8,s16,s32)src
> > +BPF_ARSH   0xc            0        :term:`sign extending<Sign Extend>`
dst >>=
> (src & mask)
> > +BPF_END    0xd            0        byte swap operations (see `Byte swap
> instructions`_ below)
> > +=========  =============  =======
> > +==========================================================
> 
> Same here.
> 
> >  Underflow and overflow are allowed during arithmetic operations,
> > meaning  the 64-bit or 32-bit value will wrap. If BPF program
> > execution would @@ -314,13 +314,13 @@ select what byte order the
> > operation converts from or to. For  ``BPF_ALU64``, the 1-bit source
> > operand field in the opcode is reserved  and must be set to 0.
> >
> > -=========  =========  =====
> =================================================
> > -class      source     value  description
> > -=========  =========  =====
> =================================================
> > -BPF_ALU    BPF_TO_LE  0x00   convert between host byte order and little
> endian
> > -BPF_ALU    BPF_TO_BE  0x08   convert between host byte order and big
> endian
> > -BPF_ALU64  Reserved   0x00   do byte swap unconditionally
> > -=========  =========  =====
> > =================================================
> > +=========  =========  ============
> =================================================
> > +class      source     value(1 bit)  description
> > +=========  =========  ============
> =================================================
> > +BPF_ALU    BPF_TO_LE  0x0           convert between host byte order and
little
> endian
> > +BPF_ALU    BPF_TO_BE  0x1           convert between host byte order and
big
> endian
> > +BPF_ALU64  Reserved   0x0           do byte swap unconditionally
> > +=========  =========  ============
> > +=================================================
> 
> Same here. Which bit does the 0x1 actually correspond to? It's
self-evident in
> the former, not the latter.

Would you then say that RFC 791 (and many RFCs since) is not self-evident?

If the WG chooses to diverge from the most common ways the IETF defines
bit formats, that might be ok but may need a section explaining the
divergent convention.   My personal preference though is to stay consistent
with the normal IETF convention, which part of the ISA doc already did.

Dave