Re: GCC and binutils support for BPF V4 instructions

Yonghong Song <yonghong.song@xxxxxxxxx> · Sun, 30 Jul 2023 09:12:05 -0700

On 7/29/23 9:54 PM, Jose E. Marchesi wrote:

On Sat, Jul 29, 2023 at 1:29 AM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:

On Fri, Jul 28, 2023 at 11:01 AM Jose E. Marchesi
<jose.marchesi@xxxxxxxxxx> wrote:

On 7/28/23 9:41 AM, Jose E. Marchesi wrote:
Hello.
Just a heads up regarding the new BPF V4 instructions and their support
in the GNU Toolchain.
V4 sdiv/smod instructions
    Binutils has been updated to use the V4 encoding of these
    instructions, which used to be part of the xbpf testing dialect used
    in GCC.  GCC generates these instructions for signed division when
    -mcpu=v4 or higher.
V4 sign-extending register move instructions
V4 signed load instructions
V4 byte swap instructions
    Supported in assembler, disassembler and linker.  GCC generates
    these instructions when -mcpu=v4 or higher.
V4 32-bit unconditional jump instruction
    Supported in assembler and disassembler.  GCC doesn't generate
    that instruction.
    However, the assembler has been extended to perform the following
    relaxations when the displacement of a jump instruction is known
    at assembly time and overflows the disp16 field, unless -mno-relax
    is specified:
      JA disp16  -> JAL disp32
      Jxx disp16 -> Jxx +1; JA +1; JAL disp32
    Where Jxx is one of the conditional jump instructions such as
    jeq, jlt, etc.
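
The relaxation rule above can be sketched in C as follows.  This is a
minimal illustration, not the binutils implementation: the names
`relax_kind', `choose_relaxation' and `relaxed_insn_count' are
hypothetical, and treating "relaxation not possible" as an error is an
assumption based on the discussion later in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not taken from binutils. */
enum relax_kind {
    RELAX_NONE,          /* displacement fits in disp16, keep as-is */
    RELAX_JA_TO_JAL,     /* JA disp16  -> JAL disp32 */
    RELAX_JXX_SEQUENCE,  /* Jxx disp16 -> Jxx +1; JA +1; JAL disp32 */
    RELAX_ERROR          /* overflow, and relaxation is not available */
};

/* Decide what to do with a jump whose displacement is already known. */
enum relax_kind choose_relaxation(int is_conditional, int64_t disp,
                                  int cpu_version, int relax_enabled)
{
    /* v1..v4 are the cpu versions named in this thread. */
    assert(cpu_version >= 1 && cpu_version <= 4);

    if (disp >= INT16_MIN && disp <= INT16_MAX)
        return RELAX_NONE;

    /* Assumption: without the V4 32-bit jump, or with -mno-relax,
       an overflowing displacement cannot be encoded. */
    if (cpu_version < 4 || !relax_enabled)
        return RELAX_ERROR;

    return is_conditional ? RELAX_JXX_SEQUENCE : RELAX_JA_TO_JAL;
}

/* Number of instructions emitted for each outcome.  In the three
   instruction sequence, "Jxx +1" skips the "JA +1" when the condition
   holds, landing on the JAL; otherwise "JA +1" skips the JAL. */
int relaxed_insn_count(enum relax_kind kind)
{
    switch (kind) {
    case RELAX_NONE:         return 1;
    case RELAX_JA_TO_JAL:    return 1;
    case RELAX_JXX_SEQUENCE: return 3;
    default:                 return 0;
    }
}
```

Note that the conditional case grows the code by two instructions, which
is why the decision has to be made before final offsets are fixed.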

Sounds great. The above 'JA/Jxx disp16' transformation matches
what llvm did as well.

Not by chance ;)

Now what is pending in binutils is to relax these jumps in the linker as
well.  But it is very low priority compared to getting these kernel
selftests building and running.  So it will happen, but probably not
anytime soon.

By the way, for doing things like that (further object transformations
by linkers and the like) we will need to have the ELF files annotated
with:

- The BPF cpu version the object was compiled for: v1, v2, v3, v4, and

- Individual flags specifying the BPF cpu capabilities (alu32, bswap,
   jmp32, etc) required/expected by the code in the object.

Note that it is useful to be able to denote both, for flexibility.

There are 32 bits available for machine-specific flags in e_flags, which
are commonly used for this purpose by other arches.  For BPF I would
suggest something like:

#define EF_BPF_ALU32  0x00000001
#define EF_BPF_JMP32  0x00000002
#define EF_BPF_BSWAP  0x00000004
#define EF_BPF_SDIV   0x00000008
#define EF_BPF_CPUVER 0x00FF0000
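
As a rough sketch of how a tool could pack and query such flags: the
masks below are the ones proposed above, while the shift value of 16
(implied by the 0x00FF0000 mask) and the helper names
`bpf_make_eflags', `bpf_eflags_cpu_version' and `bpf_eflags_has' are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Values as proposed in this message. */
#define EF_BPF_ALU32  0x00000001u
#define EF_BPF_JMP32  0x00000002u
#define EF_BPF_BSWAP  0x00000004u
#define EF_BPF_SDIV   0x00000008u
#define EF_BPF_CPUVER 0x00FF0000u

/* Assumption: the version occupies the byte selected by the mask. */
#define EF_BPF_CPUVER_SHIFT 16

/* Build an e_flags word from a cpu version and capability bits. */
uint32_t bpf_make_eflags(unsigned cpu_version, uint32_t caps)
{
    assert(cpu_version <= 0xFF);
    return (((uint32_t)cpu_version << EF_BPF_CPUVER_SHIFT) & EF_BPF_CPUVER)
           | (caps & ~EF_BPF_CPUVER);
}

/* Extract the cpu version recorded in e_flags. */
unsigned bpf_eflags_cpu_version(uint32_t e_flags)
{
    return (e_flags & EF_BPF_CPUVER) >> EF_BPF_CPUVER_SHIFT;
}

/* Return nonzero if every bit in CAP is set in e_flags. */
int bpf_eflags_has(uint32_t e_flags, uint32_t cap)
{
    return (e_flags & cap) == cap;
}
```

A linker deciding whether it may relax to the 32-bit jump could then
test `bpf_eflags_has (flags, EF_BPF_JMP32)' or compare the extracted
version against 4.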

Interesting idea. I don't mind, but what are we going to do with this info?
I cannot think of anything useful libbpf could do with it.
For other archs such flags make sense, since disasm of everything
to discover properties is hard. For BPF we will parse all insns anyway,
so additional info in ELF doesn't give any additional insight.

I mainly had link-time relaxation in mind.  The linker needs to know
what instructions are available (JMP32 or not) in order to decide what
to relax, and to what.

But the assembler has little choice when the jump target is >16bits.
It can use jmp32 or error.

When the assembler sees a jump instruction:

    goto EXPR

there are several possibilities:

1. EXPR consists of a literal number like 1, -10 or 0xff, or an
    expression that can be resolved during the first assembler pass (like
    8 * 64).  The numerical result is interpreted as number of 64-bit
    words minus one.  In this case, the assembler can immediately decide
    whether the operand is >16 bits, relaxing to the jmp32 jump if cpu >=
    v4 and unless -mno-relax is passed in the command line.

2. EXPR is a symbolic expression involving a symbol that can be resolved
    during the second assembler pass.  For example, `foo + 10'.  In this
    case, there are two possibilities:

    2.1. The symbol is an absolute symbol.  In this case the value is
         interpreted as-such and no conversion is done by the assembler.
         So if for example the user invokes the assembler passing
         `--defsym foo=10', the assembled instruction is `ja 20'.

    2.2. The symbol is a PC-relative or section-relative symbol.  In this
         case the value is interpreted as a byte offset (the assembler
         takes care to transform offsets relative to the current section
         into PC-relative offsets whenever necessary).  This is the case
         of labels.  For these symbols, the BPF assembler converts the
         value from bytes to number of 64-bit words minus one.  So for
         example for `ja done' where `done' has the value 256 bytes, the
         assembled instruction is `ja 31'.

3. EXPR is a symbolic expression involving a symbol that cannot be
    resolved during the second assembler pass.  In this case, a
    relocation for the 16-bit immediate field in the instruction is
    generated in the assembled object.  There is no R_BPF_64_16
    relocation defined by BPF as of yet, so we are using
    R_BPF_GNU_64_16=256, which as we agreed uses a high relocation number
    to avoid collisions.  Since gas is a standalone assembler, it seems
    sensible to emit a relocation rather than erroing out in these
    situations.  ld knows how to handle these relocs when linking BPF
    objects together.
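
The operand handling in cases 2.1 and 2.2 can be sketched as below.
This is only an illustration of the arithmetic described above, not gas
code: the names `sym_kind', `bpf_bytes_to_disp' and `bpf_jump_operand'
are hypothetical, and the byte value passed for a relative symbol is
assumed to be already PC-relative.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_INSN_SIZE 8  /* one BPF instruction is one 64-bit word */

/* Hypothetical classification of the resolved symbol. */
enum sym_kind { SYM_ABSOLUTE, SYM_RELATIVE };

/* Convert a PC-relative byte offset into a jump displacement:
   number of 64-bit words minus one. */
int64_t bpf_bytes_to_disp(int64_t byte_offset)
{
    assert(byte_offset % BPF_INSN_SIZE == 0);
    return byte_offset / BPF_INSN_SIZE - 1;
}

/* Absolute values are used as-is (case 2.1); relative values are
   converted from bytes (case 2.2). */
int64_t bpf_jump_operand(enum sym_kind kind, int64_t value)
{
    if (kind == SYM_ABSOLUTE)
        return value;
    return bpf_bytes_to_disp(value);
}
```

With these helpers, `foo + 10' with `--defsym foo=10' yields 20, and a
label 256 bytes away yields 31, matching the two examples in the list.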

I guess you're proposing to encode this e_flags in the text of asm ?
Special asm directive that will force asm to error or use jmp32?

GAS uses command-line options for that.

When GCC is invoked with -mcpu=v3, for example, it passes the
corresponding option to the assembler so it expects a BPF V3 assembly
program.  In that scenario, if the user does a jump to an address that
does not fit in 16 bits in an inline asm, the assembler will error out,
because relaxing to jmp32 is not a possibility in V3.  Ditto for
compiler options like -msdiv or -mjmp32, that both clang and GCC
support.

I don't know how clang configures its integrated assembler... I guess by
calling some function.  But it is the same principle: if you tell clang
to generate v3 bpf and you include a header that uses a v4 instruction
(or overflown jump that would require relaxation) in inline asm, you
want an error.

If -mcpu=<version> is specified in the clang command line,
then the cpu <version> will be encoded in IR and will be
passed to the integrated assembler. And if you specify
-mcpu=v3 in the command line and your code has
cpu v4 inline assembly code, the compiler will error out.

Also, as you mention, the disassembler can look in the object to determine
which instructions shall be recognized and which instructions shall be
reported as <unknown>.  Right now it is necessary to pass an explicit
option to the assembler, and the default is v4.

Disambiguating between unknown and exact insn kinda makes sense for disasm.
For assembler it's kinda weird. If text says 'sdiv' the asm should emit
binary code for it regardless of asm directive.

Unless configured to not do so?  See above.

It seems e_flags can only be emitted by assembler.
Like if it needs to use jmp32 it will add EF_BPF_JMP32.

Yep.

Still feels that we can live without these flags, but not a bad
addition.

The individual flags... I am not sure, other arches have them, but maybe
having them in BPF doesn't make much sense and it is not worth the extra
complication and wasted bits in e_flags.  How realistic is it to expect
that some kernel may support a particular version of the BPF ISA, and
also have support for some particular instruction from a later ISA as
the result of a backport or something?  Not for me to judge... I was
already bitten by my utter ignorance on kernel business when I added
that silly useless -mkernel=VERSION option to GCC 8-)

What I am pretty sure of is that we will need something like EF_BPF_CPUVER
if we are ever gonna support relaxation in any linker external to
libbpf, and also to detect (and error/warn) when several objects with
different BPF versions are linked together.
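
A link-time version check of the kind mentioned above might look like
the following sketch.  The EF_BPF_CPUVER mask is the one proposed
earlier in this thread; the function name `bpf_check_cpu_versions' and
its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EF_BPF_CPUVER 0x00FF0000u  /* mask proposed in this thread */

/* Compare the cpu version field of N input objects.  Returns 0 and
   stores the shared version bits in *COMMON if they all agree (or if
   N is zero), and returns -1 on the first mismatch. */
int bpf_check_cpu_versions(const uint32_t *e_flags, size_t n,
                           uint32_t *common)
{
    assert(common != NULL);
    *common = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t ver = e_flags[i] & EF_BPF_CPUVER;
        if (i == 0)
            *common = ver;
        else if (ver != *common)
            return -1;
    }
    return 0;
}
```

Whether a mismatch should be a hard error or only a warning is a policy
choice left open here.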

As far as flag names, let's use EF_ prefix. I think it's more canonical.
And single 0xF is probably enough for cpu ver.

Agreed.