> On Sat, Jul 29, 2023 at 1:29 AM Jose E. Marchesi
> <jose.marchesi@xxxxxxxxxx> wrote:
>>
>> > On Fri, Jul 28, 2023 at 11:01 AM Jose E. Marchesi
>> > <jose.marchesi@xxxxxxxxxx> wrote:
>> >>
>> >> >> On 7/28/23 9:41 AM, Jose E. Marchesi wrote:
>> >> >>> Hello.
>> >> >>> Just a heads up regarding the new BPF V4 instructions and
>> >> >>> their support in the GNU Toolchain.
>> >> >>>
>> >> >>> V4 sdiv/smod instructions
>> >> >>>
>> >> >>> Binutils has been updated to use the V4 encoding of these
>> >> >>> instructions, which used to be part of the xbpf testing dialect
>> >> >>> used in GCC.  GCC generates these instructions for signed
>> >> >>> division when -mcpu=v4 or higher.
>> >> >>>
>> >> >>> V4 sign-extending register move instructions
>> >> >>> V4 signed load instructions
>> >> >>> V4 byte swap instructions
>> >> >>>
>> >> >>> Supported in assembler, disassembler and linker.  GCC generates
>> >> >>> these instructions when -mcpu=v4 or higher.
>> >> >>>
>> >> >>> V4 32-bit unconditional jump instruction
>> >> >>>
>> >> >>> Supported in assembler and disassembler.  GCC doesn't generate
>> >> >>> that instruction.
>> >> >>>
>> >> >>> However, the assembler has been expanded in order to perform the
>> >> >>> following relaxations when the disp16 field of a jump
>> >> >>> instruction is known at assembly time, and is overflown, unless
>> >> >>> -mno-relax is specified:
>> >> >>>
>> >> >>> JA disp16 -> JAL disp32
>> >> >>> Jxx disp16 -> Jxx +1; JA +1; JAL disp32
>> >> >>>
>> >> >>> Where Jxx is one of the conditional jump instructions such as
>> >> >>> jeq, jlt, etc.
>> >> >>
>> >> >> Sounds great. The above 'JA/Jxx disp16' transformation matches
>> >> >> what llvm did as well.
>> >> >
>> >> > Not by chance ;)
>> >> >
>> >> > Now what is pending in binutils is to relax these jumps in the linker as
>> >> > well.  But it is very low priority, compared to get these kernel
>> >> > selftests building and running.  So it will happen, but probably not
>> >> > anytime soon.
>> >>
>> >> By the way, for doing things like that (further object transformations
>> >> by linkers and the like) we will need to have the ELF files annotated
>> >> with:
>> >>
>> >> - The BPF cpu version the object was compiled for: v1, v2, v3, v4, and
>> >>
>> >> - Individual flags specifying the BPF cpu capabilities (alu32, bswap,
>> >>   jmp32, etc) required/expected by the code in the object.
>> >>
>> >> Note it is interesting to being able to denote both, for flexibility.
>> >>
>> >> There are 32 bits available for machine-specific flags in e_flags, which
>> >> are commonly used for this purpose by other arches.  For BPF I would
>> >> suggest something like:
>> >>
>> >> #define EF_BPF_ALU32  0x00000001
>> >> #define EF_BPF_JMP32  0x00000002
>> >> #define EF_BPF_BSWAP  0x00000004
>> >> #define EF_BPF_SDIV   0x00000008
>> >> #define EF_BPF_CPUVER 0x00FF0000
>> >
>> > Interesting idea. I don't mind, but what are we going to do with this info?
>> > I cannot think of anything useful libbpf could do with it.
>> > For other archs such flags make sense, since disasm of everything
>> > to discover properties is hard. For BPF we will parse all insns anyway,
>> > so additional info in ELF doesn't give any additional insight.
>>
>> I mainly had link-time relaxation in mind.  The linker needs to know
>> what instructions are available (JMP32 or not) in order to decide what
>> to relax, and to what.
>
> But the assembler has little choice when the jump target is >16bits.
> It can use jmp32 or error.

When the assembler sees a jump instruction:

  goto EXPR

there are several possibilities:

1. EXPR consists of a literal number like 1, -10 or 0xff, or an
   expression that can be resolved during the first assembler pass
   (like 8 * 64).  The numerical result is interpreted as a number of
   64-bit words minus one.  In this case, the assembler can immediately
   decide whether the operand is >16 bits, relaxing to the jmp32 jump
   if cpu >= v4, unless -mno-relax is passed in the command line.

2. EXPR is a symbolic expression involving a symbol that can be
   resolved during the second assembler pass.  For example, `foo + 10'.
   In this case, there are two possibilities:

   2.1. The symbol is an absolute symbol.  In this case the value is
        interpreted as such and no conversion is done by the
        assembler.  So if for example the user invokes the assembler
        passing `--defsym foo=10', the assembled instruction is
        `ja 20'.

   2.2. The symbol is a PC-relative or section-relative symbol.  In
        this case the value is interpreted as a byte offset (the
        assembler takes care to transform offsets relative to the
        current section into PC-relative offsets whenever necessary).
        This is the case of labels.  For these symbols, the BPF
        assembler converts the value from bytes to number of 64-bit
        words minus one.  So for example for `ja done' where `done'
        has the value 256 bytes, the assembled instruction is `ja 31'
        (256 / 8 - 1).

3. EXPR is a symbolic expression involving a symbol that cannot be
   resolved during the second assembler pass.  In this case, a
   relocation for the 16-bit immediate field in the instruction is
   generated in the assembled object.

   There is no R_BPF_64_16 relocation defined by BPF as of yet, so we
   are using R_BPF_GNU_64_16=256, which as we agreed uses a high
   relocation number to avoid collisions.

   Since gas is a standalone assembler, it seems sensible to emit a
   relocation rather than erroring out in these situations.  ld knows
   how to handle these relocs when linking BPF objects together.

> I guess you're proposing to encode this e_flags in the text of asm ?
> Special asm directive that will force asm to error or use jmp32?

GAS uses command-line options for that.

When GCC is invoked with -mcpu=v3, for example, it passes the
corresponding option to the assembler so it expects a BPF V3 assembly
program.  In that scenario, if the user does a jump to an address that
is >16bit in an inline asm, the assembler will error out, because
relaxing to jmp32 is not a possibility in V3.
Ditto for compiler options like -msdiv or -mjmp32, which both clang and
GCC support.

I don't know how clang configures its integrated assembler... I guess
by calling some function.  But it is the same principle: if you tell
clang to generate v3 bpf and you include a header that uses a v4
instruction (or an overflown jump that would require relaxation) in
inline asm, you want an error.

>> Also as you mention the disassembler can look in the object to determine
>> which instructions shall be recognized and which instructions shall be
>> reported as <unknown>.  Right now it is necessary to pass an explicit
>> option to the assembler, and the default is v4.
>
> Disambiguating between unknown and exact insn kinda makes sense for disasm.
> For assembler it's kinda weird. If text says 'sdiv' the asm should emit
> binary code for it regardless of asm directive.

Unless configured to not do so?  See above.

> It seems e_flags can only be emitted by assembler.
> Like if it needs to use jmp32 it will add EF_BPF_JMP32.

Yep.

> Still feels that we can live without these flags, but not a bad
> addition.

The individual flags... I am not sure.  Other arches have them, but
maybe having them in BPF doesn't make much sense and is not worth the
extra complication and wasted bits in e_flags.  How realistic is it to
expect that some kernel may support a particular version of the BPF
ISA, and also have support for some particular instruction from a later
ISA as the result of a backport or something?  Not for me to judge... I
was already bitten by my utter ignorance on kernel business when I
added that silly useless -mkernel=VERSION option to GCC 8-)

What I am pretty sure of is that we will need something like
EF_BPF_CPUVER if we are ever gonna support relaxation in any linker
external to libbpf, and also to detect (and error/warn) when several
objects with different BPF versions are linked together.

> As far as flag names, let's use EF_ prefix. I think it's more canonical.
> And single 0xF is probably enough for cpu ver.

Agreed.
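
To make the EF_BPF_CPUVER idea a bit more concrete, here is a rough
sketch in C of how a linker could consume such a field.  Everything in
it is hypothetical and only for illustration: the placement of the 0xF
mask in the low four bits of e_flags, the convention that 0 means "not
annotated", the helper names, and the policy of refusing to mix
versions are all assumptions, not something binutils or libbpf
implements.

```c
#include <stdint.h>

/* Hypothetical layout: the low four bits of e_flags hold the BPF cpu
   version (1 to 4).  A value of 0 is assumed to mean that the object
   carries no version annotation.  */
#define EF_BPF_CPUVER 0x0000000f

/* Extract the cpu version field from an ELF header's e_flags.  */
static unsigned int
bpf_cpu_version (uint32_t e_flags)
{
  return e_flags & EF_BPF_CPUVER;
}

/* Return 1 if two objects may be linked together, 0 otherwise.
   Assumed policy: unannotated objects link with anything, annotated
   objects must have exactly the same cpu version.  */
static int
bpf_link_compatible (uint32_t a_flags, uint32_t b_flags)
{
  unsigned int a = bpf_cpu_version (a_flags);
  unsigned int b = bpf_cpu_version (b_flags);

  return a == 0 || b == 0 || a == b;
}

/* Return 1 if a linker may relax an overflown 16-bit jump into the
   32-bit jump, which is only available in v4 or later.  */
static int
bpf_can_relax_to_jmp32 (uint32_t e_flags)
{
  return bpf_cpu_version (e_flags) >= 4;
}
```

Under these assumptions an object assembled for v3 would carry the
value 3 in that field, so a linker could refuse to relax its jumps and
could warn when it is linked with a v4 object.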