> On Thu, 2023-08-10 at 10:45 -0700, Yonghong Song wrote: >> >> On 8/10/23 10:39 AM, Jose E. Marchesi wrote: >> > >> > > On 8/10/23 3:35 AM, Jose E. Marchesi wrote: >> > > > Hello. >> > > > We found that some of the BPF selftests use the "p" constraint in >> > > > inline >> > > > assembly snippets, for input operands for MOV (rN = rM) instructions. >> > > > This is mainly done via the __imm_ptr macro defined in >> > > > tools/testing/selftests/bpf/progs/bpf_misc.h: >> > > > #define __imm_ptr(name) [name]"p"(&name) >> > > > Example: >> > > > int consume_first_item_only(void *ctx) >> > > > { >> > > > struct bpf_iter_num iter; >> > > > asm volatile ( >> > > > /* create iterator */ >> > > > "r1 = %[iter];" >> > > > [...] >> > > > : >> > > > : __imm_ptr(iter) >> > > > : CLOBBERS); >> > > > [...] >> > > > } >> > > > Little equivalent reproducer: >> > > > int bar () >> > > > { >> > > > int jorl; >> > > > asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl)); >> > > > return jorl; >> > > > } >> > > > The "p" constraint is a tricky one. It is documented in the GCC >> > > > manual >> > > > section "Simple Constraints": >> > > > An operand that is a valid memory address is allowed. This is >> > > > for >> > > > ``load address'' and ``push address'' instructions. >> > > > p in the constraint must be accompanied by address_operand as the >> > > > predicate in the match_operand. This predicate interprets the mode >> > > > specified in the match_operand as the mode of the memory reference for >> > > > which the address would be valid. >> > > > There are two problems: >> > > > 1. It is questionable whether that constraint was ever intended to >> > > > be >> > > > used in inline assembly templates, because its behavior really >> > > > depends on compiler internals. A "memory address" is not the same >> > > > than a "memory operand" or a "memory reference" (constraint "m"), and >> > > > in fact its usage in the template above results in an error in both >> > > > x86_64-linux-gnu and bpf-unkonwn-none: >> > > > foo.c: In function ‘bar’: >> > > > foo.c:6:3: error: invalid 'asm': invalid expression as operand >> > > > 6 | asm volatile ("r1 = %[jorl]" : : [jorl]"p"(&jorl)); >> > > > | ^~~ >> > > > I would assume the same happens with aarch64, riscv, and >> > > > most/all >> > > > other targets in GCC, that do not accept operands of the form A + B >> > > > that are not wrapped either in a const or in a memory reference. >> > > > To avoid that error, the usage of the "p" constraint in internal >> > > > GCC >> > > > instruction templates is supposed to be complemented by the 'a' >> > > > modifier, like in: >> > > > asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl)); >> > > > Internally documented (in GCC's final.cc) as: >> > > > %aN means expect operand N to be a memory address >> > > > (not a memory reference!) and print a reference >> > > > to that address. >> > > > That works because when the modifier 'a' is found, GCC prints an >> > > > "operand address", which is not the same than an "operand". >> > > > But... >> > > > 2. Even if we used the internal 'a' modifier (we shouldn't) the 'rN >> > > > = >> > > > rM' instruction really requires a register argument. In cases >> > > > involving automatics, like in the examples above, we easily end with: >> > > > bar: >> > > > #APP >> > > > r1 = r10-4 >> > > > #NO_APP >> > > > In other cases we could conceibly also end with a 64-bit label >> > > > that >> > > > may overflow the 32-bit immediate operand of `rN = imm32' >> > > > instructions: >> > > > r1 = foo >> > > > All of which is clearly wrong. >> > > > clang happens to do "the right thing" in the current usage of >> > > > __imm_ptr >> > > > in the BPF tests, because even with -O2 it seems to "reload" the >> > > > fp-relative address of the automatic to a register like in: >> > > > bar: >> > > > r1 = r10 >> > > > r1 += -4 >> > > > #APP >> > > > r1 = r1 >> > > > #NO_APP >> > > >> > > Unfortunately, the modifier 'a' won't work for clang. >> > > >> > > $ cat t.c int bar () { int jorl; asm volatile ("r1 = >> > > %a[jorl]" : : [jorl]"p"(&jorl)); return jorl; } $ gcc -O2 -g -S >> > > t.c $ clang --target=bpf -O2 -g -S t.c clang: >> > > ../lib/Target/BPF/BPFAsmPrinter.cpp:126: virtual bool >> > > {anonymous}::BPFAsmPrinter::PrintAsmMemoryOperand(const >> > > llvm::MachineInstr*, unsigned int, const char*, llvm::raw_ostream&): >> > > Assertion `Offs >> > > etMO.isImm() && "Unexpected offset for inline asm memory operand."' failed. >> > > ... >> > > >> > > I guess BPF backend can try to add support for this 'a' modifier >> > > if necessary. >> > >> > I wouldn't advise that: it is an internal GCC detail that just happens >> > to work in inline asm. Also, even if you did that constraint may result >> > in operands that are not single registers. It would be better to use >> > "r" constraint instead. >> >> Sounds good. We also do not want to add support for this 'a' thing >> if there are alternatives. >> >> > >> > > >> > > > Which is what GCC would generate with -O0. Whether this is by chance or >> > > > by design (Nick, do you know?) I don't think the compiler should be >> > > > expected to do that reload driven by the "p" constraint. >> > > > I would suggest to change that macro (and similar out of macro >> > > > usages of >> > > > the "p" constraint in selftests/bpf/progs/iters.c) to use the "r" >> > > > constraint instead. If a register is what is required, we should let >> > > > the compiler know. >> > > >> > > Could you specify what is the syntax ("r" constraint) which will work >> > > for both clang and gcc? >> > >> > Instead of: >> > >> > #define __imm_ptr(name) [name]"p"(&name) >> > >> > Use this: >> > >> > #define __imm_ptr(name) [name]"r"(&name) >> > >> > That assures that the operand (the pointer value) will be available in >> > the form of a single register. >> >> Okay, this seems work for both gcc and clang. >> Eduard, what do you think about the above suggested change? > > BPF selftests are passing with this change. > The macro in question is used in 3 files: > - verifier_subprog_precision.c > - iters_state_safety.c > - iters_looping.c > > I don't see any difference in the generated object files > (at-least for cpuv4). > > So, I guess we should be fine. Note the same fix would be needed in the inline asm in selftests/bpf/progs/iters.c:iter_err_unsafe_asm_loop. > >> >> > >> > > >> > > > Thoughts? >> > > > PS: I am aware that the x86 port of the kernel uses the "p" >> > > > constraint >> > > > in the percpu macros (arch/x86/include/asm/percpu.h) but that usage >> > > > is in a different context (I would assume it is used in x86 >> > > > instructions that get constant addresses or global addresses loaded >> > > > in registers and not automatics) where it seems to work well. >> > > > >>