Hello. We found that some of the BPF selftests use the "p" constraint in inline assembly snippets, for input operands for MOV (rN = rM) instructions. This is mainly done via the __imm_ptr macro defined in tools/testing/selftests/bpf/progs/bpf_misc.h: #define __imm_ptr(name) [name]"p"(&name) Example: int consume_first_item_only(void *ctx) { struct bpf_iter_num iter; asm volatile ( /* create iterator */ "r1 = %[iter];" [...] : : __imm_ptr(iter) : CLOBBERS); [...] } Little equivalent reproducer: int bar () { int jorl; asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl)); return jorl; } The "p" constraint is a tricky one. It is documented in the GCC manual section "Simple Constraints": An operand that is a valid memory address is allowed. This is for ``load address'' and ``push address'' instructions. p in the constraint must be accompanied by address_operand as the predicate in the match_operand. This predicate interprets the mode specified in the match_operand as the mode of the memory reference for which the address would be valid. There are two problems: 1. It is questionable whether that constraint was ever intended to be used in inline assembly templates, because its behavior really depends on compiler internals. A "memory address" is not the same than a "memory operand" or a "memory reference" (constraint "m"), and in fact its usage in the template above results in an error in both x86_64-linux-gnu and bpf-unkonwn-none: foo.c: In function ‘bar’: foo.c:6:3: error: invalid 'asm': invalid expression as operand 6 | asm volatile ("r1 = %[jorl]" : : [jorl]"p"(&jorl)); | ^~~ I would assume the same happens with aarch64, riscv, and most/all other targets in GCC, that do not accept operands of the form A + B that are not wrapped either in a const or in a memory reference. To avoid that error, the usage of the "p" constraint in internal GCC instruction templates is supposed to be complemented by the 'a' modifier, like in: asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl)); Internally documented (in GCC's final.cc) as: %aN means expect operand N to be a memory address (not a memory reference!) and print a reference to that address. That works because when the modifier 'a' is found, GCC prints an "operand address", which is not the same than an "operand". But... 2. Even if we used the internal 'a' modifier (we shouldn't) the 'rN = rM' instruction really requires a register argument. In cases involving automatics, like in the examples above, we easily end with: bar: #APP r1 = r10-4 #NO_APP In other cases we could conceibly also end with a 64-bit label that may overflow the 32-bit immediate operand of `rN = imm32' instructions: r1 = foo All of which is clearly wrong. clang happens to do "the right thing" in the current usage of __imm_ptr in the BPF tests, because even with -O2 it seems to "reload" the fp-relative address of the automatic to a register like in: bar: r1 = r10 r1 += -4 #APP r1 = r1 #NO_APP Which is what GCC would generate with -O0. Whether this is by chance or by design (Nick, do you know?) I don't think the compiler should be expected to do that reload driven by the "p" constraint. I would suggest to change that macro (and similar out of macro usages of the "p" constraint in selftests/bpf/progs/iters.c) to use the "r" constraint instead. If a register is what is required, we should let the compiler know. Thoughts? PS: I am aware that the x86 port of the kernel uses the "p" constraint in the percpu macros (arch/x86/include/asm/percpu.h) but that usage is in a different context (I would assume it is used in x86 instructions that get constant addresses or global addresses loaded in registers and not automatics) where it seems to work well.