Re: Usage of "p" constraint in BPF inline asm

Eduard Zingerman <eddyz87@xxxxxxxxx> · Thu, 10 Aug 2023 22:01:48 +0300

On Thu, 2023-08-10 at 10:45 -0700, Yonghong Song wrote:
> 
> On 8/10/23 10:39 AM, Jose E. Marchesi wrote:
> > 
> > > On 8/10/23 3:35 AM, Jose E. Marchesi wrote:
> > > > Hello.
> > > > We found that some of the BPF selftests use the "p" constraint in
> > > > inline
> > > > assembly snippets, for input operands for MOV (rN = rM) instructions.
> > > > This is mainly done via the __imm_ptr macro defined in
> > > > tools/testing/selftests/bpf/progs/bpf_misc.h:
> > > >     #define __imm_ptr(name) [name]"p"(&name)
> > > > Example:
> > > >     int consume_first_item_only(void *ctx)
> > > >     {
> > > >           struct bpf_iter_num iter;
> > > >           asm volatile (
> > > >                   /* create iterator */
> > > >                   "r1 = %[iter];"
> > > >                   [...]
> > > >                   :
> > > >                   : __imm_ptr(iter)
> > > >                   : CLOBBERS);
> > > >           [...]
> > > >     }
> > > > Little equivalent reproducer:
> > > >     int bar ()
> > > >     {
> > > >       int jorl;
> > > >       asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl));
> > > >       return jorl;
> > > >     }
> > > > The "p" constraint is a tricky one.  It is documented in the GCC
> > > > manual
> > > > section "Simple Constraints":
> > > >     An operand that is a valid memory address is allowed.  This is
> > > > for
> > > >     ``load address'' and ``push address'' instructions.
> > > >     p in the constraint must be accompanied by address_operand as the
> > > >     predicate in the match_operand.  This predicate interprets the mode
> > > >     specified in the match_operand as the mode of the memory reference for
> > > >     which the address would be valid.
> > > > There are two problems:
> > > > 1. It is questionable whether that constraint was ever intended to
> > > > be
> > > >      used in inline assembly templates, because its behavior really
> > > >      depends on compiler internals.  A "memory address" is not the same
> > > >      than a "memory operand" or a "memory reference" (constraint "m"), and
> > > >      in fact its usage in the template above results in an error in both
> > > >      x86_64-linux-gnu and bpf-unkonwn-none:
> > > >        foo.c: In function ‘bar’:
> > > >        foo.c:6:3: error: invalid 'asm': invalid expression as operand
> > > >           6 |   asm volatile ("r1 = %[jorl]" : : [jorl]"p"(&jorl));
> > > >             |   ^~~
> > > >      I would assume the same happens with aarch64, riscv, and
> > > > most/all
> > > >      other targets in GCC, that do not accept operands of the form A + B
> > > >      that are not wrapped either in a const or in a memory reference.
> > > >      To avoid that error, the usage of the "p" constraint in internal
> > > > GCC
> > > >      instruction templates is supposed to be complemented by the 'a'
> > > >      modifier, like in:
> > > >        asm volatile ("r1 = %a[jorl]" : : [jorl]"p"(&jorl));
> > > >      Internally documented (in GCC's final.cc) as:
> > > >        %aN means expect operand N to be a memory address
> > > >           (not a memory reference!) and print a reference
> > > >           to that address.
> > > >      That works because when the modifier 'a' is found, GCC prints an
> > > >      "operand address", which is not the same than an "operand".
> > > >      But...
> > > > 2. Even if we used the internal 'a' modifier (we shouldn't) the 'rN
> > > > =
> > > >      rM' instruction really requires a register argument.  In cases
> > > >      involving automatics, like in the examples above, we easily end with:
> > > >        bar:
> > > >           #APP
> > > >               r1 = r10-4
> > > >           #NO_APP
> > > >      In other cases we could conceibly also end with a 64-bit label
> > > > that
> > > >      may overflow the 32-bit immediate operand of `rN = imm32'
> > > >      instructions:
> > > >           r1 = foo
> > > >      All of which is clearly wrong.
> > > > clang happens to do "the right thing" in the current usage of
> > > > __imm_ptr
> > > > in the BPF tests, because even with -O2 it seems to "reload" the
> > > > fp-relative address of the automatic to a register like in:
> > > >     bar:
> > > > 	r1 = r10
> > > > 	r1 += -4
> > > > 	#APP
> > > > 	r1 = r1
> > > > 	#NO_APP
> > > 
> > > Unfortunately, the modifier 'a' won't work for clang.
> > > 
> > > $ cat t.c  int bar ()  {     int jorl;     asm volatile ("r1 =
> > > %a[jorl]" : : [jorl]"p"(&jorl));     return jorl;  }  $ gcc -O2 -g -S
> > > t.c  $ clang --target=bpf -O2 -g -S t.c  clang:
> > > ../lib/Target/BPF/BPFAsmPrinter.cpp:126: virtual bool
> > > {anonymous}::BPFAsmPrinter::PrintAsmMemoryOperand(const
> > > llvm::MachineInstr*, unsigned int, const char*, llvm::raw_ostream&):
> > > Assertion `Offs
> > > etMO.isImm() && "Unexpected offset for inline asm memory operand."' failed.
> > > ...
> > > 
> > > I guess BPF backend can try to add support for this 'a' modifier
> > > if necessary.
> > 
> > I wouldn't advise that: it is an internal GCC detail that just happens
> > to work in inline asm.  Also, even if you did that constraint may result
> > in operands that are not single registers.  It would be better to use
> > "r" constraint instead.
> 
> Sounds good. We also do not want to add support for this 'a' thing
> if there are alternatives.
> 
> > 
> > > 
> > > > Which is what GCC would generate with -O0.  Whether this is by chance or
> > > > by design (Nick, do you know?) I don't think the compiler should be
> > > > expected to do that reload driven by the "p" constraint.
> > > > I would suggest to change that macro (and similar out of macro
> > > > usages of
> > > > the "p" constraint in selftests/bpf/progs/iters.c) to use the "r"
> > > > constraint instead.  If a register is what is required, we should let
> > > > the compiler know.
> > > 
> > > Could you specify what is the syntax ("r" constraint) which will work
> > > for both clang and gcc?
> > 
> > Instead of:
> > 
> >     #define __imm_ptr(name) [name]"p"(&name)
> > 
> > Use this:
> > 
> >     #define __imm_ptr(name) [name]"r"(&name)
> > 
> > That assures that the operand (the pointer value) will be available in
> > the form of a single register.
> 
> Okay, this seems work for both gcc and clang.
> Eduard, what do you think about the above suggested change?

BPF selftests are passing with this change.
The macro in question is used in 3 files:
- verifier_subprog_precision.c
- iters_state_safety.c
- iters_looping.c

I don't see any difference in the generated object files
(at-least for cpuv4).

So, I guess we should be fine.

> 
> > 
> > > 
> > > > Thoughts?
> > > > PS: I am aware that the x86 port of the kernel uses the "p"
> > > > constraint
> > > >       in the percpu macros (arch/x86/include/asm/percpu.h) but that usage
> > > >       is in a different context (I would assume it is used in x86
> > > >       instructions that get constant addresses or global addresses loaded
> > > >       in registers and not automatics) where it seems to work well.
> > > > 
>