On Wed, Dec 13, 2023 at 06:15:20PM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > This seems to have a benefit that there is no backward compatibility
> > issue (if we use r1, because r0/r11 will be rejected by old
> > verifiers). We can have
> >
> >     r1 = 64bit_const
> >     if r1 == r1 goto
> >
> > and
> >
> >     r1 = 64bit_const
> >     if r1 != r1 goto
> >
> > and translate it on prog load to the new instruction as JUMP_OR_NOP
> > and NOP_OR_JUMP, correspondingly. On older kernels it will have the
> > default (key is off) behaviour.
>
> As Andrii pointed out, any new insn, either a JA with extra bits or a
> special meaning of rX == rX, can be sanitized by libbpf into a plain JA.
> There will be no backward compat issues.
>
> > Ok, from the BPF arch perspective this can work with two bits (not for
> > practical purposes though, IMO, see my next e-mail).
>
> I read this email and I still don't understand why you need a 3rd bit.
>
> > > And the special map really doesn't fit.
> > > Whatever we do, let's keep the text_poke-able insn logic separate
> > > from the bookkeeping of addresses of those insns.
> > > I think a special prefixed section that is understood by libbpf
> > > (like what I proposed with "name.static_branch") will do fine.
> > > If it's not good enough we can add a "set" map type
> > > that will be a generic set of values.
> > > It can be a set of 8-byte addresses to keep locations of static_branches,
> > > but let's keep it generic.
> > > I think it's fine to add:
> > >
> > >     __uint(type, BPF_MAP_TYPE_SET)
> > >
> > > and let libbpf populate it with addresses of insns, or addresses of
> > > variables, or other values when it prepares a program for loading.
> >
> > What is the higher-level API in this case? static_branch_set(branch,
> > bool on) is not enough, because we want to distinguish between
> > "normal" and "inverse" branches (for the unlikely/likely cases).
>
> What are the "likely/unlikely cases"?
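To make the sanitization point concrete, here is a minimal sketch of how a loader could rewrite the `if rX == rX` / `if rX != rX` encodings into a plain JA for old kernels. The struct is a simplified stand-in for struct bpf_insn, and `sanitize()` is a hypothetical helper showing one plausible rewrite, not libbpf's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn (linux/bpf.h); enough for
 * this sketch. */
struct insn {
	uint8_t code;     /* opcode */
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;      /* jump offset, relative to the next insn */
	int32_t imm;
};

#define BPF_JMP  0x05
#define BPF_X    0x08
#define BPF_JA   0x00
#define BPF_JEQ  0x10
#define BPF_JNE  0x50

/* Hypothetical loader-side sanitization: "if rX == rX goto off" is
 * always taken, so it becomes a plain JA to the same offset;
 * "if rX != rX goto off" is never taken, so it becomes a JA +0
 * (fallthrough, i.e. effectively a nop).  Either way, an old verifier
 * only ever sees a plain JA, which matches the default key-off
 * behaviour described above. */
static void sanitize(struct insn *in)
{
	if (in->dst_reg != in->src_reg)
		return;

	if (in->code == (BPF_JMP | BPF_JEQ | BPF_X)) {
		in->code = BPF_JMP | BPF_JA;	/* always jumps */
	} else if (in->code == (BPF_JMP | BPF_JNE | BPF_X)) {
		in->code = BPF_JMP | BPF_JA;	/* never jumps */
		in->off = 0;			/* JA +0: fallthrough */
	}
}
```

A regular conditional jump on two different registers is left untouched, so the rewrite only fires on the special rX == rX / rX != rX patterns.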
> likely() is a hint to the compiler to order basic blocks in a certain
> way. There is no likely/unlikely bit in the binary code after
> compilation on x86 or other architectures.
>
> There used to be a special bit on sparc64 that would mean a default
> jmp|fallthrough action for a conditional jmp. But that was before sparc
> became out of order and gained a proper branch predictor in HW.

Consider this code:

    int foo(void)
    {
            if (static_branch_likely(&key))
                    return 33;
            return 55;
    }

    int hoo(void)
    {
            if (static_branch_unlikely(&key))
                    return 33;
            return 55;
    }

When the key is disabled, the corresponding code is:

    0000000000000010 <foo>:
      19: eb 0b                 jmp 26 <foo+0x16>  <-- likely(static branch), key off
      1b: b8 21 00 00 00        mov $0x21,%eax
      20: 5d                    pop %rbp
      21: e9 00 00 00 00        jmp 26 <foo+0x16>
      26: b8 37 00 00 00        mov $0x37,%eax
      2b: 5d                    pop %rbp
      2c: e9 00 00 00 00        jmp 31 <foo+0x21>
      31: 66 66 2e 0f 1f 84 00  data16 cs nopw 0x0(%rax,%rax,1)
      38: 00 00 00 00
      3c: 0f 1f 40 00           nopl 0x0(%rax)

    0000000000000050 <hoo>:
      59: 66 90                 xchg %ax,%ax       <-- unlikely(static branch), key off
      5b: b8 37 00 00 00        mov $0x37,%eax
      60: 5d                    pop %rbp
      61: e9 00 00 00 00        jmp 66 <hoo+0x16>
      66: b8 21 00 00 00        mov $0x21,%eax
      6b: 5d                    pop %rbp
      6c: e9 00 00 00 00        jmp 71 <__UNIQUE_ID_vermagic248+0x5>

When the key is enabled, the code is:

    0000000000000010 <foo>:
      19: 66 90                 xchg %ax,%ax       <-- likely(static branch), key on
      1b: b8 21 00 00 00        mov $0x21,%eax
      20: 5d                    pop %rbp
      21: e9 00 00 00 00        jmp 26 <foo+0x16>
      26: b8 37 00 00 00        mov $0x37,%eax
      2b: 5d                    pop %rbp
      2c: e9 00 00 00 00        jmp 31 <foo+0x21>
      31: 66 66 2e 0f 1f 84 00  data16 cs nopw 0x0(%rax,%rax,1)
      38: 00 00 00 00
      3c: 0f 1f 40 00           nopl 0x0(%rax)

    0000000000000050 <hoo>:
      59: eb 0b                 jmp 66 <hoo+0x16>  <-- unlikely(static branch), key on
      5b: b8 37 00 00 00        mov $0x37,%eax
      60: 5d                    pop %rbp
      61: e9 00 00 00 00        jmp 66 <hoo+0x16>
      66: b8 21 00 00 00        mov $0x21,%eax
      6b: 5d                    pop %rbp
      6c: e9 00 00 00 00        jmp 71 <__UNIQUE_ID_vermagic248+0x5>

So, for the likely case
we set the branch to JUMP/NOP when the key is OFF/ON. And for the
unlikely case we set the branch to NOP/JUMP when the key is OFF/ON.

The kernel keeps this information, and when static_key_enable(key) is
executed, it does [simplified for reading]

    for (entry = key->start; entry < stop; entry++)
            arch_jump_label_transform(entry, jump_label_type(entry));

This jump_label_type() contains this bit of information: are we writing
a NOP or a JUMP for an enabled key. Same for static_key_disable(key).

Now, for BPF we do static_branch_enable(branch). To generate proper
JITed code, we have enough information (NOP=0, JUMP=2):

    static_branch_enable(JA[SRC=1|NOP])    jits to ARCH_JUMP
    static_branch_enable(JA[SRC=1|JUMP])   jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|NOP])   jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|JUMP])  jits to ARCH_JUMP

But how do we represent this in the xlated code shown to the user? Do we
patch the xlated code? If we do, then

    static_branch_enable changes JA[SRC=1|NOP] to JA[SRC=1|JUMP],
        and ARCH_JUMP is generated;
    static_branch_disable then sees JA[SRC=1|JUMP] and changes it to
        JA[SRC=1|NOP], but ARCH_JUMP is generated.

Or what about two static_branch_enable calls on the same branch?

By flipping I meant that we always do

    JA[SRC=1|NOP]   jits to ARCH_NOP
    JA[SRC=1|JUMP]  jits to ARCH_JUMP

and the primitive operation is static_branch_flip, which changes
JA[SRC=1|NOP] to JA[SRC=1|JUMP] and vice versa. Then for
static_key_enable(key) we flip all the branches if the key was disabled
and do nothing otherwise. Same for static_key_disable.

What you've proposed before is to keep this "third bit" in the
xlated+jitted form. Basically, we check the state of the branch "on/off"
like this: say, we see that the xlated/jitted state of a branch is
JA[SRC=1|NOP] and ARCH_JUMP; then we can say that this branch is on. How
do we report it to the user in PROG_INFO: do we set the instruction to
JA[SRC=1|JUMP] in the output, specifying that its current state is to
jump? This would work, I think.
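The flip scheme can be modeled in a few lines of C to check the invariant that matters here: enable/disable only flip on an actual state transition, so repeated enables are harmless. Everything below is an illustrative toy, not the kernel's or the verifier's data structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the flip semantics: the xlated form of each branch is
 * either JA|NOP or JA|JUMP, the JIT translates it 1:1 (NOP -> arch nop,
 * JUMP -> arch jump), and the only primitive is static_branch_flip(). */

enum branch { JA_NOP, JA_JUMP };

struct key {
	bool enabled;
	enum branch branches[8];
	int nr;
};

/* The primitive: JA|NOP <-> JA|JUMP. */
static void static_branch_flip(enum branch *b)
{
	*b = (*b == JA_NOP) ? JA_JUMP : JA_NOP;
}

/* Flip every branch only on an off->on transition, so a second enable
 * on an already-enabled key is a no-op. */
static void static_key_enable(struct key *k)
{
	if (k->enabled)
		return;
	k->enabled = true;
	for (int i = 0; i < k->nr; i++)
		static_branch_flip(&k->branches[i]);
}

/* Symmetric: flip only on an on->off transition. */
static void static_key_disable(struct key *k)
{
	if (!k->enabled)
		return;
	k->enabled = false;
	for (int i = 0; i < k->nr; i++)
		static_branch_flip(&k->branches[i]);
}
```

Because the xlated encoding is always what the JIT emitted, reporting the branch state in PROG_INFO reduces to printing the current JA|NOP / JA|JUMP form.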
> > We can implement this using something like this:
> >
> >     static_key_set(key, bool new_value)
> >     {
> >             /* true if we change key value */
> >             bool key_changed = key->old_value ^ new_value;
> >
> >             for_each_prog(prog, key)
> >                     for_each_branch(branch, prog, key)
> >                             static_branch_flip(prog, branch, key_changed)
> >     }
> >
> > where static_branch_flip flips the second bit of SRC_REG.
>
> I don't understand why you keep bringing up the 'flip' use case.
> The kernel doesn't have such an operation on static branches.
> Which makes me believe that it wasn't necessary.
> Why do we need one for the bpf static branch?

See above.

> > We need to keep track of prog->branches and key->progs. How is this
> > different from what my patch implements?
>
> What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET)
> and by naming convention libbpf will populate it with addresses
> of JA_OR_NOP from all progs.
> In asm it could be:
>
>     asm volatile ("r0 = %[set_A] ll; goto_or_nop ...");
>
> (and libbpf will remove the ld_imm64 from the prog before loading),
>
> or via
>
>     asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long");
>
> (and libbpf will copy from the special section into a set and remove
> the special section).
>
> It will be a libbpf convention and the kernel doesn't need
> to know about a special static branch map type or an array of addresses
> in the prog_load cmd.
> Only the JA insn is relevant to the verifier and JITs.
>
> Ideally we don't need to introduce a SET map type and
> libbpf wouldn't need to populate it.
> If we can make it work with an array of values that .pushsection + .long
> automatically populates, and libbpf treats it as a normal global data
> array, that would be ideal.
> Insn addresses from all progs will be in that array after loading.
> Sort of like the ".kconfig" section that libbpf populates,
> but it's a normal array underneath.
This doesn't work without the kernel side, as we change instruction
offsets inside the verifier when, e.g., converting helper calls to
individual instructions, etc.

Otherwise, this is more-or-less what my patch does: get a list of
offsets of branches inside the program, associate them with a map (or an
ID of any kind), and then push it to the kernel. And if you need to keep
this list in the kernel anyway, then it looks like we can keep it and
access it per key, not in a loop...

An alternative is to encode the key ID in the instruction in some form
so that it is visible in the xlated code after the program is loaded.
But in this case users become responsible for writing iterators and
bookkeeping all the relationships, vs. just loading a program with a map
fd, which is a well-established API in BPF.

> > If this is implemented in userspace, then how do we prevent
> > synchronous updates of the key (and a relocation variant doesn't seem
> > to work from userspace)? Or is this a new kfunc? If yes, then how is
> > it executed,
>
> then user space can have a small helper in libbpf that iterates over the
> SET (or array) and calls
>
>     sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set)
>
> Similarly in the kernel. When a bpf prog wants to enable a key it does
>
>     bpf_for_each(set) {                      // open coded iterator
>             bpf_static_branch_enable(addr);  // kfunc call
>     }

Is it possible to use a map like it is used in my patch, but do

    sys_bpf(cmd=STATIC_KEY_ENABLE, attr.map_fd)

so that it is specifically separated from map_update_value? (If it was
not a map, but just some "bpf object"?)
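To make the API-shape question concrete, here is a toy model of the two control paths being discussed. Both commands (per-address STATIC_BRANCH_ENABLE vs. per-map STATIC_KEY_ENABLE) are proposals in this thread, not existing kernel UAPI; the stubs below only count syscall round trips to show the difference in who iterates:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int calls;  /* stands in for sys_bpf() round trips */

/* Alexei's variant: user space iterates the set itself and issues one
 * (hypothetical) STATIC_BRANCH_ENABLE command per branch address. */
static void sys_bpf_static_branch_enable(uint64_t addr)
{
	(void)addr;
	calls++;            /* one syscall per branch address */
}

/* Anton's variant: hand the whole map to the kernel in one
 * (hypothetical) STATIC_KEY_ENABLE command; the kernel walks the
 * per-key list internally. */
static void sys_bpf_static_key_enable(const uint64_t *set, size_t n)
{
	(void)set;
	(void)n;
	calls++;            /* one syscall; kernel iterates internally */
}

/* The small libbpf-style helper from the quoted proposal: N syscalls
 * for N branches. */
static void enable_key_from_userspace(const uint64_t *set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sys_bpf_static_branch_enable(set[i]);
}
```

The trade-off the thread is circling: the per-address variant keeps the kernel interface minimal but pushes iteration and atomicity concerns to user space, while the per-map variant keeps key updates a single atomic operation behind one command.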