Re: [PATCH bpf-next 6/7] libbpf: BPF Static Keys support

On Sun, Dec 10, 2023 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Sun, Dec 10, 2023 at 2:30 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >
> > How about a slightly different modification of Anton's idea.
> > Suppose that, as before, there is a special map type:
> >
> >     struct {
> >         __uint(type, BPF_MAP_TYPE_ARRAY);
> >         __type(key, __u32);
> >         __type(value, __u32);
> >         __uint(map_flags, BPF_F_STATIC_KEY);
> >         __uint(max_entries, 1);
> >     } skey1 SEC(".maps");
>
> Instead of special map that the kernel has to know about
> the same intent can be expressed with:
> int skey1;
> r0 = %[skey1] ll;
> and then the kernel needs no extra map type while the user space
> can collect all static_branches that use &skey1 by
> iterating insn stream and comparing addresses.
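The address-collection step described above can be sketched in plain C. This is not libbpf code; the insn layout is simplified (the real one is struct bpf_insn from <linux/bpf.h>), and the opcode byte is the real BPF_LD | BPF_DW | BPF_IMM encoding:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified instruction layout; real one is struct bpf_insn. */
struct insn {
	uint8_t  code;
	uint8_t  regs;
	int16_t  off;
	int32_t  imm;
};

#define BPF_LD_IMM64_CODE 0x18	/* BPF_LD | BPF_DW | BPF_IMM */

/* Collect offsets of ld_imm64 insns whose 64-bit immediate (split
 * across two insn slots) equals the address of the key variable. */
static size_t collect_uses(const struct insn *prog, size_t len,
			   uint64_t key_addr, size_t *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i + 1 < len; i++) {
		if (prog[i].code != BPF_LD_IMM64_CODE)
			continue;
		uint64_t addr = (uint32_t)prog[i].imm |
				((uint64_t)(uint32_t)prog[i + 1].imm << 32);
		if (addr == key_addr && n < max)
			out[n++] = i;
		i++;	/* ld_imm64 occupies two insn slots */
	}
	return n;
}
```

After relocation, every ld_imm64 referencing &skey1 carries the same 64-bit address, so a single linear scan of the insn stream finds all static branches guarded by that key.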
>
> > Which is used as below:
> >
> >     __attribute__((naked))
> >     int foo(void) {
> >       asm volatile (
> >                     "r0 = %[skey1] ll;"
> >                     "if r0 != r0 goto 1f;"
> >                     "r1 = r10;"
> >                     "r1 += -8;"
> >                     "r2 = 1;"
> >                     "call %[bpf_trace_printk];"
> >             "1:"
> >                     "exit;"
> >                     :: __imm_addr(skey1),
> >                        __imm(bpf_trace_printk)
> >                     : __clobber_all
> >       );
> >     }
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <foo>:
> >        0:   r0 = 0x0 ll
> >         0000000000000000:  R_BPF_64_64  skey1  ;; <---- Map relocation as usual
> >        2:   if r0 == r0 goto +0x4 <foo+0x38>   ;; <---- Note condition
> >        3:   r1 = r10
> >        4:   r1 += -0x8
> >        5:   r2 = 0x1
> >        6:   call 0x6
> >        7:   exit
> >
> > And suppose that verifier is modified in the following ways:
> > - treat instructions "if rX == rX" / "if rX != rX" (when rX points to
> >   static key map) in a special way:
> >   - when the program is verified, the jump is considered non-deterministic;
> >   - when the program is jitted, the jump is compiled as a nop for "!=" and
> >     as an unconditional jump for "==";
> > - build a table of static keys based on a specific map referenced in
> >   condition, e.g. for the example above it can be inferred that insn 2
> >   associates with map skey1 because "r0" points to "skey1";
> > - jit "rX = <static key> ll;" as nop;
> >
> > On the plus side:
> > - any kinds of jump tables are omitted from system call;
> > - no new instruction is needed;
> > - almost no modifications to libbpf are necessary (only a helper macro
> >   to convince clang to keep "if rX == rX");
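To make the proposed verifier/JIT rule above concrete, here is a toy model of the decision. This is only a sketch: the opcode bytes are the real BPF ISA encodings, but whether rX points to a static key map is something the verifier would have to track, so it is stubbed out as a flag:

```c
#include <assert.h>
#include <stdint.h>

/* BPF_JMP | BPF_JEQ | BPF_X and BPF_JMP | BPF_JNE | BPF_X. */
#define BPF_JEQ_X 0x1d
#define BPF_JNE_X 0x5d

enum jit_out { JIT_NORMAL, JIT_NOP, JIT_JUMP };

/* Proposed rule: "if rX != rX" on a static-key pointer JITs to a
 * nop, "if rX == rX" to an unconditional jump; everything else is
 * compiled as usual. */
static enum jit_out jit_static_branch(uint8_t code, uint8_t dst,
				      uint8_t src, int is_key_ptr)
{
	if (dst != src || !is_key_ptr)
		return JIT_NORMAL;
	if (code == BPF_JNE_X)
		return JIT_NOP;
	if (code == BPF_JEQ_X)
		return JIT_JUMP;
	return JIT_NORMAL;
}
```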
>
> Reusing existing insn means that we're giving it new meaning
> and that always comes with danger of breaking existing progs.
> In this case if rX == rX isn't very meaningful and new semantics
> shouldn't break anything, but it's a danger zone.
>
> If we treat:
> if r0 == r0
> as JA
> then we have to treat
> if r1 == r1
> as JA as well and it becomes ambiguous when prog_info needs
> to return the insns back to user space.
>
> If we go with rX == rX approach we should probably limit it
> to one specific register. r0, r10, r11 can be considered
> and they have their own pros and cons.
>
> Additional:
> r0 = %[skey1] ll
> in front of JE/JNE is a waste. If we JIT it to a useless native insn
> we will be burning cpu for no reason. So we should probably
> optimize it out. If we do so, then this inline insn becomes a nop and
> it's effectively a relocation. The insn stream will carry this
> rX = 64bit_const insn to indicate the scope of the next insn.
> It's pretty much like Anton's idea of using extra bits in JA
> to encode an integer key_id.
> With ld_imm64 we will encode 64-bit key_id.
> Another insn with more bits to burn that has no effect on execution.
>
> It doesn't look clean to encode so much extra metadata into instructions
> that JITs and the interpreter have to ignore.
> If we go this route:
>   r11 = 64bit_const
>   if r11 == r11 goto
> is a lesser evil.
> Still, it's not as clean as JA with extra bits in src_reg.
> We already optimize JA +0 into a nop. See opt_remove_nops().
> So a flavor of JA insn looks the most natural fit for a selectable
> JA +xx or JA +0.

Another point for a special jump instruction is that libbpf can be
taught to patch it into a normal JA or NOP on old kernels, depending
on likely/unlikely bit. You won't be able to flip it, of course, but
at least you don't have to compile two separate versions of the BPF
object file. Instead of "jump maybe" it will transparently become
"jump/don't jump for sure". This should definitely help with backwards
compatibility.
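Roughly, the downgrade could look like the sketch below. The insn layout is simplified, and the src_reg flag bits (BPF_STATIC_BRANCH, BPF_BRANCH_DEFAULT) are made-up names — the real encoding would be whatever the kernel settles on:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified instruction layout; real one is struct bpf_insn. */
struct insn {
	uint8_t  code;
	uint8_t  regs;	/* dst_reg low nibble, src_reg high nibble */
	int16_t  off;
	int32_t  imm;
};

#define BPF_JA_OP 0x05		/* BPF_JMP | BPF_JA */

/* Hypothetical src_reg flag bits for the proposed JA flavor. */
#define BPF_STATIC_BRANCH  0x1	/* insn is text_poke-able */
#define BPF_BRANCH_DEFAULT 0x2	/* branch is initially taken */

/* On a kernel without static-branch support, rewrite the special
 * JA into a plain one: keep JA +off if the default is "taken",
 * turn it into JA +0 (a nop, cf. opt_remove_nops()) otherwise. */
static void downgrade_static_ja(struct insn *in)
{
	uint8_t src = in->regs >> 4;

	if (in->code != BPF_JA_OP || !(src & BPF_STATIC_BRANCH))
		return;			/* not a static branch */
	if (!(src & BPF_BRANCH_DEFAULT))
		in->off = 0;		/* "don't jump for sure" */
	in->regs &= 0x0f;		/* clear the special src_reg bits */
}
```

The key can no longer be flipped after this rewrite, as noted above, but the program loads and behaves according to its likely/unlikely default.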

>
> And the special map really doesn't fit.
> Whatever we do, let's keep text_poke-able insn logic separate
> from bookkeeping of addresses of those insns.
> I think a special prefixed section that is understood by libbpf
> (like what I proposed with "name.static_branch") will do fine.
> If it's not good enough we can add a "set" map type
> that will be a generic set of values.
> It can be a set of 8-byte addresses to keep locations of static_branches,
> but let's keep it generic.
> I think it's fine to add:
> __uint(type, BPF_MAP_TYPE_SET)
> and let libbpf populate it with addresses of insns,
> or address of variables, or other values
> when it prepares a program for loading.
> But map_update_elem should never be doing text_poke on insns.
> We added the prog_array map type in the past, but that was done
> during the early days. If we were designing bpf today we would have
> gone a different route.
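For illustration only, a BTF-defined map for the proposed generic set might look like the fragment below. BPF_MAP_TYPE_SET does not exist today, and the name and fields here are purely hypothetical:

```c
/* Hypothetical: BPF_MAP_TYPE_SET is a proposed, not an existing,
 * map type. libbpf would fill it with insn or variable addresses
 * while preparing the program for loading. */
struct {
	__uint(type, BPF_MAP_TYPE_SET);	/* hypothetical map type */
	__type(value, __u64);		/* 8-byte addresses */
	__uint(max_entries, 16);
} branch_locs SEC(".maps");
```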
