On Wed, Dec 13, 2023 at 06:15:20PM -0800, Alexei Starovoitov wrote:
> On Tue, Dec 12, 2023 at 2:28 AM Anton Protopopov <aspsk@xxxxxxxxxxxxx> wrote:
> >
> > This seems to have a benefit that there is no backward compatibility
> > issue (if we use r1, because r0/r11 will be rejected by old
> > verifiers). We can have
> >
> >     r1 = 64bit_const
> >     if r1 == r1 goto
> >
> > and
> >
> >     r1 = 64bit_const
> >     if r1 != r1 goto
> >
> > and translate it on prog load to the new instruction as JUMP_OR_NOP
> > and NOP_OR_JUMP, correspondingly. On older kernels it will have the
> > default (key is off) behaviour.
>
> As Andrii pointed out, any new insn, either a JA with extra bits or a
> special meaning of rX == rX, can be sanitized by libbpf into a plain JA.
> There will be no backward compat issues.
>
> > Ok, from the BPF arch perspective this can work with two bits (not for
> > practical purposes though, IMO, see my next e-mail).
>
> I read this email and I still don't understand why you need a 3rd bit.
>
> > > And the special map really doesn't fit.
> > > Whatever we do, let's keep the text_poke-able insn logic separate
> > > from the bookkeeping of addresses of those insns.
> > > I think a special prefixed section that is understood by libbpf
> > > (like what I proposed with "name.static_branch") will do fine.
> > > If it's not good enough we can add a "set" map type
> > > that will be a generic set of values.
> > > It can be a set of 8-byte addresses to keep locations of static_branches,
> > > but let's keep it generic.
> > > I think it's fine to add:
> > >
> > >     __uint(type, BPF_MAP_TYPE_SET)
> > >
> > > and let libbpf populate it with addresses of insns, or addresses of
> > > variables, or other values when it prepares a program for loading.
> >
> > What is the higher-level API in this case? static_branch_set(branch,
> > bool on) is not enough, because we want to distinguish between
> > "normal" and "inverse" branches (for the unlikely/likely cases).
>
> What are the "likely/unlikely cases"?
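To make the sanitization point concrete, here is a minimal sketch of how a loader could rewrite the `if rX == rX` / `if rX != rX` encodings into a plain JA for old kernels. The struct is a simplified stand-in for struct bpf_insn, and `sanitize()` is a hypothetical helper showing one plausible rewrite, not libbpf's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bpf_insn (linux/bpf.h); enough for
 * this sketch. */
struct insn {
	uint8_t code;     /* opcode */
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;      /* jump offset, relative to the next insn */
	int32_t imm;
};

#define BPF_JMP  0x05
#define BPF_X    0x08
#define BPF_JA   0x00
#define BPF_JEQ  0x10
#define BPF_JNE  0x50

/* Hypothetical loader-side sanitization: "if rX == rX goto off" is
 * always taken, so it becomes a plain JA to the same offset;
 * "if rX != rX goto off" is never taken, so it becomes a JA +0
 * (fallthrough, i.e. effectively a nop).  Either way, an old verifier
 * only ever sees a plain JA, which matches the default key-off
 * behaviour described above. */
static void sanitize(struct insn *in)
{
	if (in->dst_reg != in->src_reg)
		return;

	if (in->code == (BPF_JMP | BPF_JEQ | BPF_X)) {
		in->code = BPF_JMP | BPF_JA;	/* always jumps */
	} else if (in->code == (BPF_JMP | BPF_JNE | BPF_X)) {
		in->code = BPF_JMP | BPF_JA;	/* never jumps */
		in->off = 0;			/* JA +0: fallthrough */
	}
}
```

A regular conditional jump on two different registers is left untouched, so the rewrite only fires on the special rX == rX / rX != rX patterns.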
> likely() is a hint to the compiler to order basic blocks in a certain
> way. There is no likely/unlikely bit in the binary code after
> compilation on x86 or other architectures.
>
> There used to be a special bit on sparc64 that would mean a default
> jmp|fallthrough action for a conditional jmp. But that was before sparc
> became out of order and gained a proper branch predictor in HW.

Consider this code:

    int foo(void)
    {
            if (static_branch_likely(&key))
                    return 33;
            return 55;
    }

    int hoo(void)
    {
            if (static_branch_unlikely(&key))
                    return 33;
            return 55;
    }

When the key is disabled, the corresponding code is:

    0000000000000010 <foo>:
      19: eb 0b                 jmp 26 <foo+0x16>  <-- likely(static branch), key off
      1b: b8 21 00 00 00        mov $0x21,%eax
      20: 5d                    pop %rbp
      21: e9 00 00 00 00        jmp 26 <foo+0x16>
      26: b8 37 00 00 00        mov $0x37,%eax
      2b: 5d                    pop %rbp
      2c: e9 00 00 00 00        jmp 31 <foo+0x21>
      31: 66 66 2e 0f 1f 84 00  data16 cs nopw 0x0(%rax,%rax,1)
      38: 00 00 00 00
      3c: 0f 1f 40 00           nopl 0x0(%rax)

    0000000000000050 <hoo>:
      59: 66 90                 xchg %ax,%ax       <-- unlikely(static branch), key off
      5b: b8 37 00 00 00        mov $0x37,%eax
      60: 5d                    pop %rbp
      61: e9 00 00 00 00        jmp 66 <hoo+0x16>
      66: b8 21 00 00 00        mov $0x21,%eax
      6b: 5d                    pop %rbp
      6c: e9 00 00 00 00        jmp 71 <__UNIQUE_ID_vermagic248+0x5>

When the key is enabled, the code is:

    0000000000000010 <foo>:
      19: 66 90                 xchg %ax,%ax       <-- likely(static branch), key on
      1b: b8 21 00 00 00        mov $0x21,%eax
      20: 5d                    pop %rbp
      21: e9 00 00 00 00        jmp 26 <foo+0x16>
      26: b8 37 00 00 00        mov $0x37,%eax
      2b: 5d                    pop %rbp
      2c: e9 00 00 00 00        jmp 31 <foo+0x21>
      31: 66 66 2e 0f 1f 84 00  data16 cs nopw 0x0(%rax,%rax,1)
      38: 00 00 00 00
      3c: 0f 1f 40 00           nopl 0x0(%rax)

    0000000000000050 <hoo>:
      59: eb 0b                 jmp 66 <hoo+0x16>  <-- unlikely(static branch), key on
      5b: b8 37 00 00 00        mov $0x37,%eax
      60: 5d                    pop %rbp
      61: e9 00 00 00 00        jmp 66 <hoo+0x16>
      66: b8 21 00 00 00        mov $0x21,%eax
      6b: 5d                    pop %rbp
      6c: e9 00 00 00 00        jmp 71 <__UNIQUE_ID_vermagic248+0x5>

So, for the likely case
we set the branch to JUMP/NOP when the key is OFF/ON. And for the
unlikely case we set the branch to NOP/JUMP when the key is OFF/ON.

The kernel keeps this information, and when static_key_enable(key) is
executed, it does [simplified for reading]

    for (entry = key->start; entry < stop; entry++)
            arch_jump_label_transform(entry, jump_label_type(entry));

This jump_label_type() contains this bit of information: are we writing
a NOP or a JUMP for an enabled key. Same for static_key_disable(key).

Now, for BPF we do static_branch_enable(branch). To generate proper
JITed code, we have enough information (NOP=0, JUMP=2):

    static_branch_enable(JA[SRC=1|NOP])    jits to ARCH_JUMP
    static_branch_enable(JA[SRC=1|JUMP])   jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|NOP])   jits to ARCH_NOP
    static_branch_disable(JA[SRC=1|JUMP])  jits to ARCH_JUMP

But how do we represent this in the xlated code shown to the user? Do we
patch the xlated code? If we do, then

    static_branch_enable changes JA[SRC=1|NOP] to JA[SRC=1|JUMP],
        and ARCH_JUMP is generated;
    static_branch_disable then sees JA[SRC=1|JUMP] and changes it to
        JA[SRC=1|NOP], but ARCH_JUMP is generated.

Or what about two static_branch_enable calls on the same branch?

By flipping I meant that we always do

    JA[SRC=1|NOP]   jits to ARCH_NOP
    JA[SRC=1|JUMP]  jits to ARCH_JUMP

and the primitive operation is static_branch_flip, which changes
JA[SRC=1|NOP] to JA[SRC=1|JUMP] and vice versa. Then for
static_key_enable(key) we flip all the branches if the key was disabled
and do nothing otherwise. Same for static_key_disable.

What you've proposed before is to keep this "third bit" in the
xlated+jitted form. Basically, we check the state of the branch "on/off"
like this: say, we see that the xlated/jitted state of a branch is
JA[SRC=1|NOP] and ARCH_JUMP; then we can say that this branch is on. How
do we report it to the user in PROG_INFO: do we set the instruction to
JA[SRC=1|JUMP] in the output, specifying that its current state is to
jump? This would work, I think.
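The flip scheme can be modeled in a few lines of C to check the invariant that matters here: enable/disable only flip on an actual state transition, so repeated enables are harmless. Everything below is an illustrative toy, not the kernel's or the verifier's data structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the flip semantics: the xlated form of each branch is
 * either JA|NOP or JA|JUMP, the JIT translates it 1:1 (NOP -> arch nop,
 * JUMP -> arch jump), and the only primitive is static_branch_flip(). */

enum branch { JA_NOP, JA_JUMP };

struct key {
	bool enabled;
	enum branch branches[8];
	int nr;
};

/* The primitive: JA|NOP <-> JA|JUMP. */
static void static_branch_flip(enum branch *b)
{
	*b = (*b == JA_NOP) ? JA_JUMP : JA_NOP;
}

/* Flip every branch only on an off->on transition, so a second enable
 * on an already-enabled key is a no-op. */
static void static_key_enable(struct key *k)
{
	if (k->enabled)
		return;
	k->enabled = true;
	for (int i = 0; i < k->nr; i++)
		static_branch_flip(&k->branches[i]);
}

/* Symmetric: flip only on an on->off transition. */
static void static_key_disable(struct key *k)
{
	if (!k->enabled)
		return;
	k->enabled = false;
	for (int i = 0; i < k->nr; i++)
		static_branch_flip(&k->branches[i]);
}
```

Because the xlated encoding is always what the JIT emitted, reporting the branch state in PROG_INFO reduces to printing the current JA|NOP / JA|JUMP form.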
> > We can implement this using something like this:
> >
> >     static_key_set(key, bool new_value)
> >     {
> >             /* true if we change key value */
> >             bool key_changed = key->old_value ^ new_value;
> >
> >             for_each_prog(prog, key)
> >                     for_each_branch(branch, prog, key)
> >                             static_branch_flip(prog, branch, key_changed)
> >     }
> >
> > where static_branch_flip flips the second bit of SRC_REG.
>
> I don't understand why you keep bringing up the 'flip' use case.
> The kernel doesn't have such an operation on static branches.
> Which makes me believe that it wasn't necessary.
> Why do we need one for the bpf static branch?

See above.

> > We need to keep track of prog->branches and key->progs. How is this
> > different from what my patch implements?
>
> What I'm proposing is to have a generic map __uint(type, BPF_MAP_TYPE_SET)
> and by naming convention libbpf will populate it with addresses
> of JA_OR_NOP from all progs.
> In asm it could be:
>
>     asm volatile ("r0 = %[set_A] ll; goto_or_nop ...");
>
> (and libbpf will remove the ld_imm64 from the prog before loading),
>
> or via
>
>     asm volatile ("goto_or_nop ...; .pushsection set_A_name+suffix; .long");
>
> (and libbpf will copy from the special section into a set and remove
> the special section).
>
> It will be a libbpf convention and the kernel doesn't need
> to know about a special static branch map type or an array of addresses
> in the prog_load cmd.
> Only the JA insn is relevant to the verifier and JITs.
>
> Ideally we don't need to introduce a SET map type and
> libbpf wouldn't need to populate it.
> If we can make it work with an array of values that .pushsection + .long
> automatically populates, and libbpf treats it as a normal global data
> array, that would be ideal.
> Insn addresses from all progs will be in that array after loading.
> Sort of like the ".kconfig" section that libbpf populates,
> but it's a normal array underneath.
This doesn't work without the kernel side, as we change instruction
offsets inside the verifier when, e.g., converting helper calls to
individual instructions, etc.

Otherwise, this is more-or-less what my patch does: get a list of
offsets of branches inside the program, associate them with a map (or an
ID of any kind), and then push it to the kernel. And if you need to keep
this list in the kernel anyway, then it looks like we can keep it and
access it per key, not in a loop...

An alternative is to encode the key ID in the instruction in some form
so that it is visible in the xlated code after the program is loaded.
But in this case users become responsible for writing iterators and
bookkeeping all the relationships, vs. just loading a program with a map
fd, which is a well-established API in BPF.

> > If this is implemented in userspace, then how do we prevent
> > synchronous updates of the key (and a relocation variant doesn't seem
> > to work from userspace)? Or is this a new kfunc? If yes, then how is
> > it executed,
>
> then user space can have a small helper in libbpf that iterates over the
> SET (or array) and calls
>
>     sys_bpf(cmd=STATIC_BRANCH_ENABLE, one_value_from_set)
>
> Similarly in the kernel. When a bpf prog wants to enable a key it does
>
>     bpf_for_each(set) {                      // open coded iterator
>             bpf_static_branch_enable(addr);  // kfunc call
>     }

Is it possible to use a map like it is used in my patch, but do

    sys_bpf(cmd=STATIC_KEY_ENABLE, attr.map_fd)

so that it is specifically separated from map_update_value? (If it was
not a map, but just some "bpf object"?)
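To make the API-shape question concrete, here is a toy model of the two control paths being discussed. Both commands (per-address STATIC_BRANCH_ENABLE vs. per-map STATIC_KEY_ENABLE) are proposals in this thread, not existing kernel UAPI; the stubs below only count syscall round trips to show the difference in who iterates:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int calls;  /* stands in for sys_bpf() round trips */

/* Alexei's variant: user space iterates the set itself and issues one
 * (hypothetical) STATIC_BRANCH_ENABLE command per branch address. */
static void sys_bpf_static_branch_enable(uint64_t addr)
{
	(void)addr;
	calls++;            /* one syscall per branch address */
}

/* Anton's variant: hand the whole map to the kernel in one
 * (hypothetical) STATIC_KEY_ENABLE command; the kernel walks the
 * per-key list internally. */
static void sys_bpf_static_key_enable(const uint64_t *set, size_t n)
{
	(void)set;
	(void)n;
	calls++;            /* one syscall; kernel iterates internally */
}

/* The small libbpf-style helper from the quoted proposal: N syscalls
 * for N branches. */
static void enable_key_from_userspace(const uint64_t *set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sys_bpf_static_branch_enable(set[i]);
}
```

The trade-off the thread is circling: the per-address variant keeps the kernel interface minimal but pushes iteration and atomicity concerns to user space, while the per-map variant keeps key updates a single atomic operation behind one command.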