On Fri, Mar 29, 2024 at 4:52 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Mar 29, 2024 at 11:47 AM Andrii Nakryiko <andrii@xxxxxxxxxx> wrote:
> >
> > Using new per-CPU BPF instruction, partially inline
> > bpf_map_lookup_elem() helper for per-CPU hashmap BPF map. Just like for
> > normal HASH map, we still generate a call into __htab_map_lookup_elem(),
> > but after that we resolve per-CPU element address using a new
> > instruction, saving on extra functions calls.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  kernel/bpf/hashtab.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index e81059faae63..74950f373bab 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -2308,6 +2308,26 @@ static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key)
> >  	return NULL;
> >  }
> >
> > +/* inline bpf_map_lookup_elem() call for per-CPU hashmap */
> > +static int htab_percpu_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
> > +{
> > +	struct bpf_insn *insn = insn_buf;
> > +
> > +	if (!bpf_jit_supports_percpu_insns())
> > +		return -EOPNOTSUPP;
> > +
> > +	BUILD_BUG_ON(!__same_type(&__htab_map_lookup_elem,
> > +		     (void *(*)(struct bpf_map *map, void *key))NULL));
> > +	*insn++ = BPF_EMIT_CALL(__htab_map_lookup_elem);
> > +	*insn++ = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3);
> > +	*insn++ = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0,
> > +				offsetof(struct htab_elem, key) + map->key_size);
> > +	*insn++ = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
>
> here and in the previous patch probably need to gate this by
> sizeof(void *) == 8
> Just to prevent future bugs.

All the gen_lookup callbacks are called only if `prog->jit_requested &&
BITS_PER_LONG == 64`, it's checked generically in do_misc_fixups().
And seems like other gen_lookup implementations don't check for
sizeof(void *) and assume 64-bits, so I decided to stay consistent (my
initial implementation actually worked for both x86 and x86-64, but once
I saw the BITS_PER_LONG == 64 I simplified it to assume 8).

> > +	*insn++ = BPF_LDX_ADDR_PERCPU(BPF_REG_0, BPF_REG_0, 0);
>
> Overall it looks great!

thanks!
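
For anyone following along, here is a rough userspace model of what the inlined sequence in the patch computes. It is only a sketch under simplifying assumptions, not kernel code: struct model_elem, model_cpu_offset[], model_elem_new() and model_percpu_lookup() are all hypothetical names made up for illustration, and the per-CPU area is modeled as a plain array with one slot per CPU. The comments map each step to the corresponding emitted instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for struct htab_elem: a header, then the key bytes,
 * then (for per-CPU maps) the per-CPU base pointer right after the key. */
struct model_elem {
	uint32_t hash;
	char key[];
};

/* Hypothetical per-CPU offsets; the kernel keeps the real ones per CPU. */
static size_t model_cpu_offset[MODEL_NR_CPUS];

/* Build an element whose per-CPU area holds one uint64_t slot per CPU. */
static struct model_elem *model_elem_new(const void *key, size_t key_size,
					 uint64_t *percpu_area)
{
	struct model_elem *e = malloc(sizeof(*e) + key_size + sizeof(void *));
	int cpu;

	assert(e);
	e->hash = 0;
	memcpy(e->key, key, key_size);
	/* store the base pointer immediately after the key bytes */
	memcpy(e->key + key_size, &percpu_area, sizeof(percpu_area));
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpu_offset[cpu] = cpu * sizeof(uint64_t);
	return e;
}

/* elem models the value returned in R0 by the __htab_map_lookup_elem() call. */
static void *model_percpu_lookup(struct model_elem *elem, size_t key_size,
				 int cpu)
{
	char *r0 = (char *)elem;
	void *pptr;

	/* BPF_JMP_IMM(BPF_JEQ, R0, 0, 3): on a miss, skip the next three steps */
	if (!r0)
		return NULL;
	/* BPF_ALU64_IMM(BPF_ADD, R0, offsetof(key) + key_size) */
	r0 += offsetof(struct model_elem, key) + key_size;
	/* BPF_LDX_MEM(BPF_DW, R0, R0, 0): load the stored pointer. The patch
	 * loads 8 bytes, which is where the 64-bit assumption comes from;
	 * this model copies sizeof(void *) bytes instead. */
	memcpy(&pptr, r0, sizeof(pptr));
	/* BPF_LDX_ADDR_PERCPU(R0, R0, 0): resolve the address for this CPU */
	return (char *)pptr + model_cpu_offset[cpu];
}
```

Calling model_percpu_lookup() with a NULL element returns NULL without touching memory, which mirrors the JEQ skipping the three instructions that follow it; with a valid element it returns the address of the chosen CPU's slot.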