On Tue, May 24, 2022 at 12:53 AM Feng zhou <zhoufeng.zf@xxxxxxxxxxxxx> wrote:
> +static void setup(void)
> +{
> +	struct bpf_link *link;
> +	int map_fd, i, max_entries;
> +
> +	setup_libbpf();
> +
> +	ctx.skel = bpf_map_bench__open_and_load();
> +	if (!ctx.skel) {
> +		fprintf(stderr, "failed to open skeleton\n");
> +		exit(1);
> +	}
> +
> +	link = bpf_program__attach(ctx.skel->progs.benchmark);
> +	if (!link) {
> +		fprintf(stderr, "failed to attach program!\n");
> +		exit(1);
> +	}
> +
> +	//fill hash_map
> +	map_fd = bpf_map__fd(ctx.skel->maps.hash_map_bench);
> +	max_entries = bpf_map__max_entries(ctx.skel->maps.hash_map_bench);
> +	for (i = 0; i < max_entries; i++)
> +		bpf_map_update_elem(map_fd, &i, &i, BPF_ANY);
> +}
...
> +SEC("fentry/" SYS_PREFIX "sys_getpgid")
> +int benchmark(void *ctx)
> +{
> +	u32 key = bpf_get_prandom_u32();
> +	u64 init_val = 1;
> +
> +	bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
> +	return 0;
> +}

This benchmark is artificial in the extreme. First it populates the map
up to max_entries and then constantly bounces off the max_entries limit
in the bpf prog. Sometimes the random u32 will collide with one of the
prefilled keys and map_update_elem() will hit the fast path, but most of
the time alloc_htab_elem() will fail and map_update_elem() will fail with
it. It does demonstrate that the percpu free list is inefficient when
it's empty, but there is no way such a microbenchmark justifies
optimizing this corner case. If there is a production use case, please
code it up in a benchmark. There is also a lot of other overhead here:
the syscall and the atomics. To stress map_update_elem(), please use a
for() loop inside the bpf prog.
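
Something roughly like the below (an untested sketch only; the 1024 loop
count is an arbitrary placeholder, and the map name is just reused from
the quoted patch):

SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
	u64 init_val = 1;
	u32 key;
	int i;

	/* Do many updates per prog invocation, so the fentry/syscall
	 * overhead is amortized and map_update_elem() dominates what
	 * the benchmark actually measures.
	 */
	for (i = 0; i < 1024; i++) {
		key = bpf_get_prandom_u32();
		bpf_map_update_elem(&hash_map_bench, &key, &init_val, BPF_ANY);
	}
	return 0;
}

That way a single trigger from user space exercises the update path many
times and the per-iteration cost of map_update_elem() becomes visible.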