Hi,

On 4/22/2023 10:59 AM, Alexei Starovoitov wrote:
> On Sat, Apr 08, 2023 at 10:18:43PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> The benchmark can be used to compare the performance of hash map
>> operations and the memory usage between different flavors of the bpf
>> memory allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma).
>> It can also be used to check the performance improvement or the memory
>> saving of a bpf memory allocator optimization, and to check whether or
>> not a specific use case is suitable for the bpf memory allocator.
>>
>> The benchmark creates a non-preallocated hash map which uses the bpf
>> memory allocator and shows the operation performance and the memory
>> usage of the hash map under different use cases:
>> (1) no_op
>> Only create the hash map; there are no operations on the hash map. It
>> is used as the baseline. When each CPU completes the iteration of its
>> nonoverlapping part of the hash map, the loop count is increased.
>> (2) overwrite
>> Each CPU overwrites a nonoverlapping part of the hash map. When each
>> CPU completes one round of iteration, the loop count is increased.
>> (3) batch_add_batch_del
>> Each CPU adds and then deletes a nonoverlapping part of the hash map
>> in batches. When each CPU completes one round of iteration, the loop
>> count is increased.
>> (4) add_del_on_diff_cpu
>> Each pair of CPUs adds and deletes a nonoverlapping part of the map
>> concurrently. When each CPU completes one round of iteration, the loop
>> count is increased.
>>
>> The following benchmark results show that the bpf memory allocator
>> doesn't handle the add_del_on_diff_cpu scenario very well, because map
>> deletion always happens on a different CPU than the map addition and
>> the freed memory can never be reused.
SNIP

>> +
>> +SEC("?tp/syscalls/sys_enter_getpgid")
>> +int add_del_on_diff_cpu(void *ctx)
>> +{
>> +	struct update_ctx update;
>> +	unsigned int from;
>> +
>> +	from = bpf_get_smp_processor_id();
>> +	update.from = from / 2;
>> +	update.step = nr_thread / 2;
>> +	update.max = nr_entries;
>> +
>> +	if (from & 1)
>> +		bpf_loop(update.max, newwrite_htab, &update, 0);
>> +	else
>> +		bpf_loop(update.max, del_htab, &update, 0);
> This is an oddly shaped test.
> The deleter cpu may run ahead of newwrite_htab, so the deleter will try
> to delete elems that don't exist. A loop of a few thousand iterations
> is not a lot for one cpu to run ahead.
>
> Each loop will run 16k times and every time you step += 4, so for 3/4
> of these 16k runs it will be hitting the if (ctx->from >= ctx->max)
> condition. What are you measuring?

I think it would be better to synchronize between the deletion CPU and
the addition CPU. Will fix it.

>
>> +
>> +	__sync_fetch_and_add(&loop_cnt, 1);
>> +	return 0;
>> +}
>> --
>> 2.29.2
>>
> .