Hou Tao wrote:
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> The benchmark can be used to compare the performance of hash map
> operations and the memory usage between different flavors of bpf memory
> allocator (e.g., no bpf ma vs bpf ma vs reuse-after-gp bpf ma). It can
> also be used to check the performance improvement or the memory saving
> provided by an optimization.
>
> The benchmark creates a non-preallocated hash map which uses the bpf
> memory allocator and shows the operation performance and the memory
> usage of the hash map under different use cases:
>
> (1) overwrite
>     Each CPU overwrites a nonoverlapping part of the hash map. Each time
>     a CPU completes overwriting 64 elements in the hash map, it
>     increases op_count.
> (2) batch_add_batch_del
>     Each CPU adds then deletes a nonoverlapping part of the hash map in
>     batch. Each time a CPU adds and deletes 64 elements in the hash map,
>     it increases op_count twice.
> (3) add_del_on_diff_cpu
>     Each pair of CPUs adds and deletes a nonoverlapping part of the map
>     cooperatively. Each time a CPU adds or deletes 64 elements in the
>     hash map, it increases op_count.
>
> The following are the benchmark results when comparing the different
> flavors of bpf memory allocator. The tests were conducted on a KVM guest
> with 8 CPUs and 16 GB of memory. The command line below was used for all
> of the following benchmarks:
>
>   ./bench htab-mem --use-case $name ${OPTS} -w3 -d10 -a -p8
>
> These results show that the preallocated hash map has both better
> performance and a smaller memory footprint.
>
> (1) non-preallocated + no bpf memory allocator (v6.0.19)
>     use kmalloc() + call_rcu
>
> overwrite            per-prod-op: 11.24 ± 0.07k/s,  avg mem: 82.64 ± 26.32MiB, peak mem: 119.18MiB
> batch_add_batch_del  per-prod-op: 18.45 ± 0.10k/s,  avg mem: 50.47 ± 14.51MiB, peak mem: 94.96MiB
> add_del_on_diff_cpu  per-prod-op: 14.50 ± 0.03k/s,  avg mem: 4.64 ± 0.73MiB,   peak mem: 7.20MiB
>
> (2) preallocated
>     OPTS=--preallocated
>
> overwrite            per-prod-op: 191.92 ± 0.07k/s, avg mem: 1.23 ± 0.00MiB,   peak mem: 1.49MiB
> batch_add_batch_del  per-prod-op: 218.10 ± 0.25k/s, avg mem: 1.23 ± 0.00MiB,   peak mem: 1.49MiB
> add_del_on_diff_cpu  per-prod-op: 39.59 ± 0.41k/s,  avg mem: 1.48 ± 0.11MiB,   peak mem: 1.74MiB
>
> (3) normal bpf memory allocator
>
> overwrite            per-prod-op: 134.81 ± 0.22k/s, avg mem: 1.67 ± 0.12MiB,   peak mem: 2.74MiB
> batch_add_batch_del  per-prod-op: 90.44 ± 0.34k/s,  avg mem: 2.27 ± 0.00MiB,   peak mem: 2.74MiB
> add_del_on_diff_cpu  per-prod-op: 28.20 ± 0.15k/s,  avg mem: 1.73 ± 0.17MiB,   peak mem: 2.06MiB

Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>

> +
> +static error_t htab_mem_parse_arg(int key, char *arg, struct argp_state *state)
> +{
> +	switch (key) {
> +	case ARG_VALUE_SIZE:
> +		args.value_size = strtoul(arg, NULL, 10);
> +		if (args.value_size > 4096) {
> +			fprintf(stderr, "too big value size %u\n", args.value_size);
> +			argp_usage(state);
> +		}
> +		break;
> +	case ARG_USE_CASE:
> +		args.use_case = strdup(arg);

Might be worth checking for NULL here and returning an error? Only matters
if we run from CI or something, and then this looks like a flake.

> +		break;
> +	case ARG_PREALLOCATED:
> +		args.preallocated = true;
> +		break;
> +	default:
> +		return ARGP_ERR_UNKNOWN;
> +	}
> +
> +	return 0;
> +}