On Wed, Apr 26, 2023 at 09:20:49PM -0700, Alexei Starovoitov wrote:
> On Sun, Apr 23, 2023 at 09:55:24AM +0800, Hou Tao wrote:
> > >
> > >> ./bench htab-mem --use-case $name --max-entries 16384 \
> > >>         --full 50 -d 7 -w 3 --producers=8 --prod-affinity=0-7
> > >>
> > >> | name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
> > >> | --                  | --         | --                   | --                |
> > >> | no_op               | 1129       | 1.15                 | 1.15              |
> > >> | overwrite           | 24.37      | 2.07                 | 2.97              |
> > >> | batch_add_batch_del | 10.58      | 2.91                 | 3.36              |
> > >> | add_del_on_diff_cpu | 13.14      | 380.66               | 633.99            |
> > >
> > > large mem for diff_cpu case needs to be investigated.
> >
> > The main reason is that the tasks-trace RCU grace period is slow and
> > there is only one free callback in flight, so the CPUs that only do
> > element addition keep allocating new memory from slab, while the CPUs
> > that only do element deletion keep freeing those elements through
> > call_rcu_tasks_trace(). Because the tasks-trace RCU grace period is
> > slow, the freed elements cannot be returned to the slab subsystem in
> > a timely manner.
>
> I see. Now it makes sense. It's the slow call_rcu_tasks_trace() and not
> at all "memory can never be reused".
> Please explain things clearly in the commit log.

Is this a benchmarking issue, or is it happening in real workloads?

If the former, one trick I use in rcutorture's callback-flooding code is
to pass the ready-to-be-freed memory directly back to the allocating CPU,
which might be what you were getting at with your "maybe stealing from
free_list of other CPUs".

If this is happening in real workloads, then I would like to better
understand that workload.

							Thanx, Paul
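
To make the failure mode above concrete, here is a minimal sketch of the
deferred-free pattern Hou Tao describes. It assumes a hypothetical
struct my_elem and hypothetical helper names rather than the actual bpf
memory allocator code; only call_rcu_tasks_trace() from
<linux/rcupdate_trace.h> is the real kernel API.

#include <linux/slab.h>
#include <linux/rcupdate_trace.h>

struct my_elem {
	struct rcu_head rcu;
	/* payload ... */
};

static void my_elem_free_cb(struct rcu_head *rcu)
{
	/* Runs only after a tasks-trace RCU grace period has elapsed;
	 * this is the point where the memory finally returns to slab.
	 */
	kfree(container_of(rcu, struct my_elem, rcu));
}

static void my_elem_delete(struct my_elem *elem)
{
	/* The deleting CPU queues the element and moves on.  When grace
	 * periods are slow, freed-but-unreclaimed elements accumulate
	 * in the pending-callback queue while the adding CPUs keep
	 * allocating fresh memory from slab.
	 */
	call_rcu_tasks_trace(&elem->rcu, my_elem_free_cb);
}

Since nothing bounds how much memory can sit waiting for the grace
period to end, a slow grace period translates directly into the large
average and peak memory of the add_del_on_diff_cpu row above.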
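
And a rough sketch of the trick Paul mentions: hand ready-to-be-freed
memory straight back to the CPU that allocated it. Everything here
(struct reuse_elem, reuse_list, the helper names) is hypothetical
illustration, not rcutorture's or bpf's actual code; llist_add(),
llist_del_first(), and the per-CPU helpers are real kernel APIs. The
sketch assumes callers run with preemption disabled and elides any
grace-period ordering that would still be needed before reuse.

#include <linux/llist.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/smp.h>

struct reuse_elem {
	struct llist_node node;
	int alloc_cpu;			/* CPU that allocated this element */
	/* payload ... */
};

static DEFINE_PER_CPU(struct llist_head, reuse_list);

static struct reuse_elem *reuse_alloc(void)
{
	struct llist_head *head = this_cpu_ptr(&reuse_list);
	struct llist_node *node;
	struct reuse_elem *elem;

	/* Each list has a single consumer (its owning CPU), so
	 * llist_del_first() here is safe against concurrent
	 * llist_add() calls from remote CPUs.
	 */
	node = llist_del_first(head);
	if (node)
		return llist_entry(node, struct reuse_elem, node);

	elem = kmalloc(sizeof(*elem), GFP_ATOMIC);
	if (elem)
		elem->alloc_cpu = smp_processor_id();
	return elem;
}

static void reuse_free(struct reuse_elem *elem)
{
	/* Instead of kfree(), push the element back onto the free list
	 * of the CPU that allocated it, so that CPU can reuse the
	 * memory on its next allocation rather than going to slab.
	 */
	llist_add(&elem->node, per_cpu_ptr(&reuse_list, elem->alloc_cpu));
}

The point of routing the memory back to elem->alloc_cpu is that the
adding CPUs then satisfy new allocations from their reuse lists instead
of pulling ever more memory from slab, which bounds the footprint even
when reclamation is slow.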