Re: [RFC bpf-next v2 4/4] bpf: Introduce BPF_MA_REUSE_AFTER_RCU_GP

On Sun, Apr 23, 2023 at 03:41:05PM +0800, Hou Tao wrote:
> >>
> >> (3) reuse-after-rcu-gp bpf memory allocator
> > that's the one you're implementing below, right?
> Right.
> >
> >> | name                | loop (k/s) | average memory (MiB) | peak memory (MiB) |
> >> | --                  | --         | --                   | --                |
> >> | no_op               | 1276       | 0.96                 | 1.00              |
> >> | overwrite           | 15.66      | 25.00                | 33.07             |
> >> | batch_add_batch_del | 10.32      | 18.84                | 22.64             |
> >> | add_del_on_diff_cpu | 13.00      | 550.50               | 748.74            |
> >>
> >> (4) free-after-rcu-gp bpf memory allocator (free directly through call_rcu)
> > What do you mean? htab uses bpf_ma, but does call_rcu before doing bpf_mem_free ?
> No, there is no call_rcu() before bpf_mem_free(). bpf_mem_free() in the
> free-after-rcu-gp flavor does call_rcu() in batches to free these elements
> directly back to the slab subsystem. The elements in this flavor of bpf_ma are
> not safe to access from a sleepable program unless
> bpf_rcu_read_{lock,unlock}() are used.
> 
> But I think using call_rcu() to call bpf_mem_free() is a good candidate for
> comparison, and I saw that bpf_cpumask does that, so I modified the bpf hash
> table to do the same thing and pasted the benchmark result. As we can see from
> the result, the memory usage of this flavor is much bigger than that of
> reuse-after-rcu-gp and free-after-rcu-gp:

I don't follow what exactly you're doing and what you're measuring.
Please provide patches for both reuse-after-rcu-gp and free-after-rcu-gp to
have a meaningful conversation.
Right now we're stuck on what the bench tool is actually measuring.

> >> +		if (try_queue_work && !work_pending(&c->reuse_work)) {
> >> +			/* Use reuse_cb_in_progress to indicate there is
> >> +			 * inflight reuse kworker or reuse RCU callback.
> >> +			 */
> >> +			atomic_inc(&c->reuse_cb_in_progress);
> >> +			/* Already queued */
> >> +			if (!queue_work(bpf_ma_wq, &c->reuse_work))
> > how many kthreads are spawned by wq in the peak?
> I think it depends on the number of bpf_ma instances. Because bpf_ma_wq is a
> per-CPU workqueue, each bpf_ma has at most one worker per CPU. The current
> limit on the number of active workers per CPU is 256, but it is customizable
> through the alloc_workqueue() API.

Which means that on an 8-CPU system there will be 8 * 256 kthreads?
That's a lot. Please provide num_of_all_threads before/after/at_peak during bench.
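
One rough way to collect these numbers, as a minimal userspace sketch: the
fourth field of /proc/loadavg is "running/total", where total is the number of
scheduling entities (processes and threads, kthreads included) that currently
exist. The helper names below are hypothetical and not part of the bench tool.

```c
#include <stdio.h>

/* Parse the "running/total" field of a /proc/loadavg line and return the
 * total number of scheduling entities. Returns -1 if the line does not
 * have the expected layout.
 */
static long parse_total_threads(const char *line)
{
	double l1, l5, l15;
	long running, total;

	if (sscanf(line, "%lf %lf %lf %ld/%ld",
		   &l1, &l5, &l15, &running, &total) != 5)
		return -1;
	return total;
}

/* Hypothetical helper: sample the current count from /proc/loadavg.
 * Returns -1 if the file cannot be read or parsed.
 */
static long read_total_threads(void)
{
	char buf[128];
	long total = -1;
	FILE *f = fopen("/proc/loadavg", "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		total = parse_total_threads(buf);
	fclose(f);
	return total;
}
```

Calling read_total_threads() once before the bench, periodically while it runs
(keeping the maximum), and once after it would give the three numbers.
Sampling can miss a short-lived peak, so the at_peak value is a lower bound.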

Pls trim your replies. Mailers like mutt have a hard time navigating.


