Hi,

On 7/7/2023 10:10 AM, Alexei Starovoitov wrote:
> On Thu, Jul 6, 2023 at 6:45 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
>>> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>
>>> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
>>> Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
>>> per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
>>> objects into free_by_rcu_ttrace list where they are waiting for RCU
>>> task trace grace period to be freed into slab.
>>>
>>> The life cycle of objects:
>>> alloc: dequeue free_llist
>>> free: enqueue free_llist
>>> free_rcu: enqueue free_by_rcu -> waiting_for_gp
>>> free_llist above high watermark -> free_by_rcu_ttrace
>>> after RCU GP waiting_for_gp -> free_by_rcu_ttrace
>>> free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
> Thank you very much for code reviews and feedback.

You are welcome. I have also learned a lot from this great patch set.

> btw I still believe that ABA is a non issue and prefer to keep the code as-is,
> but for the sake of experiment I've converted it to spin_lock
> (see attached patch which I think uglifies the code)
> and performance across bench htab-mem and map_perf_test
> seems to be about the same.
> Which was a bit surprising to me.
> Could you please benchmark it on your system?

Will do that later. It seems that if there is no cross-CPU allocation and free, the only possible contention is between __free_rcu() on CPU x and alloc_bulk()/free_bulk() on a different CPU.