Hi,

On 7/7/2023 10:10 AM, Alexei Starovoitov wrote:
> On Thu, Jul 6, 2023 at 6:45 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 7/6/2023 11:34 AM, Alexei Starovoitov wrote:
>>> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>
>>> Introduce bpf_mem_[cache_]free_rcu() similar to kfree_rcu().
>>> Unlike bpf_mem_[cache_]free() that links objects for immediate reuse into
>>> per-cpu free list the _rcu() flavor waits for RCU grace period and then moves
>>> objects into free_by_rcu_ttrace list where they are waiting for RCU
>>> task trace grace period to be freed into slab.
>>>
>>> The life cycle of objects:
>>> alloc: dequeue free_llist
>>> free: enqueue free_llist
>>> free_rcu: enqueue free_by_rcu -> waiting_for_gp
>>> free_llist above high watermark -> free_by_rcu_ttrace
>>> after RCU GP waiting_for_gp -> free_by_rcu_ttrace
>>> free_by_rcu_ttrace -> waiting_for_gp_ttrace -> slab
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>> Acked-by: Hou Tao <houtao1@xxxxxxxxxx>
> Thank you very much for code reviews and feedback.

You are welcome. I have also learned a lot from this great patch set.

> btw I still believe that ABA is a non issue and prefer to keep the code as-is,
> but for the sake of experiment I've converted it to spin_lock
> (see attached patch which I think uglifies the code)
> and performance across bench htab-mem and map_perf_test
> seems to be about the same.
> Which was a bit surprising to me.
> Could you please benchmark it on your system?

Will do that later. It seems that if there is no cross-CPU allocation and free, the only possible contention is between __free_rcu() on CPU x and alloc_bulk()/free_bulk() on a different CPU.