Hi,

On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> According to the implementation of RCU Tasks Trace, it invokes
>> ->postscan_func() to wait for one RCU-tasks-trace grace period, and
>> rcu_tasks_trace_postscan() invokes synchronize_rcu() to wait for one
>> normal RCU grace period in turn, so one RCU-tasks-trace grace period
>> implies one RCU grace period.
>>
>> So there is no need to do call_rcu() again in the callback of
>> call_rcu_tasks_trace(); it can just free these elements directly.
> This is true, but this is an implementation detail that is not guaranteed
> in future versions of the kernel. But if this additional call_rcu()
> is causing trouble, I could add some API member that returned true in
> kernels where it does happen to be the case that call_rcu_tasks_trace()
> implies a call_rcu()-style grace period.
>
> The BPF memory allocator could then complain or adapt, as appropriate.
>
> Thoughts?
It is indeed an implementation detail. But in an idle KVM guest, for memory
reclamation in the BPF memory allocator an RCU Tasks Trace grace period is
about 30 ms and an RCU grace period is about 20 ms. Under stress, the grace
periods will be much longer. If the extra RCU grace period can be removed,
the memory can be reclaimed more quickly, which helps under memory pressure.

Now it seems we can use the RCU polling APIs (e.g. get_state_synchronize_rcu()
and poll_state_synchronize_rcu()) to check whether an RCU grace period has
elapsed. But that needs at least one extra unsigned long in the to-be-freed
object. The extra memory overhead may be OK for the BPF memory allocator, but
not for small objects. So could you please show an example of how these new
APIs work? Does it need to modify the to-be-freed object? (A rough sketch of
my current understanding follows after the quoted patch below.)
>
> 							Thanx, Paul
>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 5f83be1d2018..6f32dddc804f 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>  	kfree(obj);
>>  }
>>
>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
>> + * an extra call_rcu().
>> + */
>>  static void __free_rcu(struct rcu_head *head)
>>  {
>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>  }
>>
>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>> -{
>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> -
>> -	call_rcu(&c->rcu, __free_rcu);
>> -}
>> -
>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>  {
>>  	struct llist_node *llnode = obj;
>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>  	 * from __free_rcu() and from drain_mem_cache().
>>  	 */
>>  	__llist_add(llnode, &c->waiting_for_gp);
>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>> -	 * Then use call_rcu() to wait for normal progs to finish
>> -	 * and finally do free_one() on each element.
>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>> +	 * progs to finish and finally do free_one() on each element.
>>  	 */
>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>  }
>>
>>  static void free_bulk(struct bpf_mem_cache *c)
>> --
>> 2.29.2
>>
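
To make the question concrete, here is a rough, untested sketch of how I
imagine the polling APIs could be used. struct pending_free, queue_for_free(),
maybe_free() and free_cb() are made-up names for illustration, not anything
in the tree:

#include <linux/rcupdate.h>
#include <linux/slab.h>

/* The cost of polling is one extra unsigned long per object for the
 * grace-period cookie (plus an rcu_head for the fallback path).
 */
struct pending_free {
	struct rcu_head rcu;		/* for the call_rcu() fallback */
	unsigned long rcu_gp_cookie;	/* from get_state_synchronize_rcu() */
};

static void free_cb(struct rcu_head *head)
{
	kfree(container_of(head, struct pending_free, rcu));
}

/* Snapshot the grace-period state when the object is queued for freeing. */
static void queue_for_free(struct pending_free *p)
{
	p->rcu_gp_cookie = get_state_synchronize_rcu();
	/* ... add p to a waiting list ... */
}

/* Later: free immediately if a full RCU grace period has already elapsed
 * since the snapshot; otherwise fall back to call_rcu().
 */
static void maybe_free(struct pending_free *p)
{
	if (poll_state_synchronize_rcu(p->rcu_gp_cookie))
		kfree(p);
	else
		call_rcu(&p->rcu, free_cb);
}

Is this roughly what you had in mind?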