On Thu, Oct 13, 2022 at 09:41:46AM +0800, Hou Tao wrote:
> Hi,
>
> On 10/13/2022 12:11 AM, Paul E. McKenney wrote:
> > On Wed, Oct 12, 2022 at 05:26:26PM +0800, Hou Tao wrote:
> >> Hi,
> >>
> >> On 10/12/2022 2:36 PM, Paul E. McKenney wrote:
> SNIP
> >> As said above, for the bpf memory allocator it may be OK because it
> >> frees elements in batch, but for bpf local storage and its elements
> >> these memories are freed individually. So I think that if the
> >> implication of the RCU tasks trace grace period will not change in
> >> the foreseeable future, adding rcu_trace_implies_rcu_gp() and using
> >> it in bpf is a good idea. What do you think?
> > Maybe the BPF memory allocator does it one way and BPF local storage
> > does it another way.
> Another question. I just found out that there are new APIs for RCU
> polling (e.g. get_state_synchronize_rcu_full()). According to the
> comments, the advantage of the new API is that it will never miss a
> passed grace period. So for this case is get_state_synchronize_rcu()
> enough? Or should I switch to get_state_synchronize_rcu_full() instead?

I suggest starting with get_state_synchronize_rcu(), and moving to the
_full() variants only if experience shows that it is necessary.

Please note that these functions work with normal RCU, that is,
call_rcu(), but not call_rcu_tasks(), call_rcu_tasks_trace(), or
call_rcu_rude().

Please note also that SRCU has its own set of polling APIs, for example,
get_state_synchronize_srcu().

							Thanx, Paul

> Regards
> > How about if I produce a patch for rcu_trace_implies_rcu_gp() and let
> > you carry it with your series? That way I don't have an unused function
> > in -rcu and you don't have to wait for me to send it upstream?
> >
> > 						Thanx, Paul
> >
> >>>>>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> >>>>>> ---
> >>>>>>  kernel/bpf/memalloc.c | 17 ++++++-----------
> >>>>>>  1 file changed, 6 insertions(+), 11 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> >>>>>> index 5f83be1d2018..6f32dddc804f 100644
> >>>>>> --- a/kernel/bpf/memalloc.c
> >>>>>> +++ b/kernel/bpf/memalloc.c
> >>>>>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
> >>>>>>  	kfree(obj);
> >>>>>>  }
> >>>>>>
> >>>>>> +/* Now RCU Tasks grace period implies RCU grace period, so no need to do
> >>>>>> + * an extra call_rcu().
> >>>>>> + */
> >>>>>>  static void __free_rcu(struct rcu_head *head)
> >>>>>>  {
> >>>>>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>>>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
> >>>>>>  	atomic_set(&c->call_rcu_in_progress, 0);
> >>>>>>  }
> >>>>>>
> >>>>>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
> >>>>>> -{
> >>>>>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
> >>>>>> -
> >>>>>> -	call_rcu(&c->rcu, __free_rcu);
> >>>>>> -}
> >>>>>> -
> >>>>>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
> >>>>>>  {
> >>>>>>  	struct llist_node *llnode = obj;
> >>>>>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
> >>>>>>  	 * from __free_rcu() and from drain_mem_cache().
> >>>>>>  	 */
> >>>>>>  	__llist_add(llnode, &c->waiting_for_gp);
> >>>>>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
> >>>>>> -	 * Then use call_rcu() to wait for normal progs to finish
> >>>>>> -	 * and finally do free_one() on each element.
> >>>>>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
> >>>>>> +	 * progs to finish and finally do free_one() on each element.
> >>>>>>  	 */
> >>>>>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
> >>>>>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
> >>>>>>  }
> >>>>>>
> >>>>>>  static void free_bulk(struct bpf_mem_cache *c)
> >>>>>> --
> >>>>>> 2.29.2
> >>>>>>
>
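[Editor's note: for readers unfamiliar with the RCU polling APIs discussed
above, the sketch below shows the general pattern: snapshot the grace-period
state with get_state_synchronize_rcu() when an object is retired, then check
poll_state_synchronize_rcu() at free time and fall back to call_rcu() if a
grace period has not yet elapsed. This is illustrative kernel-style code only,
not part of the patch under review; struct my_obj and the helper names are
hypothetical.]

	/* Illustrative only: a hypothetical object using the RCU polling APIs. */
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct my_obj {
		unsigned long rcu_cookie;	/* grace-period snapshot */
		struct rcu_head rcu;
		/* payload ... */
	};

	/* Called when the object is logically removed from its data structure. */
	static void my_obj_retire(struct my_obj *obj)
	{
		/* Cheap snapshot of the current grace-period state; no waiting. */
		obj->rcu_cookie = get_state_synchronize_rcu();
	}

	static void my_obj_free_rcu(struct rcu_head *head)
	{
		kfree(container_of(head, struct my_obj, rcu));
	}

	/* Called later, when the object is actually freed. */
	static void my_obj_free(struct my_obj *obj)
	{
		if (poll_state_synchronize_rcu(obj->rcu_cookie))
			kfree(obj);			/* a grace period already elapsed */
		else
			call_rcu(&obj->rcu, my_obj_free_rcu);	/* defer the free */
	}

[As Paul notes, these cookies cover only normal RCU readers; waiting for
sleepable (RCU tasks trace) readers still requires call_rcu_tasks_trace()
or synchronize_rcu_tasks_trace().]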