Re: [PATCH bpf-next 1/3] bpf: Free elements after one RCU-tasks-trace grace period

Hi,

On 10/11/2022 5:07 PM, Paul E. McKenney wrote:
> On Tue, Oct 11, 2022 at 03:11:26PM +0800, Hou Tao wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> According to the implementation of RCU Tasks Trace, it invokes
>> ->postscan_func() to wait for one RCU-tasks-trace grace period, and
>> rcu_tasks_trace_postscan() in turn invokes synchronize_rcu() to wait for
>> one normal RCU grace period, so one RCU-tasks-trace grace period
>> implies one RCU grace period.
>>
>> So there is no need to do call_rcu() again in the callback of
>> call_rcu_tasks_trace() and it can just free these elements directly.
> This is true, but this is an implementation detail that is not guaranteed
> in future versions of the kernel.  But if this additional call_rcu()
> is causing trouble, I could add some API member that returned true in
> kernels where it does happen to be the case that call_rcu_tasks_trace()
> implies a call_rcu()-style grace period.
>
> The BPF memory allocator could then complain or adapt, as appropriate.
>
> Thoughts?
It is indeed an implementation detail. But in an idle KVM guest, for memory
reclamation in the bpf memory allocator an RCU Tasks Trace grace period is
about 30 ms and an RCU grace period is about 20 ms. Under stress the grace
periods will be much longer. If the extra RCU grace period can be removed,
this memory can be reclaimed more quickly, which is beneficial under memory
pressure.

Now it seems we can use the RCU polling APIs (e.g. get_state_synchronize_rcu()
and poll_state_synchronize_rcu()) to check whether an RCU grace period has
elapsed, but that requires adding at least one unsigned long to each object
being freed. The extra memory overhead may be acceptable for the bpf memory
allocator, but not for small objects. So could you please show an example of
how these new APIs work? Do they need to modify the to-be-freed object?
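
For reference, my rough understanding of how the polling APIs would be used is
below. It is only a sketch; pending_obj, queue_for_free() and can_free_now()
are made-up names for illustration. The point is that the cookie returned by
get_state_synchronize_rcu() has to be stored somewhere, most likely in the
object itself:

#include <linux/llist.h>
#include <linux/rcupdate.h>

struct pending_obj {
	struct llist_node node;
	/* cookie recording the RCU grace-period state at queueing time */
	unsigned long rcu_state;
};

static void queue_for_free(struct pending_obj *obj)
{
	/* snapshot the current grace-period state before deferring the free */
	obj->rcu_state = get_state_synchronize_rcu();
	/* ... add obj to a per-cache pending list ... */
}

static bool can_free_now(struct pending_obj *obj)
{
	/* true once a full RCU grace period has elapsed since the snapshot */
	return poll_state_synchronize_rcu(obj->rcu_state);
}

If that is roughly right, then the extra unsigned long per object is exactly
the overhead I was worried about for small objects.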
>
> 							Thanx, Paul
>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  kernel/bpf/memalloc.c | 17 ++++++-----------
>>  1 file changed, 6 insertions(+), 11 deletions(-)
>>
>> diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
>> index 5f83be1d2018..6f32dddc804f 100644
>> --- a/kernel/bpf/memalloc.c
>> +++ b/kernel/bpf/memalloc.c
>> @@ -209,6 +209,9 @@ static void free_one(struct bpf_mem_cache *c, void *obj)
>>  	kfree(obj);
>>  }
>>  
>> +/* Now an RCU Tasks Trace grace period implies an RCU grace period, so no
>> + * need to do an extra call_rcu().
>> + */
>>  static void __free_rcu(struct rcu_head *head)
>>  {
>>  	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> @@ -220,13 +223,6 @@ static void __free_rcu(struct rcu_head *head)
>>  	atomic_set(&c->call_rcu_in_progress, 0);
>>  }
>>  
>> -static void __free_rcu_tasks_trace(struct rcu_head *head)
>> -{
>> -	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu);
>> -
>> -	call_rcu(&c->rcu, __free_rcu);
>> -}
>> -
>>  static void enque_to_free(struct bpf_mem_cache *c, void *obj)
>>  {
>>  	struct llist_node *llnode = obj;
>> @@ -252,11 +248,10 @@ static void do_call_rcu(struct bpf_mem_cache *c)
>>  		 * from __free_rcu() and from drain_mem_cache().
>>  		 */
>>  		__llist_add(llnode, &c->waiting_for_gp);
>> -	/* Use call_rcu_tasks_trace() to wait for sleepable progs to finish.
>> -	 * Then use call_rcu() to wait for normal progs to finish
>> -	 * and finally do free_one() on each element.
>> +	/* Use call_rcu_tasks_trace() to wait for both sleepable and normal
>> +	 * progs to finish and finally do free_one() on each element.
>>  	 */
>> -	call_rcu_tasks_trace(&c->rcu, __free_rcu_tasks_trace);
>> +	call_rcu_tasks_trace(&c->rcu, __free_rcu);
>>  }
>>  
>>  static void free_bulk(struct bpf_mem_cache *c)
>> -- 
>> 2.29.2
>>



