Re: [PATCH bpf 05/11] bpf: Add bpf_map_of_map_fd_{get,put}_ptr() helpers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 11/10/2023 12:58 PM, Paul E. McKenney wrote:
> On Fri, Nov 10, 2023 at 11:34:03AM +0800, Hou Tao wrote:
>> Hi Martin,
>>
>> On 11/10/2023 10:48 AM, Martin KaFai Lau wrote:
>>> On 11/9/23 5:45 PM, Paul E. McKenney wrote:
>>>>>>>>>> +void bpf_map_of_map_fd_put_ptr(void *ptr, bool need_defer)
>>>>>>>>>> +{
>>>>>>>>>> +    struct bpf_inner_map_element *element = ptr;
>>>>>>>>>> +
>>>>>>>>>> +    /* Do bpf_map_put() after a RCU grace period and a tasks
>>>>>>>>>> trace
>>>>>>>>>> +     * RCU grace period, so it is certain that the bpf program
>>>>>>>>>> which is
>>>>>>>>>> +     * manipulating the map now has exited when bpf_map_put() is
>>>>>>>>>> called.
>>>>>>>>>> +     */
>>>>>>>>>> +    if (need_defer)
>>>>>>>>> "need_defer" should only happen from the syscall cmd? Instead of
>>>>>>>>> adding rcu_head to each element, how about
>>>>>>>>> "synchronize_rcu_mult(call_rcu, call_rcu_tasks)" here?
>>>>>>>> No. I have tried the method before, but it didn't work due to
>>>>>>>> dead-lock
>>>>>>>> (will mention that in commit message in v2). The reason is that bpf
>>>>>>>> syscall program may also do map update through sys_bpf helper.
>>>>>>>> Because
>>>>>>>> bpf syscall program is running with sleep-able context and has
>>>>>>>> rcu_read_lock_trace being held, so call
>>>>>>>> synchronize_rcu_mult(call_rcu,
>>>>>>>> call_rcu_tasks) will lead to dead-lock.
>>> Need to think of a less intrusive solution instead of adding rcu_head
>>> to each element and lookup also needs an extra de-referencing.
>> I see.
>>> May be the bpf_map_{update,delete}_elem(&outer_map, ....) should not
>>> be done by the syscall program? Which selftest does it?
>> Now bpf_map_update_elem is allowed for bpf_sys_bpf helper. If I
>> remembered correctly it was map_ptr.
>>> Can the inner_map learn that it has been deleted from an outer map
>>> that is used in a sleepable prog->aux->used_maps? The
>>> bpf_map_free_deferred() will then wait for a task_trace gp?
>> Considering an inner_map may be used by multiple outer_map, the
>> following solution will be simpler: if the inner map has been deleted
>> from an outer map once, its free must be delayed after one RCU GP and
>> one tasks trace RCU GP. But I will check whether it is possible to only
>> wait for one RCU GP instead of two.
> If you are freeing a large quantity of elements at a time, one approach
> is to use a single rcu_head structure for the group.  (Or, in this case,
> maybe a pair of rcu_head structures, one for call_rcu() and the other
> for call_rcu_tasks_trace().)
>
> This requires that you be able to link the elements in the group
> together somehow, which requires some per-element storage, but only
> one word per element instead of two.
>
> There are other variations on this theme, depending on what constraints
> apply here.

Thanks for your suggestions. Although there are batch update support for
inner map, but I think inner map is updated one-by-one at most case. And
the main concern here is the extra dereference due to memory allocation,
so I think adding extra flags to indicate bpf_mem_free_deferred() to
free the map differently may be appropriate.

Regards,
Tao
> 							Thanx, Paul





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux