Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 12/15/2023 3:15 AM, John Fastabend wrote:
> Alexei Starovoitov wrote:
>> On Wed, Dec 13, 2023 at 11:31 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 12/14/2023 2:22 PM, John Fastabend wrote:
>>>> Hou Tao wrote:
>>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>>
>>>>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>>>>> ops->map_fd_put_ptr(), so doesn't use rcu-read-lock for these two
>>>>> callbacks.
>>>>>
>>>>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>>>>> rcu-read-lock because array->ptrs must still be allocated. For
>>>>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>>>>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>>>>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>>>>
>>>>> Acked-by: Yonghong Song <yonghong.song@xxxxxxxxx>
>>>>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>>>>> ---
>>>>>  kernel/bpf/hashtab.c | 6 ++++++
>>>>>  kernel/bpf/syscall.c | 4 ----
>>>>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>>>>> --- a/kernel/bpf/hashtab.c
>>>>> +++ b/kernel/bpf/hashtab.c
>>>>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>>>>      if (IS_ERR(ptr))
>>>>>              return PTR_ERR(ptr);
>>>>>
>>>>> +    /* The htab bucket lock is always held during update operations in fd
>>>>> +     * htab map, and the following rcu_read_lock() is only used to avoid
>>>>> +     * the WARN_ON_ONCE in htab_map_update_elem().
>>>>> +     */
> Ah ok but isn't this comment wrong because you do need rcu read lock to do
> the walk with lookup_nulls_elem_raw where there is no lock being held? And
> then the subsequent copy in place is fine because you do have a lock.
>
> So its not just to appease the WARN_ON_ONCE here it has an actual real
> need?
>
>>>>> +    rcu_read_lock();
>>>>>      ret = htab_map_update_elem(map, key, &ptr, map_flags);
>>>>> +    rcu_read_unlock();
>>>> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
>>>> looks like there are two ways to get to htab_map_update_elem() either
>>>> through a syscall and the path here (bpf_fd_htab_map_update_elem) or
>>>> through a BPF program calling, bpf_update_elem()? In the BPF_CALL
>>>> case bpf_map_update_elem() already has,
>>>>
>>>>    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>>>>
>>>> The htab_map_update_elem() has an additional check for
>>>> rcu_read_lock_trace_held(), but not sure where this is coming from
>>>> at the moment. Can that be added to the BPF caller side if needed?
>>>>
>>>> Did I miss some caller path?
>>> No. But I think the main reason for the extra WARN in
>>> bpf_map_update_elem() is that bpf_map_update_elem() may be inlined by
>>> verifier in do_misc_fixups(), so the WARN_ON_ONCE in
>>> bpf_map_update_elem() will not be invoked ever. For
>>> rcu_read_lock_trace_held(), I have added the assertion in
>>> bpf_map_delete_elem() recently in commit 169410eba271 ("bpf: Check
>>> rcu_read_lock_trace_held() before calling bpf map helpers").
>> Yep.
>> We should probably remove WARN_ONs from
>> bpf_map_update_elem() and others in kernel/bpf/helpers.c
>> since they are inlined by the verifier with 99% probability
>> and the WARNs are never called even in DEBUG kernels.
>> And confusing developers. As this thread shows.
> Agree. The rcu_read needs to be close as possible to where its actually
> needed and the WARN_ON_ONCE should be dropped if its going to be
> inlined.

I did some investigation on these bpf map helpers and the
implementations of these helpers in various kinds of bpf map. It seems
most implementations (besides dev_map_hash_ops) already have added
proper RCU lock assertions, so I think it is indeed OK to remove
WARN_ON_ONCE() on these bpf map helpers after fixing the assertion in
dev_map_hash_ops. The following is the details:

1. bpf_map_lookup_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(b) array/percpu_array
no deletion, so no RCU

(c) lpm_trie
with rcu_read_lock_bh_held() in trie_lookup_elem()

(d) htab_of_maps
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(e) array_of_maps
no deletion, so no RCU

(f) sockmap
rcu_read_lock_held() in __sock_map_lookup_elem()

(g) sockhash
rcu_read_lock_held() in__sock_hash_lookup_elem()

(h) devmap
rcu_read_lock_bh_held() in __dev_map_lookup_elem()

(i) devmap_hash (incorrect assertion)
No rcu_read_lock_bh_held() in __dev_map_hash_lookup_elem()

(j) xskmap
rcu_read_lock_bh_held() in __xsk_map_lookup_elem()

2. bpf_map_update_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in
htab_map_update_elem()/htab_lru_map_update_elem()/__htab_percpu_map_update_elem()/__htab_lru_percpu_map_update_elem()

(b) array/percpu_array
no RCU

(c) lpm_trie
use spin-lock, and no RCU

(d) sockmap
use spin-lock & with rcu_read_lock_held() in sock_map_update_common()

(e) sockhash
use spin-lock & with rcu_read_lock_held() in sock_hash_update_common()

3.bpf_map_delete_elem helper
(a) hash/lru_hash/percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() () in htab_map_delete_elem/htab_lru_map_delete_elem

(b) array/percpu_array
no support

(c) lpm_trie
use spin-lock, no rcu

(d) sockmap
use spin-lock

(e) sockhash
use spin-lock

4. bpf_map_lookup_percpu_elem
(a) percpu_hash/lru_percpu_hash
with !rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
!rcu_read_lock_bh_held() in __htab_map_lookup_elem()

(b) percpu_array
no deletion, no RCU

>> We can replace them with a comment that explains this inlining logic
>> and where the real WARNs are..





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux