Re: [PATCH bpf-next v3 1/2] bpf: Reduce the scope of rcu_read_lock when updating fd map

Hou Tao <houtao@xxxxxxxxxxxxxxx> · Thu, 14 Dec 2023 15:31:11 +0800



Hi,

On 12/14/2023 2:22 PM, John Fastabend wrote:
> Hou Tao wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> There is no rcu-read-lock requirement for ops->map_fd_get_ptr() or
>> ops->map_fd_put_ptr(), so don't use rcu-read-lock for these two
>> callbacks.
>>
>> For bpf_fd_array_map_update_elem(), accessing array->ptrs doesn't need
>> rcu-read-lock because array->ptrs must still be allocated. For
>> bpf_fd_htab_map_update_elem(), htab_map_update_elem() only requires
>> rcu-read-lock to be held to avoid the WARN_ON_ONCE(), so only use
>> rcu_read_lock() during the invocation of htab_map_update_elem().
>>
>> Acked-by: Yonghong Song <yonghong.song@xxxxxxxxx>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>> ---
>>  kernel/bpf/hashtab.c | 6 ++++++
>>  kernel/bpf/syscall.c | 4 ----
>>  2 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 5b9146fa825f..ec3bdcc6a3cf 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -2523,7 +2523,13 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
>>  	if (IS_ERR(ptr))
>>  		return PTR_ERR(ptr);
>>  
>> +	/* The htab bucket lock is always held during update operations in fd
>> +	 * htab map, and the following rcu_read_lock() is only used to avoid
>> +	 * the WARN_ON_ONCE in htab_map_update_elem().
>> +	 */
>> +	rcu_read_lock();
>>  	ret = htab_map_update_elem(map, key, &ptr, map_flags);
>> +	rcu_read_unlock();
> Did we consider dropping the WARN_ON_ONCE in htab_map_update_elem()? It
> looks like there are two ways to get to htab_map_update_elem() either
> through a syscall and the path here (bpf_fd_htab_map_update_elem) or
> through a BPF program calling bpf_map_update_elem()? In the BPF_CALL
> case bpf_map_update_elem() already has,
>
>    WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held())
>
> The htab_map_update_elem() has an additional check for
> rcu_read_lock_trace_held(), but not sure where this is coming from
> at the moment. Can that be added to the BPF caller side if needed?
>
> Did I miss some caller path?

No. But I think the main reason for the extra WARN in
htab_map_update_elem() is that bpf_map_update_elem() may be inlined by
the verifier in do_misc_fixups(), so the WARN_ON_ONCE in
bpf_map_update_elem() would never be invoked. As for
rcu_read_lock_trace_held(), I recently added that assertion in
bpf_map_delete_elem() in commit 169410eba271 ("bpf: Check
rcu_read_lock_trace_held() before calling bpf map helpers").
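
To illustrate the inlining point, here is a rough userspace sketch. It
is not kernel code: every mock_* name and both counters are
hypothetical stand-ins. It only shows that a check placed in the helper
is skipped once the call site goes straight to the map callback, while
a check placed in the callback still runs on both paths.

```c
#include <assert.h>

/* Hypothetical counters standing in for the two WARN_ON_ONCE() sites. */
static int helper_checks;	/* check placed in the helper wrapper */
static int callback_checks;	/* check placed in the map callback */

/* Stand-in for the map callback (the htab_map_update_elem() role). */
static int mock_htab_update(int *slot, int value)
{
	callback_checks++;	/* this check runs on every path */
	*slot = value;
	return 0;
}

/* Stand-in for the helper wrapper (the bpf_map_update_elem() role). */
static int mock_helper_update(int *slot, int value)
{
	helper_checks++;	/* skipped whenever the wrapper is bypassed */
	return mock_htab_update(slot, value);
}

/* Stand-in for a call site that was rewritten to call the callback
 * directly, as an inlined helper call would.
 */
static int mock_inlined_call(int *slot, int value)
{
	return mock_htab_update(slot, value);
}
```

Calling mock_inlined_call() bumps only callback_checks, whereas
mock_helper_update() bumps both counters.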
>  
>
>>  	if (ret)
>>  		map->ops->map_fd_put_ptr(map, ptr, false);
>>  
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d63c1ed42412..3fcf7741146a 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -184,15 +184,11 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
>>  		err = bpf_percpu_cgroup_storage_update(map, key, value,
>>  						       flags);
>>  	} else if (IS_FD_ARRAY(map)) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_array_map_update_elem(map, map_file, key, value,
>>  						   flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_HASH_OF_MAPS) {
>> -		rcu_read_lock();
>>  		err = bpf_fd_htab_map_update_elem(map, map_file, key, value,
>>  						  flags);
>> -		rcu_read_unlock();
>>  	} else if (map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) {
>>  		/* rcu_read_lock() is not needed */
>>  		err = bpf_fd_reuseport_array_update_elem(map, key, value,
> Any reason to leave the last rcu_read_lock() on the 'else{}' case, if
> the rule is that we hold a reference to the map through the file
> fdget()? Any concurrent runners need some locking (e.g. xchg) to handle
> the update; an rcu_read_lock() won't help there.
>
> I didn't audit all the update flows tonight though.

It seems it is still necessary for htab and local storage. For a normal
htab, the update may be done in place without taking the bucket lock, so
an RCU read-side critical section is needed to guarantee that the
iteration is still safe. Local storage (e.g. cgrp local storage) may
also do an in-place update through lookup followed by update. We could
fold the rcu_read_lock() call into .map_update_elem() if necessary.
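
A rough userspace sketch of what folding the lock into the callback
could look like. This is not kernel code and not a proposed patch: the
mock_* names, the depth counter and the four-slot map are all
hypothetical, and the mock lock is just a nesting counter. The idea is
only that the syscall-side caller stops taking the lock and the
callback takes it around the part that needs it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the RCU read-side primitives. */
static int mock_rcu_depth;		/* current read-side nesting depth */
static int depth_seen_in_update;	/* depth observed by the core update */

static void mock_rcu_read_lock(void)   { mock_rcu_depth++; }
static void mock_rcu_read_unlock(void) { mock_rcu_depth--; }
static int  mock_rcu_read_lock_held(void) { return mock_rcu_depth > 0; }

struct mock_map;

struct mock_map_ops {
	int (*map_update_elem)(struct mock_map *map, int key, long value);
};

struct mock_map {
	const struct mock_map_ops *ops;
	long slots[4];
};

/* Core update that expects to run inside a read-side critical section. */
static int mock_core_update(struct mock_map *map, int key, long value)
{
	depth_seen_in_update = mock_rcu_depth;
	if (!mock_rcu_read_lock_held())
		return -1;	/* stands in for the WARN_ON_ONCE() firing */
	if (key < 0 || key >= 4)
		return -2;	/* key out of range for this toy map */
	map->slots[key] = value;
	return 0;
}

/* The callback takes the lock itself, so callers no longer have to. */
static int mock_map_update_elem(struct mock_map *map, int key, long value)
{
	int ret;

	mock_rcu_read_lock();
	ret = mock_core_update(map, key, value);
	mock_rcu_read_unlock();
	return ret;
}

static const struct mock_map_ops mock_ops = {
	.map_update_elem = mock_map_update_elem,
};

/* Syscall-side caller: no lock/unlock pair around the callback. */
static int mock_syscall_update(struct mock_map *map, int key, long value)
{
	return map->ops->map_update_elem(map, key, value);
}
```

With this shape, a map type whose update does not need the lock would
simply omit the lock/unlock pair from its own callback.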
>
>
>> -- 
>> 2.29.2
>>
>>