Re: [PATCH bpf v2 4/5] bpf: Optimize the free of inner map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Alexei,

On 11/21/2023 1:19 PM, Alexei Starovoitov wrote:
> On Mon, Nov 13, 2023 at 08:33:23PM +0800, Hou Tao wrote:
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index e2d2701ce2c45..5a7906f2b027e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -694,12 +694,20 @@ static void bpf_map_free_deferred(struct work_struct *work)
>>  {
>>  	struct bpf_map *map = container_of(work, struct bpf_map, work);
>>  	struct btf_record *rec = map->record;
>> +	int acc_ctx;
>>  
>>  	security_bpf_map_free(map);
>>  	bpf_map_release_memcg(map);
>>  
>> -	if (READ_ONCE(map->free_after_mult_rcu_gp))
>> -		synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
> The previous patch 3 is doing too much.
> There is maybe_wait_bpf_programs() that will do synchronize_rcu()
> when necessary.
> The patch 3 could do synchronize_rcu_tasks_trace() only and it will solve the issue.

I didn't follow how synchronize_rcu() in maybe_wait_bpf_programs() will
help bpf_map_free_deferred() to defer the free of inner map. Could you
please elaborate on that ? In my understanding, bpf_map_update_value()
invokes maybe_wait_bpf_programs() after the deletion of old inner map
from outer map completes. If the ref-count of inner map in the outer map
is the last one, bpf_map_free_deferred() will be called when the
deletion completes, so maybe_wait_bpf_programs() will run concurrently
with bpf_map_free_deferred().
>
>> +	acc_ctx = atomic_read(&map->may_be_accessed_prog_ctx) & BPF_MAP_ACC_PROG_CTX_MASK;
>> +	if (acc_ctx) {
>> +		if (acc_ctx == BPF_MAP_ACC_NORMAL_PROG_CTX)
>> +			synchronize_rcu();
>> +		else if (acc_ctx == BPF_MAP_ACC_SLEEPABLE_PROG_CTX)
>> +			synchronize_rcu_tasks_trace();
>> +		else
>> +			synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
> and this patch 4 goes to far.
> Could you add sleepable_refcnt in addition to existing refcnt that is incremented
> in outer map when it's used by sleepable prog and when sleepable_refcnt > 0
> the caller of bpf_map_free_deferred sets free_after_mult_rcu_gp.
> (which should be renamed to free_after_tasks_rcu_gp).
> Patch 3 is simpler and patch 4 is simple too.
> No need for atomic_or games.
>
> In addition I'd like to see an extra patch that demonstrates this UAF
> when update/delete is done by syscall bpf prog type.
> The test case in patch 5 is doing update/delete from user space.

Do you mean update/delete operations on outer map, right ? Because in
patch 5, inner map is updated from bpf program instead of user space.
> If that was the only issue we could have easily extended maybe_wait_bpf_programs()
> to do synchronize_rcu_tasks_trace() and that would close the issue exposed by patch 5.
> But inner maps can indeed be updated by syscall bpf prog and since they run
> under rcu_read_lock_trace() we cannot add synchronize_rcu_tasks_trace() to
> maybe_wait_bpf_programs() because it will deadlock.
> So let's make sure we have test cases for all combinations where inner maps
> can be updated/deleted.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux