Hi Alexei, On 11/21/2023 1:19 PM, Alexei Starovoitov wrote: > On Mon, Nov 13, 2023 at 08:33:23PM +0800, Hou Tao wrote: >> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c >> index e2d2701ce2c45..5a7906f2b027e 100644 >> --- a/kernel/bpf/syscall.c >> +++ b/kernel/bpf/syscall.c >> @@ -694,12 +694,20 @@ static void bpf_map_free_deferred(struct work_struct *work) >> { >> struct bpf_map *map = container_of(work, struct bpf_map, work); >> struct btf_record *rec = map->record; >> + int acc_ctx; >> >> security_bpf_map_free(map); >> bpf_map_release_memcg(map); >> >> - if (READ_ONCE(map->free_after_mult_rcu_gp)) >> - synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace); > The previous patch 3 is doing too much. > There is maybe_wait_bpf_programs() that will do synchronize_rcu() > when necessary. > The patch 3 could do synchronize_rcu_tasks_trace() only and it will solve the issue. I didn't follow how synchronize_rcu() in maybe_wait_bpf_programs() will help bpf_map_free_deferred() to defer the free of inner map. Could you please elaborate on that ? In my understanding, bpf_map_update_value() invokes maybe_wait_bpf_programs() after the deletion of old inner map from outer map completes. If the ref-count of inner map in the outer map is the last one, bpf_map_free_deferred() will be called when the deletion completes, so maybe_wait_bpf_programs() will run concurrently with bpf_map_free_deferred(). > >> + acc_ctx = atomic_read(&map->may_be_accessed_prog_ctx) & BPF_MAP_ACC_PROG_CTX_MASK; >> + if (acc_ctx) { >> + if (acc_ctx == BPF_MAP_ACC_NORMAL_PROG_CTX) >> + synchronize_rcu(); >> + else if (acc_ctx == BPF_MAP_ACC_SLEEPABLE_PROG_CTX) >> + synchronize_rcu_tasks_trace(); >> + else >> + synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace); > and this patch 4 goes to far. > Could you add sleepable_refcnt in addition to existing refcnt that is incremented > in outer map when it's used by sleepable prog and when sleepable_refcnt > 0 > the caller of bpf_map_free_deferred sets free_after_mult_rcu_gp. > (which should be renamed to free_after_tasks_rcu_gp). > Patch 3 is simpler and patch 4 is simple too. > No need for atomic_or games. > > In addition I'd like to see an extra patch that demonstrates this UAF > when update/delete is done by syscall bpf prog type. > The test case in patch 5 is doing update/delete from user space. Do you mean update/delete operations on outer map, right ? Because in patch 5, inner map is updated from bpf program instead of user space. > If that was the only issue we could have easily extended maybe_wait_bpf_programs() > to do synchronize_rcu_tasks_trace() and that would close the issue exposed by patch 5. > But inner maps can indeed be updated by syscall bpf prog and since they run > under rcu_read_lock_trace() we cannot add synchronize_rcu_tasks_trace() to > maybe_wait_bpf_programs() because it will deadlock. > So let's make sure we have test cases for all combinations where inner maps > can be updated/deleted.