On Wed, Aug 23, 2023 at 1:29 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
>
> On 8/23/23 9:20 AM, Alexei Starovoitov wrote:
> > On Tue, Aug 22, 2023 at 11:26 PM Yonghong Song <yonghong.song@xxxxxxxxx> wrote:
> >>
> >> On 8/21/23 12:33 PM, Dave Marchevsky wrote:
> >>> This is the final fix for the use-after-free scenario described in
> >>> commit 7793fc3babe9 ("bpf: Make bpf_refcount_acquire fallible for
> >>> non-owning refs"). That commit, by virtue of changing
> >>> bpf_refcount_acquire's refcount_inc to a refcount_inc_not_zero, fixed
> >>> the "refcount incr on 0" splat. The not_zero check in
> >>> refcount_inc_not_zero, though, still occurs on memory that could have
> >>> been free'd and reused, so the commit didn't properly fix the root
> >>> cause.
> >>>
> >>> This patch actually fixes the issue by free'ing using the recently-added
> >>> bpf_mem_free_rcu, which ensures that the memory is not reused until
> >>> an RCU grace period has elapsed. If that has happened then
> >>> there are no non-owning references alive that point to the
> >>> recently-free'd memory, so it can be safely reused.
> >>>
> >>> Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
> >>> Acked-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >>> ---
> >>>  kernel/bpf/helpers.c | 6 +++++-
> >>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index eb91cae0612a..945a85e25ac5 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -1913,7 +1913,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
> >>>
> >>>          if (rec)
> >>>                  bpf_obj_free_fields(rec, p);
> >>
> >> While reviewing my percpu kptr patch at
> >>
> >> https://lore.kernel.org/bpf/20230814172809.1361446-1-yonghong.song@xxxxxxxxx/T/#m2f7631b8047e9f5da60a0a9cd8717fceaf1adbb7
> >>
> >> Kumar mentioned that although the percpu memory itself is freed under
> >> rcu, its record fields are freed immediately. This causes a problem:
> >> there may be active uses of those fields within an rcu cs, and after
> >> bpf_obj_free_fields() some fields may be re-initialized with new
> >> memory, but that memory no longer has a chance to be freed.
> >>
> >> Do we have the same problem here as well?
> >
> > I think it's not an issue here or in your percpu patch,
> > since bpf_obj_free_fields() calls __bpf_obj_drop_impl() which will
> > call bpf_mem_free_rcu() (after this patch set lands).

The following is my understanding.

> void bpf_obj_free_fields(const struct btf_record *rec, void *obj)
> {
>         const struct btf_field *fields;
>         int i;
>
>         if (IS_ERR_OR_NULL(rec))
>                 return;
>         fields = rec->fields;
>         for (i = 0; i < rec->cnt; i++) {
>                 struct btf_struct_meta *pointee_struct_meta;
>                 const struct btf_field *field = &fields[i];
>                 void *field_ptr = obj + field->offset;
>                 void *xchgd_field;
>
>                 switch (fields[i].type) {
>                 case BPF_SPIN_LOCK:
>                         break;
>                 case BPF_TIMER:
>                         bpf_timer_cancel_and_free(field_ptr);
>                         break;
>                 case BPF_KPTR_UNREF:
>                         WRITE_ONCE(*(u64 *)field_ptr, 0);
>                         break;
>                 case BPF_KPTR_REF:
>                         ......
>                         break;
>                 case BPF_LIST_HEAD:
>                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>                                 continue;
>                         bpf_list_head_free(field, field_ptr, obj + rec->spin_lock_off);
>                         break;
>                 case BPF_RB_ROOT:
>                         if (WARN_ON_ONCE(rec->spin_lock_off < 0))
>                                 continue;
>                         bpf_rb_root_free(field, field_ptr, obj + rec->spin_lock_off);
>                         break;
>                 case BPF_LIST_NODE:
>                 case BPF_RB_NODE:
>                 case BPF_REFCOUNT:
>                         break;
>                 default:
>                         WARN_ON_ONCE(1);
>                         continue;
>                 }
>         }
> }
>
> For percpu kptr, the remaining possible actionable fields are
> BPF_LIST_HEAD and BPF_RB_ROOT.
>
> So BPF_LIST_HEAD and BPF_RB_ROOT will go through all list/rb nodes
> to unlink them from the list_head/rb_root.
>
> So yes, rb_nodes and list_nodes will call __bpf_obj_drop_impl().
> Depending on whether the corresponding rec for the rb_node/list_node
> has a refcount or not, it may call bpf_mem_free() or bpf_mem_free_rcu().
> If bpf_mem_free() is called, then the field is immediately freed,
> but it may be in use by some bpf prog (under rcu) concurrently;
> could this be an issue?

I see. Yeah. Looks like percpu makes such fields refcount-like.
For non-percpu, non-refcount objects only one bpf prog on one cpu can
observe the object. That's why we're doing plain bpf_mem_free() for them.
So this patch is a good fix for refcounted, but you and Kumar are correct
that it's not sufficient for the case when a percpu struct includes
multiple rb_roots, one for each cpu.

> Changing bpf_mem_free() in
> __bpf_obj_drop_impl() to bpf_mem_free_rcu() should fix this problem.

I guess we can do that when the obj is either refcounted or can be inside
a percpu struct, but it might not be enough. More below.

> Another thing is related to list_head/rb_root.
> During bpf_obj_free_fields(), is it possible that another cpu
> may allocate a list_node/rb_node and add it to the list_head/rb_root?

It's not an issue for the single-owner case or for refcounted objects.
Access to rb_root/list_head is always lock protected.
For refcounted, the obj needs to be acquired (from the verifier's pov),
meaning it has refcount = 1, to be able to do spin_lock and operate on
the list_head.
But bpf_rb_root_free is indeed an issue for percpu, since each per-cpu
element has its own rb_root field with its own bpf_spin_lock, yet
for_each_cpu() { bpf_obj_free_fields(); } violates that access contract.

percpu and rb_root create such a maze of dependencies that I think
it's better to disallow rb_root-s and kptr-s inside percpu for now.

> If this is true, then we might have a memory leak.
> But I don't know whether this is possible or not.
>
> I think local kptr has the same issue as percpu kptr.

Let's tackle one at a time.
I still think Dave's patch set is a good fix for refcounted,
while we need to think more about the percpu case.
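
To make the refcounted half concrete: the hunk quoted at the top is
trimmed, but the shape of the fix is roughly the following sketch (my
approximation of the patch, not its exact text; it assumes
rec->refcount_off is negative for non-refcounted types, as in btf_record
today):

void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
{
        if (rec)
                bpf_obj_free_fields(rec, p);

        if (rec && rec->refcount_off >= 0)
                /* Refcounted object: non-owning refs created before the
                 * refcount hit zero may still be dereferenced inside an
                 * rcu cs, so don't let the memory be reused until a
                 * grace period has elapsed.
                 */
                bpf_mem_free_rcu(&bpf_global_ma, p);
        else
                /* Single-owner object: only one prog on one cpu can
                 * observe it, so an immediate free is safe.
                 */
                bpf_mem_free(&bpf_global_ma, p);
}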
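
The percpu half is harder to sketch without inventing the series' API,
but the access-contract violation can be shown in outline (hypothetical
helper name; the real free path lives in the series linked above):

static void percpu_obj_free_fields(void __percpu *pptr,
                                   const struct btf_record *rec)
{
        int cpu;

        /* Each cpu's copy carries its own rb_root/list_head guarded by
         * its own bpf_spin_lock. This loop walks and frees them without
         * taking those locks, while a prog on another cpu may, inside an
         * rcu cs, still lock its copy and insert nodes -- the
         * lock-protected access contract described above is violated.
         */
        for_each_possible_cpu(cpu)
                bpf_obj_free_fields(rec, per_cpu_ptr(pptr, cpu));
}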