On Thu, Dec 19, 2024 at 10:39 AM Yonghong Song <yonghong.song@xxxxxxxxx> wrote: > > > > > On 12/19/24 4:38 AM, Abel Wu wrote: > > Hi Yonghong, > > > > On 12/19/24 10:45 AM, Yonghong Song Wrote: > >> > >> > >> > >> On 12/18/24 1:21 AM, Abel Wu wrote: > >>> The following commit > >>> bc235cdb423a ("bpf: Prevent deadlock from recursive > >>> bpf_task_storage_[get|delete]") > >>> first introduced deadlock prevention for fentry/fexit programs > >>> attaching > >>> on bpf_task_storage helpers. That commit also employed the logic in map > >>> free path in its v6 version. > >>> > >>> Later bpf_cgrp_storage was first introduced in > >>> c4bcfb38a95e ("bpf: Implement cgroup storage available to > >>> non-cgroup-attached bpf progs") > >>> which faces the same issue as bpf_task_storage, instead of its busy > >>> counter, NULL was passed to bpf_local_storage_map_free() which opened > >>> a window to cause deadlock: > >>> > >>> <TASK> > >>> _raw_spin_lock_irqsave+0x3d/0x50 > >>> bpf_local_storage_update+0xd1/0x460 > >>> bpf_cgrp_storage_get+0x109/0x130 > >>> bpf_prog_72026450ec387477_cgrp_ptr+0x38/0x5e > >>> bpf_trace_run1+0x84/0x100 > >>> cgroup_storage_ptr+0x4c/0x60 > >>> bpf_selem_unlink_storage_nolock.constprop.0+0x135/0x160 > >>> bpf_selem_unlink_storage+0x6f/0x110 > >>> bpf_local_storage_map_free+0xa2/0x110 > >>> bpf_map_free_deferred+0x5b/0x90 > >>> process_one_work+0x17c/0x390 > >>> worker_thread+0x251/0x360 > >>> kthread+0xd2/0x100 > >>> ret_from_fork+0x34/0x50 > >>> ret_from_fork_asm+0x1a/0x30 > >>> </TASK> > >>> > >>> [ Since the verifier treats 'void *' as scalar which > >>> prevents me from getting a pointer to 'struct cgroup *', > >>> I added a raw tracepoint in cgroup_storage_ptr() to > >>> help reproducing this issue. ] > >>> > >>> Although it is tricky to reproduce, the risk of deadlock exists and > >>> worthy of a fix, by passing its busy counter to the free procedure so > >>> it can be properly incremented before storage/smap locking. > >> > >> The above stack trace and explanation does not show that we will have > >> a potential dead lock here. You mentioned that it is tricky to > >> reproduce, > >> does it mean that you have done some analysis or coding to reproduce it? > >> Could you share the details on why you think we may have deadlock here? > > > > The stack is A-A deadlocked: cgroup_storage_ptr() is called with > > storage->lock held, while the bpf_prog attaching on this function > > also tries to acquire the same lock by calling bpf_cgrp_storage_get() > > thus leading to a AA deadlock. > > > > The tricky part is, instead of attaching on cgroup_storage_ptr() > > directly, I added a tracepoint inside it to hook: > > > > ------ > > diff --git a/kernel/bpf/bpf_cgrp_storage.c > > b/kernel/bpf/bpf_cgrp_storage.c > > index 20f05de92e9c..679209d4f88f 100644 > > --- a/kernel/bpf/bpf_cgrp_storage.c > > +++ b/kernel/bpf/bpf_cgrp_storage.c > > @@ -40,6 +40,8 @@ static struct bpf_local_storage __rcu > > **cgroup_storage_ptr(void *owner) > > { > > struct cgroup *cg = owner; > > > > + trace_cgroup_ptr(cg); > > + > > return &cg->bpf_cgrp_storage; > > } > > > > ------ > > > > The reason doing so is that typecasting from 'void *owner' to > > 'struct cgroup *' will be rejected by the verifier. But there > > could be other ways to obtain a pointer to the @owner cgroup > > too, making the deadlock possible. > > I checked the callstack and what you described indeed the case. > In function bpf_selem_unlink_storage(), local_storage->lock is > held before calling bpf_selem_unlink_storage_nolock/cgroup_storage_ptr. > If there is a fentry/tracepoint on the cgroup_storage_ptr and then we could > have a deadlock as you described in the above. > > As you mentioned, it is tricky to reproduce. fentry on cgroup_storage_ptr > does not work due to func signature: > struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner) > Even say we support 'void *' for fentry and we do bpf_rdonly_cast() > to cast 'void *owner' to 'struct cgroup *owner', and owner cannot be > passed to helper/kfunc. > > Your fix looks good but it would be great to have a reproducer. > One possibility is to find a function which can be fentried within > local_storage->lock. If you know the cgroup id, in bpf program you > can use bpf_cgroup_from_id() to get a trusted cgroup ptr from the id. > and then you can use that cgroup ptr to do bpf_cgrp_storage_get() etc. > which should be able to triger deadlock. Could you give a try? I'd rather mark a set of functions as notrace to avoid this situation or add: CFLAGS_REMOVE_bpf_cgrp_storage.o = $(CC_FLAGS_FTRACE)