On Mon, Jun 13, 2022 at 10:47 AM Satya Durga Srinivasu Prabhala
<quic_satyap@xxxxxxxxxxx> wrote:
>
> On 6/13/22 9:35 AM, Yonghong Song wrote:
> >
> > On 6/13/22 2:22 AM, Toke Høiland-Jørgensen wrote:
> >>
> >> Satya Durga Srinivasu Prabhala <quic_satyap@xxxxxxxxxxx> writes:
> >>
> >>> The recursion below is observed in a rare scenario where __schedule()
> >>> takes the rq lock while, at around the same time, the task's affinity
> >>> is being changed. The BPF program tracing sched_switch calls
> >>> migrate_enable(), which detects the affinity change (cpus_ptr !=
> >>> cpus_mask) and lands in __set_cpus_allowed_ptr(), which tries to
> >>> acquire the rq lock, causing the recursion bug.
> >>
> >> So this only affects tracing programs that attach to tasks that can have
> >> their affinity changed? Or do we need to review migrate_enable() vs
> >> preempt_enable() for networking hooks as well?
> >
> > I think that changing from migrate_disable() to preempt_disable()
> > won't work for RT kernels. In fact, the original switch from
> > preempt_disable() to migrate_disable() was triggered by the RT kernel
> > discussion.
> >
> > As you mentioned, this is a very special case. I am not sure whether we
> > have a good way to fix it. Is it possible to check whether the rq lock
> > is held under the condition cpus_ptr != cpus_mask? If it is,
> > migrate_disable() (or a variant of it) should return an error code
> > to indicate it won't work.
>
> That essentially becomes using preempt_enable/disable().
> If we need migrate_enable/disable() for RT kernels, we can
> add a specific check for RT kernels like the one below, which should
> fix the issue for non-RT kernels?
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 2b914a56a2c5..ec1a287dbf5e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1414,7 +1414,11 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
>          if (unlikely(!array))
>                  return ret;
>
> +#ifdef CONFIG_PREEMPT_RT
>          migrate_disable();
> +#else
> +        preempt_disable();
> +#endif
>          old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
>          item = &array->items[0];
>          while ((prog = READ_ONCE(item->prog))) {
> @@ -1423,7 +1427,11 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
>                  item++;
>          }
>          bpf_reset_run_ctx(old_run_ctx);
> +#ifdef CONFIG_PREEMPT_RT
>          migrate_enable();
> +#else
> +        preempt_enable();
> +#endif

This doesn't solve anything.
Please provide a reproducer.
IIRC the task's affinity change can race even with preemption disabled
on this CPU.
Why would s/migrate/preemption/ address the deadlock?

Once there is a reproducer, we need to figure out a different way
of addressing it. Maybe we will special-case trace_sched_switch.
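
For reference, the recursion described at the top of the thread corresponds
to a call chain roughly like the sketch below. It is reconstructed from the
bug description, not a verbatim stack trace, and the two helpers underneath
are a hypothetical refactoring of the #ifdef logic in the proposed patch
(the names bpf_run_array_protect/unprotect are invented here for clarity);
they only restate the patch's intent of keeping migrate_disable() on
PREEMPT_RT and falling back to preempt_disable() elsewhere, which Alexei
questions above.

/*
 * Sketch of the reported recursion (reconstructed from the description,
 * not a verbatim trace):
 *
 *   __schedule()
 *     rq lock acquired
 *     trace_sched_switch()
 *       BPF tracing program -> bpf_prog_run_array() (the function the
 *                              proposed patch modifies)
 *         migrate_enable()            after the programs have run
 *           cpus_ptr != cpus_mask     affinity changed in the meantime
 *           __set_cpus_allowed_ptr()
 *             tries to acquire the rq lock already held by __schedule()
 *               -> recursion
 */

/* Hypothetical helpers restating the proposed patch's #ifdef logic. */
static inline void bpf_run_array_protect(void)
{
#ifdef CONFIG_PREEMPT_RT
        migrate_disable();      /* RT: preemption must stay enabled */
#else
        preempt_disable();      /* non-RT: avoid migrate_enable()'s affinity path */
#endif
}

static inline void bpf_run_array_unprotect(void)
{
#ifdef CONFIG_PREEMPT_RT
        migrate_enable();
#else
        preempt_enable();
#endif
}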