On 6/13/22 2:22 AM, Toke Høiland-Jørgensen wrote:
> Satya Durga Srinivasu Prabhala <quic_satyap@xxxxxxxxxxx> writes:
>> The recursion below is observed in a rare scenario where __schedule()
>> takes the rq lock and, at around the same time, the task's affinity is
>> being changed. A BPF program tracing sched_switch calls
>> migrate_enable(), which detects the pending affinity change
>> (cpus_ptr != cpus_mask) and lands in __set_cpus_allowed_ptr(), which
>> tries to acquire the rq lock again, causing the recursion bug.
> So this only affects tracing programs that attach to tasks that can
> have their affinity changed? Or do we need to review migrate_enable()
> vs preempt_enable() for networking hooks as well?
I think changing migrate_disable() back to preempt_disable() won't work
on an RT kernel. In fact, the original switch from preempt_disable() to
migrate_disable() was motivated by the RT kernel discussion.
As you mentioned, this is a very special case. I am not sure whether we
have a good way to fix it. Is it possible to check whether the rq lock
is held when cpus_ptr != cpus_mask? If it is, migrate_disable() (or a
variant of it) could return an error code to indicate it won't work.
> -Toke