On 5 Sep 2022 10:47:36 +0800 Jing-Ting Wu <jing-ting.wu@xxxxxxxxxxxx> wrote:
>
> We meet the HANG_DETECT happened in T SW version with kernel-5.15.
> Many tasks have been blocked for a long time.
>
> Root cause:
> migration_cpu_stop() is not complete due to is_migration_disabled(p) is
> true, complete is false and complete_all() never get executed.
> It let other task wait the rwsem.

See if handing the task over to the stopper again when migration is
disabled could survive your tests.

Hillf

--- linux-5.15/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2322,9 +2322,7 @@ static int migration_cpu_stop(void *data
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
 	 */
-	if (task_rq(p) == rq) {
-		if (is_migration_disabled(p))
-			goto out;
+	if (task_rq(p) == rq && !is_migration_disabled(p)) {
 
 		if (pending) {
 			p->migration_pending = NULL;