Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete

On 9/6/22 16:01, Waiman Long wrote:
> On 9/6/22 14:30, Tejun Heo wrote:
>> Hello,
>>
>> (cc'ing Waiman in case he has a better idea)
>>
>> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>>> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@xxxxxxxxxxxxxxx/ is a
>>> fix for the cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>>> But this issue is a cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
>> If I'm understanding what you're writing correctly, this isn't a deadlock.
>> The cpuset_hotplug_workfn simply isn't being woken up while holding
>> cpuset_rwsem and others are just waiting for that lock to be released.
>
> I believe it is probably a bug in the scheduler core code.
> __set_cpus_allowed_ptr_locked() calls affine_move_task() to move the
> task to a random CPU within the new set of allowable CPUs. However, if
> migration is disabled, it shouldn't call affine_move_task() at all.
> Instead, I would suggest that if the current CPU is within the new set
> of allowable CPUs, it should just skip affine_move_task(). Otherwise,
> __set_cpus_allowed_ptr_locked() should fail.

Maybe like

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 838623b68031..5d9ea1553ec0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
                if (cpumask_equal(&p->cpus_mask, new_mask))
                        goto out;

-               if (WARN_ON_ONCE(p == current &&
-                                is_migration_disabled(p) &&
-                                !cpumask_test_cpu(task_cpu(p), new_mask))) {
+               if (is_migration_disabled(p) &&
+                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
+                       WARN_ON_ONCE(p == current);
                        ret = -EBUSY;
                        goto out;
                }
@@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
        if (flags & SCA_USER)
                user_mask = clear_user_cpus_ptr(p);

-       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
+               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       } else {
+               task_rq_unlock(rq, p, rf);
+       }

        kfree(user_mask);

I haven't tested it myself, though.
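
To make the intended behaviour concrete, here is a rough userspace sketch
of the decision logic only. It is not kernel code: struct sketch_task, the
sketch_* helpers and SKETCH_MIGRATE_ENABLE are made-up stand-ins, a plain
bitmask replaces struct cpumask, and there is no rq locking or stopper
thread. It also assumes the equal-mask and -EBUSY checks are skipped when
the migrate-enable flag is set, which is my reading of the surrounding code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCA_MIGRATE_ENABLE; not the kernel's value. */
#define SKETCH_MIGRATE_ENABLE 0x1u

/* Toy model of a task: one bit per CPU in allowed_mask. */
struct sketch_task {
	unsigned long allowed_mask;
	int cpu;			/* CPU the task is currently on */
	bool migration_disabled;
};

static bool sketch_cpu_allowed(unsigned long mask, int cpu)
{
	return (mask >> cpu) & 1UL;
}

/* Lowest-numbered CPU in a non-empty mask. */
static int sketch_first_cpu(unsigned long mask)
{
	int cpu = 0;

	assert(mask != 0);
	while (!((mask >> cpu) & 1UL))
		cpu++;
	return cpu;
}

/*
 * Returns 0 on success or -EBUSY when a migration-disabled task sits on
 * a CPU outside new_mask. On failure the task is left untouched.
 */
int sketch_set_cpus_allowed(struct sketch_task *p, unsigned long new_mask,
			    unsigned int flags)
{
	assert(new_mask != 0);

	if (!(flags & SKETCH_MIGRATE_ENABLE)) {
		if (p->allowed_mask == new_mask)
			return 0;	/* nothing to change */

		/* Cannot move a migration-disabled task: fail early. */
		if (p->migration_disabled &&
		    !sketch_cpu_allowed(new_mask, p->cpu))
			return -EBUSY;
	}

	p->allowed_mask = new_mask;

	/* Only move the task when migration is (being) enabled. */
	if (!p->migration_disabled || (flags & SKETCH_MIGRATE_ENABLE)) {
		if (!sketch_cpu_allowed(new_mask, p->cpu))
			p->cpu = sketch_first_cpu(new_mask);
	}

	return 0;
}
```

With this model, a migration-disabled task on CPU 2 gets -EBUSY for a new
mask of CPUs 0-1, and simply keeps running on CPU 2 (with the mask updated)
for a new mask of CPUs 1-2. Nothing ever waits for a stopper to finish.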

Cheers,
Longman



