Patch "cpu/hotplug: Don't offline the last non-isolated CPU" has been added to the 6.6-stable tree

This is a note to let you know that I've just added the patch titled

    cpu/hotplug: Don't offline the last non-isolated CPU

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     cpu-hotplug-don-t-offline-the-last-non-isolated-cpu.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit a8be1012743b6db4aad1518222a2c92b3f5826f5
Author: Ran Xiaokai <ran.xiaokai@xxxxxxxxxx>
Date:   Tue Oct 17 17:09:53 2023 +0800

    cpu/hotplug: Don't offline the last non-isolated CPU
    
    [ Upstream commit 38685e2a0476127db766f81b1c06019ddc4c9ffa ]
    
    If a system has isolated CPUs via the "isolcpus=" command line parameter,
    then an attempt to offline the last housekeeping CPU will result in a
    WARN_ON() when rebuilding the scheduler domains and a subsequent panic due
    to an unhandled empty CPU mask in partition_sched_domains_locked().
    
    cpuset_hotplug_workfn()
      rebuild_sched_domains_locked()
        ndoms = generate_sched_domains(&doms, &attr);
          cpumask_and(doms[0], top_cpuset.effective_cpus, housekeeping_cpumask(HK_FLAG_DOMAIN));
    
    This results in an empty CPU mask which triggers the warning and then the
    subsequent crash:
    
    WARNING: CPU: 4 PID: 80 at kernel/sched/topology.c:2366 build_sched_domains+0x120c/0x1408
    Call trace:
     build_sched_domains+0x120c/0x1408
     partition_sched_domains_locked+0x234/0x880
     rebuild_sched_domains_locked+0x37c/0x798
     rebuild_sched_domains+0x30/0x58
     cpuset_hotplug_workfn+0x2a8/0x930
    
    Unable to handle kernel paging request at virtual address fffe80027ab37080
     partition_sched_domains_locked+0x318/0x880
     rebuild_sched_domains_locked+0x37c/0x798
    
    Aside from the resulting crash, it does not make any sense to offline the
    last housekeeping CPU.
    
    Prevent this by masking out the non-housekeeping CPUs when selecting a
    target CPU for initiating the CPU unplug operation via the work queue.
    
    Suggested-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Signed-off-by: Ran Xiaokai <ran.xiaokai@xxxxxxxxxx>
    Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
    Link: https://lore.kernel.org/r/202310171709530660462@xxxxxxxxxx
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 1a189da3bdac5..303cb0591b4b1 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1523,11 +1523,14 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
 	/*
 	 * Ensure that the control task does not run on the to be offlined
 	 * CPU to prevent a deadlock against cfs_b->period_timer.
+	 * Also keep at least one housekeeping cpu onlined to avoid generating
+	 * an empty sched_domain span.
 	 */
-	cpu = cpumask_any_but(cpu_online_mask, cpu);
-	if (cpu >= nr_cpu_ids)
-		return -EBUSY;
-	return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
+	for_each_cpu_and(cpu, cpu_online_mask, housekeeping_cpumask(HK_TYPE_DOMAIN)) {
+		if (cpu != work.cpu)
+			return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
+	}
+	return -EBUSY;
 }
 
 static int cpu_down(unsigned int cpu, enum cpuhp_state target)
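
For readers who want to see the effect of the change outside the kernel, the
following is a minimal userspace sketch of the two selection strategies. It is
not kernel code: plain 32-bit bitmasks stand in for cpu_online_mask and
housekeeping_cpumask(HK_TYPE_DOMAIN), and NR_CPUS, NO_CPU, pick_old() and
pick_new() are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

#define NR_CPUS 8u
#define NO_CPU  NR_CPUS	/* stands in for the kernel's "cpu >= nr_cpu_ids" */

/*
 * Old selection (sketch): any online CPU other than the one going down,
 * roughly what cpumask_any_but(cpu_online_mask, cpu) provided.
 */
static unsigned int pick_old(uint32_t online, unsigned int victim)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if ((online & (1u << cpu)) && cpu != victim)
			return cpu;
	}
	return NO_CPU;
}

/*
 * New selection (sketch): only CPUs that are both online and housekeeping
 * qualify, and the CPU being offlined is skipped. If nothing is left, the
 * caller would return -EBUSY and the offline request is refused.
 */
static unsigned int pick_new(uint32_t online, uint32_t housekeeping,
			     unsigned int victim)
{
	uint32_t candidates = online & housekeeping;

	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		if ((candidates & (1u << cpu)) && cpu != victim)
			return cpu;
	}
	return NO_CPU;
}
```

Assuming CPUs 0-3 are online (mask 0x0f) and only CPU 0 is housekeeping (mask
0x01, as with isolcpus=1-3), pick_old(0x0f, 0) still finds CPU 1 to run the
unplug work, so CPU 0 would go offline. pick_new(0x0f, 0x01, 0) finds no
candidate and yields NO_CPU, which corresponds to the -EBUSY path in the
patch. Offlining an isolated CPU such as CPU 2 remains possible, with CPU 0
chosen to run the work.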


