Re: PROBLEM: Only one CPU active on Ultra 60 since ~4.8 (regression)

Hi Nick,

On Thu, Mar 28, 2024 at 05:08:50PM -0400, Nick Bowler wrote:
> On 2024-03-28 16:09, Linus Torvalds wrote:
> > On Thu, 28 Mar 2024 at 12:36, Linux regression tracking (Thorsten
> > Leemhuis) <regressions@xxxxxxxxxxxxx> wrote:
> >>
> >> [CCing Linus, in case I say something to his disliking]
> >>
> >> On 22.03.24 05:57, Nick Bowler wrote:
> >>>
> >>> Just a friendly reminder that this issue still happens on Linux 6.8 and
> >>> reverting commit 9b2f753ec237 as indicated below is still sufficient to
> >>> resolve the problem.
> >>
> >> FWIW, that commit 9b2f753ec23710 ("sparc64: Fix cpu_possible_mask if
> >> nr_cpus is set") is from v4.8. Reverting it after all that time might
> >> easily lead to even bigger trouble.
> > 
> > I'm definitely not reverting a patch from almost a decade ago as a regression.
> > 
> > If it took that long to find, it can't be that critical of a regression.
> 
> FWIW I'm not the first person to notice this problem.  Searching the sparclinux
> archive for "ultra 60" turns up this very similar report[1] from two years
> prior to mine, which also went nowhere (sadly, this reporter did not perform a
> bisection to find the problematic commit, perhaps because nobody asked).
> 
> [1] https://lore.kernel.org/sparclinux/20201009161924.c8f031c079dd852941307870@xxxxxx/

I took a look at this and may have a fix. Could you try the following
patch? It builds, but I have not tested it.

	Sam


From a0fb7c6e6817849550d07b4c5a354ccc58382bc1 Mon Sep 17 00:00:00 2001
From: Sam Ravnborg <sam@xxxxxxxxxxxx>
Date: Fri, 29 Mar 2024 10:34:07 +0100
Subject: [PATCH] sparc64: Fix number of online CPUs

Nick Bowler reported:
    When using newer kernels on my Ultra 60 with dual 450MHz UltraSPARC-II
    CPUs, I noticed that only CPU 0 comes up, while older kernels (including
    4.7) are working fine with both CPUs.

      I bisected the failure to this commit:

      9b2f753ec23710aa32c0d837d2499db92fe9115b is the first bad commit
      commit 9b2f753ec23710aa32c0d837d2499db92fe9115b
      Author: Atish Patra <atish.patra@xxxxxxxxxx>
      Date:   Thu Sep 15 14:54:40 2016 -0600

      sparc64: Fix cpu_possible_mask if nr_cpus is set

    This is a small change that reverts very easily on top of 5.18: there is
    just one trivial conflict.  Once reverted, both CPUs work again.

    Maybe this is related to the fact that the CPUs on this system are
    numbered CPU0 and CPU2 (there is no CPU1)?

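The current code that adjusts cpu_possible based on nr_cpu_ids does not
take into account that CPU IDs may not be numbered consecutively.
smp_fill_in_cpu_possible_map() rebuilds the mask as CPUs 0 to n-1, where
n is the number of possible CPUs capped at nr_cpu_ids, so on a machine
with CPU0 and CPU2 the mask becomes CPUs 0 and 1 and CPU 2 is dropped.
Move the check to the function that sets up the cpu_possible mask
so there is no need to adjust it later.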

Signed-off-by: Sam Ravnborg <sam@xxxxxxxxxxxx>
Reported-by: Nick Bowler <nbowler@xxxxxxxxxx>
Cc: Andreas Larsson <andreas@xxxxxxxxxxx>
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
---
 arch/sparc/include/asm/smp_64.h |  2 --
 arch/sparc/kernel/prom_64.c     |  4 +++-
 arch/sparc/kernel/setup_64.c    |  1 -
 arch/sparc/kernel/smp_64.c      | 14 --------------
 4 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/arch/sparc/include/asm/smp_64.h b/arch/sparc/include/asm/smp_64.h
index 505b6700805d..0964fede0b2c 100644
--- a/arch/sparc/include/asm/smp_64.h
+++ b/arch/sparc/include/asm/smp_64.h
@@ -47,7 +47,6 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask);
 int hard_smp_processor_id(void);
 #define raw_smp_processor_id() (current_thread_info()->cpu)
 
-void smp_fill_in_cpu_possible_map(void);
 void smp_fill_in_sib_core_maps(void);
 void __noreturn cpu_play_dead(void);
 
@@ -77,7 +76,6 @@ void __cpu_die(unsigned int cpu);
 #define smp_fill_in_sib_core_maps() do { } while (0)
 #define smp_fetch_global_regs() do { } while (0)
 #define smp_fetch_global_pmu() do { } while (0)
-#define smp_fill_in_cpu_possible_map() do { } while (0)
 #define smp_init_cpu_poke() do { } while (0)
 #define scheduler_poke() do { } while (0)
 
diff --git a/arch/sparc/kernel/prom_64.c b/arch/sparc/kernel/prom_64.c
index 998aa693d491..ba82884cb92a 100644
--- a/arch/sparc/kernel/prom_64.c
+++ b/arch/sparc/kernel/prom_64.c
@@ -483,7 +483,9 @@ static void *record_one_cpu(struct device_node *dp, int cpuid, int arg)
 	ncpus_probed++;
 #ifdef CONFIG_SMP
 	set_cpu_present(cpuid, true);
-	set_cpu_possible(cpuid, true);
+
+	if (num_possible_cpus() < nr_cpu_ids)
+		set_cpu_possible(cpuid, true);
 #endif
 	return NULL;
 }
diff --git a/arch/sparc/kernel/setup_64.c b/arch/sparc/kernel/setup_64.c
index 6a4797dec34b..6bbe8e394ad3 100644
--- a/arch/sparc/kernel/setup_64.c
+++ b/arch/sparc/kernel/setup_64.c
@@ -671,7 +671,6 @@ void __init setup_arch(char **cmdline_p)
 
 	paging_init();
 	init_sparc64_elf_hwcap();
-	smp_fill_in_cpu_possible_map();
 	/*
 	 * Once the OF device tree and MDESC have been setup and nr_cpus has
 	 * been parsed, we know the list of possible cpus.  Therefore we can
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index f3969a3600db..e50c38eba2b8 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1220,20 +1220,6 @@ void __init smp_setup_processor_id(void)
 		xcall_deliver_impl = hypervisor_xcall_deliver;
 }
 
-void __init smp_fill_in_cpu_possible_map(void)
-{
-	int possible_cpus = num_possible_cpus();
-	int i;
-
-	if (possible_cpus > nr_cpu_ids)
-		possible_cpus = nr_cpu_ids;
-
-	for (i = 0; i < possible_cpus; i++)
-		set_cpu_possible(i, true);
-	for (; i < NR_CPUS; i++)
-		set_cpu_possible(i, false);
-}
-
 void smp_fill_in_sib_core_maps(void)
 {
 	unsigned int i;
-- 
2.34.1
