Re: [PATCH v2] parisc: fix a crash with multicore scheduler

Hi Mikulas,

On 6/1/22 19:18, Mikulas Patocka wrote:
> With kernel 5.18, the system hangs on boot if it is compiled with
> CONFIG_SCHED_MC. The last printed message is "Brought up 1 node, 1 CPU".
>
> The crash happens in sd_init():
> tl->mask (which is cpu_coregroup_mask) returns an empty mask. This happens
> 	because cpu_topology[0].core_sibling is empty.
> Consequently, sd_span is set to an empty mask.
> sd_id = cpumask_first(sd_span) sets sd_id == NR_CPUS (because the mask is
> 	empty)
> sd->shared = *per_cpu_ptr(sdd->sds, sd_id); sets sd->shared to NULL
> 	because sd_id is out of range
> atomic_inc(&sd->shared->ref); crashes without printing anything
>
> We can fix it by calling reset_cpu_topology() from init_cpu_topology() -
> this will initialize the sibling masks on CPUs, so that they're not empty.
>
> This patch also removes the variable "dualcores_found"; it is useless
> because during boot, init_cpu_topology() is called before
> store_cpu_topology(), so dualcores_found is still 0 when it is checked and
> set_sched_topology(parisc_mc_topology) is never called. We don't need to
> call it at all, because default_topology in kernel/sched/topology.c
> contains the same items as parisc_mc_topology.
>
> Note that we should not call store_cpu_topology() from init_per_cpu(),
> because it is called too early in the kernel initialization process and
> that results in the message "Failure to register CPU0 device". Before this
> patch, store_cpu_topology() would exit immediately because
> cpuid_topo->core_id was uninitialized and it was 0.
>
> Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx	# v5.18

Thanks a lot!!!

It took me some time to test it, but it looks good and boots on
all of my machines so far. I was curious whether 32-bit kernels still
work, since that was one of the issues with the older patches...

With your patch we can drop the "config SCHED_MC" entry from
arch/parisc/Kconfig as well.
Will you respin, or should I simply add this to your patch?

Helge


>
> ---
>  arch/parisc/kernel/processor.c |    2 --
>  arch/parisc/kernel/topology.c  |   16 +---------------
>  2 files changed, 1 insertion(+), 17 deletions(-)
>
> Index: linux-2.6/arch/parisc/kernel/topology.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/topology.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/topology.c	2022-06-01 18:37:37.000000000 +0200
> @@ -20,8 +20,6 @@
>
>  static DEFINE_PER_CPU(struct cpu, cpu_devices);
>
> -static int dualcores_found;
> -
>  /*
>   * store_cpu_topology is called at boot when only one cpu is running
>   * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
> @@ -60,7 +58,6 @@ void store_cpu_topology(unsigned int cpu
>  			if (p->cpu_loc) {
>  				cpuid_topo->core_id++;
>  				cpuid_topo->package_id = cpu_topology[cpu].package_id;
> -				dualcores_found = 1;
>  				continue;
>  			}
>  		}
> @@ -80,22 +77,11 @@ void store_cpu_topology(unsigned int cpu
>  		cpu_topology[cpuid].package_id);
>  }
>
> -static struct sched_domain_topology_level parisc_mc_topology[] = {
> -#ifdef CONFIG_SCHED_MC
> -	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> -#endif
> -
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
>   */
>  void __init init_cpu_topology(void)
>  {
> -	/* Set scheduler topology descriptor */
> -	if (dualcores_found)
> -		set_sched_topology(parisc_mc_topology);
> +	reset_cpu_topology();
>  }
> Index: linux-2.6/arch/parisc/kernel/processor.c
> ===================================================================
> --- linux-2.6.orig/arch/parisc/kernel/processor.c	2022-06-01 15:32:59.000000000 +0200
> +++ linux-2.6/arch/parisc/kernel/processor.c	2022-06-01 18:35:12.000000000 +0200
> @@ -327,8 +327,6 @@ int init_per_cpu(int cpunum)
>  	set_firmware_width();
>  	ret = pdc_coproc_cfg(&coproc_cfg);
>
> -	store_cpu_topology(cpunum);
> -
>  	if(ret >= 0 && coproc_cfg.ccr_functional) {
>  		mtctl(coproc_cfg.ccr_functional, 10);  /* 10 == Coprocessor Control Reg */
>
>




