On Sat, Feb 12, 2022 at 10:21:11PM +0000, Alexander Lobakin wrote: > After enabling CONFIG_SCHED_CORE (landed during 5.14 cycle), > 2-core 2-thread-per-core interAptiv (CPS-driven) started emitting > the following: > > [ 0.025698] CPU1 revision is: 0001a120 (MIPS interAptiv (multi)) > [ 0.048183] ------------[ cut here ]------------ > [ 0.048187] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:6025 sched_core_cpu_starting+0x198/0x240 > [ 0.048220] Modules linked in: > [ 0.048233] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc3+ #35 b7b319f24073fd9a3c2aa7ad15fb7993eec0b26f > [ 0.048247] Stack : 817f0000 00000004 327804c8 810eb050 00000000 00000004 00000000 c314fdd1 > [ 0.048278] 830cbd64 819c0000 81800000 817f0000 83070bf4 00000001 830cbd08 00000000 > [ 0.048307] 00000000 00000000 815fcbc4 00000000 00000000 00000000 00000000 00000000 > [ 0.048334] 00000000 00000000 00000000 00000000 817f0000 00000000 00000000 817f6f34 > [ 0.048361] 817f0000 818a3c00 817f0000 00000004 00000000 00000000 4dc33260 0018c933 > [ 0.048389] ... > [ 0.048396] Call Trace: > [ 0.048399] [<8105a7bc>] show_stack+0x3c/0x140 > [ 0.048424] [<8131c2a0>] dump_stack_lvl+0x60/0x80 > [ 0.048440] [<8108b5c0>] __warn+0xc0/0xf4 > [ 0.048454] [<8108b658>] warn_slowpath_fmt+0x64/0x10c > [ 0.048467] [<810bd418>] sched_core_cpu_starting+0x198/0x240 > [ 0.048483] [<810c6514>] sched_cpu_starting+0x14/0x80 > [ 0.048497] [<8108c0f8>] cpuhp_invoke_callback_range+0x78/0x140 > [ 0.048510] [<8108d914>] notify_cpu_starting+0x94/0x140 > [ 0.048523] [<8106593c>] start_secondary+0xbc/0x280 > [ 0.048539] > [ 0.048543] ---[ end trace 0000000000000000 ]--- > [ 0.048636] Synchronize counters for CPU 1: done. > > ...for each but CPU 0/boot. > Basic debug printks right before the mentioned line say: > > [ 0.048170] CPU: 1, smt_mask: > > So smt_mask, which is sibling mask obviously, is empty when entering > the function. > This is critical, as sched_core_cpu_starting() calculates > core-scheduling parameters only once per CPU start, and it's crucial > to have all the parameters filled in at that moment (at least it > uses cpu_smt_mask() which in fact is `&cpu_sibling_map[cpu]` on > MIPS). > > A bit of debugging led me to that set_cpu_sibling_map() performing > the actual map calculation, was being invocated after > notify_cpu_start(), and exactly the latter function starts CPU HP > callback round (sched_core_cpu_starting() is basically a CPU HP > callback). > While the flow is same on ARM64 (maps after the notifier, although > before calling set_cpu_online()), x86 started calculating sibling > maps earlier than starting the CPU HP callbacks in Linux 4.14 (see > [0] for the reference). Neither me nor my brief tests couldn't find > any potential caveats in calculating the maps right after performing > delay calibration, but the WARN splat is now gone. > The very same debug prints now yield exactly what I expected from > them: > > [ 0.048433] CPU: 1, smt_mask: 0-1 > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=76ce7cfe35ef > > Signed-off-by: Alexander Lobakin <alobakin@xxxxx> > --- > arch/mips/kernel/smp.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/mips/kernel/smp.c b/arch/mips/kernel/smp.c > index d542fb7af3ba..1986d1309410 100644 > --- a/arch/mips/kernel/smp.c > +++ b/arch/mips/kernel/smp.c > @@ -351,6 +351,9 @@ asmlinkage void start_secondary(void) > cpu = smp_processor_id(); > cpu_data[cpu].udelay_val = loops_per_jiffy; > > + set_cpu_sibling_map(cpu); > + set_cpu_core_map(cpu); > + > cpumask_set_cpu(cpu, &cpu_coherent_mask); > notify_cpu_starting(cpu); > > @@ -362,9 +365,6 @@ asmlinkage void start_secondary(void) > /* The CPU is running and counters synchronised, now mark it online */ > set_cpu_online(cpu, true); > > - set_cpu_sibling_map(cpu); > - set_cpu_core_map(cpu); > - > calculate_cpu_foreign_map(); > > /* > -- > 2.35.1 applied to mips-fixes. Thomas. -- Crap can work. Given enough thrust pigs will fly, but it's not necessarily a good idea. [ RFC1925, 2.3 ]