On Thu, Jan 20, 2022 at 11:21:36PM +0100, Guillaume Morin wrote:
> On 20 Jan 12:57, Paul E. McKenney wrote:
> >
> > On Thu, Jan 20, 2022 at 08:26:54PM +0100, Guillaume Morin wrote:
> > > On 20 Jan 11:16, Paul E. McKenney wrote:
> > > > On Thu, Jan 20, 2022 at 07:55:01PM +0100, Guillaume Morin wrote:
> > > > > I believe commit 614ddad17f22a22e035e2ea37a04815f50362017 (slated for
> > > > > 5.17) should be queued for all 5.4+ stable branches as it fixes a
> > > > > serious lockup bug. FWIW I have verified it applies cleanly on all 4
> > > > > branches.
> > > > >
> > > > > Does that make sense to you?
> > > >
> > > > From a quick glance at v5.4, it looks quite plausible to me.
> > > >
> > > > I do suggest that you try building and testing, given that the hardware's
> > > > idea of what is plausible overrides that of either of us. ;-)
> > >
> > > We've had a few dozen lockups on 5.4 and 5.10 due to this bug (which
> > > led me to write to you back in Sep). The original bugzilla report is on
> > > 5.4 as well, see https://bugzilla.kernel.org/show_bug.cgi?id=208685. So
> > > I am positive that the issue is reachable in both kernels.
> > >
> > > Also I do know for sure it fixes the problem for 5.10. I don't have a
> > > test rig anymore for 5.4. But considering we know it's reachable with
> > > 5.4, I think the patch should be applied for 5.4+. Obviously, you're the
> > > expert here though.
> >
> > Au contraire! I do not claim much expertise on -stable validation.
> >
> > If it was me, I would run a quick touch-test like this from the top-level
> > directory of the Linux-kernel source tree on a qemu/KVM-capable system:
> >
> > 	tools/testing/selftests/rcutorture/bin/kvm.sh --cpus N --duration 10 --configs "TREE01 TREE04"
> >
> > Where "N" is replaced by the number of CPUs on your system, which should
> > preferably be at least eight.
> >
> > This will take somewhere between 15 minutes and an hour to run, depending
> > on your system.
> >
> > Sadly, v5.4 isn't quite as good at analyzing results as are current
> > versions, but please feel free to send me the output.
> >
> > Does that help?
>
> Ok I did a quick run with 614ddad17f22a22e035e2ea37a04815f50362017
> applied on top of the 5.4 stable branch. Not quite sure how I got
> suckered into running a test on a kernel I don't even run, but hey I
> guess everybody must do their part :-)

That is indeed what I keep telling myself.  ;-)

> Not sure about CONFIG_HOTPLUG_CPU thing at the end.
>
> tools/testing/selftests/rcutorture/initrd/init already exists, no need to create it
> Results directory: /usr/scratch/kernel/tools/testing/selftests/rcutorture/res/2022.01.20-17:02:37
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 60 --duration 10 --configs TREE01 TREE04
> ----Start batch 1: Thu 20 Jan 2022 05:02:37 PM EST
> TREE01 8: Starting build. Thu 20 Jan 2022 05:02:37 PM EST
> TREE01 8: Waiting for build to complete. Thu 20 Jan 2022 05:02:37 PM EST
> TREE01 8: Build complete. Thu 20 Jan 2022 05:03:16 PM EST
> TREE04 8: Starting build. Thu 20 Jan 2022 05:03:16 PM EST
> TREE04 8: Waiting for build to complete. Thu 20 Jan 2022 05:03:16 PM EST
> TREE04 8: Build complete. Thu 20 Jan 2022 05:03:55 PM EST

39 seconds to build each kernel.  Not bad!  ;-)

> ---- TREE01 8: Kernel present. Thu 20 Jan 2022 05:03:55 PM EST
> ---- TREE04 8: Kernel present. Thu 20 Jan 2022 05:03:55 PM EST
> ---- Starting kernels. Thu 20 Jan 2022 05:03:55 PM EST
> ---- All kernel runs complete. Thu 20 Jan 2022 05:14:05 PM EST
> ---- TREE01 8: Build/run results:
> --- Thu 20 Jan 2022 05:02:37 PM EST: Starting build
> --- Thu 20 Jan 2022 05:03:55 PM EST: Starting kernel
> CPU-hotplug kernel, adding rcutorture onoff.
> Monitoring qemu job at pid 46081
> Grace period for qemu job at pid 46081
> ---- TREE04 8: Build/run results:
> --- Thu 20 Jan 2022 05:03:16 PM EST: Starting build
> :CONFIG_HOTPLUG_CPU: improperly set
> --- Thu 20 Jan 2022 05:03:55 PM EST: Starting kernel
> CPU-hotplug kernel, adding rcutorture onoff.
> Monitoring qemu job at pid 45847
> Grace period for qemu job at pid 45847
>
>
> --- Thu 20 Jan 2022 05:02:37 PM EST Test summary:
> Results directory: /usr/scratch/kernel/tools/testing/selftests/rcutorture/res/2022.01.20-17:02:37
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 60 --duration 10 --configs TREE01 TREE04
> TREE01 ------- 12719 GPs (21.1983/s) [rcu: g94609 f0x0 ]
> TREE04 ------- 3128 GPs (5.21333/s) [rcu: g23621 f0x0 ]
> :CONFIG_HOTPLUG_CPU: improperly set

This run was successful, so good!

But you are quite correct to be suspicious of the "improperly set"
message.  But it is OK in this particular case.

This message appears because security-related changes made it quite
difficult to disable CPU hotplug on x86.  The rcutorture test suite is
therefore complaining that even though it tried disabling CPU hotplug
for the TREE04 test scenario, it found that the kernel was nevertheless
built with CONFIG_HOTPLUG_CPU=y.  And later versions of rcutorture
resigned themselves to always testing with CONFIG_HOTPLUG_CPU=y.

So again, this run was successful.  And thank you for checking it!

							Thanx, Paul