On 20 Jan 12:57, Paul E. McKenney wrote:
>
> On Thu, Jan 20, 2022 at 08:26:54PM +0100, Guillaume Morin wrote:
> > On 20 Jan 11:16, Paul E. McKenney wrote:
> > > On Thu, Jan 20, 2022 at 07:55:01PM +0100, Guillaume Morin wrote:
> > > > I believe commit 614ddad17f22a22e035e2ea37a04815f50362017 (slated for
> > > > 5.17) should be queued for all 5.4+ stable branches as it fixes a
> > > > serious lockup bug. FWIW I have verified it applies cleanly on all 4
> > > > branches.
> > > >
> > > > Does that make sense to you?
> > >
> > > From a quick glance at v5.4, it looks quite plausible to me.
> > >
> > > I do suggest that you try building and testing, given that the hardware's
> > > idea of what is plausible overrides that of either of us. ;-)
> >
> > We've had a few dozens lockups on 5.4 and 5.10 due to this bug (what
> > lead me to write to you back in Sep). The original bugzilla report is on
> > 5.4 as well, see https://bugzilla.kernel.org/show_bug.cgi?id=208685. So
> > I am positive that the issue is reachable in both kernels.
> >
> > Also I do know for sure it fixes the problem for 5.10. I don't have a
> > test rig anymore for 5.4. But considering we know it's reachable with
> > 5.4, I think the patch should be applied for 5.4+. Obviously, you're the
> > expert here though.
>
> Au contraire! I do not claim much expertise on -stable validation.
>
> If it was me, I would run a quick touch-test like this from the top-level
> directory of the Linux-kernel source tree on a qemu/KVM-capable system:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus N --duration 10 --configs "TREE01 TREE04"
>
> Where "N" is replaced by the number of CPUs on your system, which should
> preferably be at least eight.
>
> This will take somewhere between 15 minutes and an hour to run, depending
> on your system.
>
> Sadly, v5.4 isn't quite as good at analyzing results as are current
> versions, but please feel free to send me the output.
>
> Does that help?

Ok I did a quick run with 614ddad17f22a22e035e2ea37a04815f50362017
applied on top of the 5.4 stable branch. Not quite sure how I got
suckered into running a test on a kernel I don't even run, but hey I
guess everybody must do their part :-)

Not sure about the CONFIG_HOTPLUG_CPU thing at the end.

tools/testing/selftests/rcutorture/initrd/init already exists, no need to create it
Results directory: /usr/scratch/kernel/tools/testing/selftests/rcutorture/res/2022.01.20-17:02:37
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 60 --duration 10 --configs TREE01 TREE04
----Start batch 1: Thu 20 Jan 2022 05:02:37 PM EST
TREE01 8: Starting build. Thu 20 Jan 2022 05:02:37 PM EST
TREE01 8: Waiting for build to complete. Thu 20 Jan 2022 05:02:37 PM EST
TREE01 8: Build complete. Thu 20 Jan 2022 05:03:16 PM EST
TREE04 8: Starting build. Thu 20 Jan 2022 05:03:16 PM EST
TREE04 8: Waiting for build to complete. Thu 20 Jan 2022 05:03:16 PM EST
TREE04 8: Build complete. Thu 20 Jan 2022 05:03:55 PM EST
---- TREE01 8: Kernel present. Thu 20 Jan 2022 05:03:55 PM EST
---- TREE04 8: Kernel present. Thu 20 Jan 2022 05:03:55 PM EST
---- Starting kernels. Thu 20 Jan 2022 05:03:55 PM EST
---- All kernel runs complete. Thu 20 Jan 2022 05:14:05 PM EST
---- TREE01 8: Build/run results:
--- Thu 20 Jan 2022 05:02:37 PM EST: Starting build
--- Thu 20 Jan 2022 05:03:55 PM EST: Starting kernel
CPU-hotplug kernel, adding rcutorture onoff.
Monitoring qemu job at pid 46081
Grace period for qemu job at pid 46081
---- TREE04 8: Build/run results:
--- Thu 20 Jan 2022 05:03:16 PM EST: Starting build
:CONFIG_HOTPLUG_CPU: improperly set
--- Thu 20 Jan 2022 05:03:55 PM EST: Starting kernel
CPU-hotplug kernel, adding rcutorture onoff.
Monitoring qemu job at pid 45847
Grace period for qemu job at pid 45847

--- Thu 20 Jan 2022 05:02:37 PM EST Test summary:
Results directory: /usr/scratch/kernel/tools/testing/selftests/rcutorture/res/2022.01.20-17:02:37
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 60 --duration 10 --configs TREE01 TREE04
TREE01 ------- 12719 GPs (21.1983/s) [rcu: g94609 f0x0 ]
TREE04 ------- 3128 GPs (5.21333/s) [rcu: g23621 f0x0 ]
:CONFIG_HOTPLUG_CPU: improperly set

-- 
Guillaume Morin <guillaume@xxxxxxxxxxx>
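
For anyone who wants to sanity-check the two summary lines in the output above, here is a minimal Python sketch. It is not part of the rcutorture scripts; the names parse_summary and rate_matches are made up for illustration. It assumes the reported per-second figure is the grace-period count divided by the --duration value in seconds (10 minutes = 600 s here), which is consistent with the numbers shown but is an inference from this output, not something the log states.

```python
import re

# Matches summary lines of the form seen in the output above, e.g.:
#   TREE01 ------- 12719 GPs (21.1983/s) [rcu: g94609 f0x0 ]
SUMMARY_RE = re.compile(r"^(\S+) -+ (\d+) GPs \(([\d.]+)/s\)")


def parse_summary(text):
    """Return {scenario: (grace_periods, reported_rate)} for each summary line."""
    results = {}
    for line in text.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            results[m.group(1)] = (int(m.group(2)), float(m.group(3)))
    return results


def rate_matches(gps, reported_rate, duration_minutes, tol=1e-3):
    """Check reported_rate against gps / duration (ASSUMED relationship)."""
    return abs(gps / (duration_minutes * 60) - reported_rate) < tol


# The two summary lines copied from the run above.
summary = """\
TREE01 ------- 12719 GPs (21.1983/s) [rcu: g94609 f0x0 ]
TREE04 ------- 3128 GPs (5.21333/s) [rcu: g23621 f0x0 ]
"""

for name, (gps, rate) in parse_summary(summary).items():
    print(name, gps, rate, rate_matches(gps, rate, duration_minutes=10))
```

Both scenarios pass the check with a 10-minute duration, so the rates in the summary agree with the grace-period counts. The sketch says nothing about the ":CONFIG_HOTPLUG_CPU: improperly set" line.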