Paul Menzel <pmenzel@xxxxxxxxxxxxx> writes:
> On 08.02.22 at 11:09, Michael Ellerman wrote:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>     $ more /proc/version
>     Linux version 5.17.0-rc1+
> (pmenzel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.

> The Linux tree, from where I run rcutorture from, is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>     $ git log --oneline -6
>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>     a447541d925f ata: libata-sata: remove debounce delay by default
>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
>
>     $ qemu-system-ppc64le --version
>     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups. I'd be surprised if it
explains this bug, but I guess anything's possible.

>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
>     Booting Linux via __start() @ 0x0000000000400000 ...

OK yeah I see that too.

Removing "threadirqs" from
tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot seems to fix
it.

I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.

> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
> stdio -m 512 -kernel
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
> -append "debug_boot_weak_hash panic=-1 console=ttyS0
> torture.disable_onoff_at_boot locktorture.onoff_interval=3
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version.

Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, but it's not that hard :)

cheers
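
P.S. For anyone wanting to repeat the "threadirqs" experiment on TREE03,
here is a rough shell sketch of one way to drop that parameter. The
function name strip_boot_param is made up for illustration, and it
assumes the .boot file is nothing more than whitespace-separated boot
parameters. Treat it as a starting point, not as the official way to
edit rcutorture configs.

```shell
#!/bin/sh
# Hypothetical helper (not part of rcutorture): read boot parameters on
# stdin and print them back without the token given as $1.
# Assumption: parameters are whitespace-separated tokens, possibly
# spread over several lines.
strip_boot_param() {
    awk -v drop="$1" '{
        out = ""
        # Rebuild the line from every token that is not an exact match.
        for (i = 1; i <= NF; i++)
            if ($i != drop)
                out = out (out == "" ? "" : " ") $i
        # Skip lines that held only the dropped token; keep blank lines.
        if (out != "" || NF == 0)
            print out
    }'
}

# Example use from the top of the kernel tree (left commented out so the
# snippet has no side effects when sourced):
#   f=tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
#   strip_boot_param threadirqs < "$f" > "$f.new" && mv "$f.new" "$f"
```

Only exact token matches are removed, so a parameter that merely
contains the string (a hypothetical "foo.threadirqs=1", say) is left
alone. Running "git diff" on the .boot file afterwards shows exactly
what changed, and "git checkout" on it restores the original.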