Dear Paul,

I tried to reproduce the bug in a ppc64 VM at Oregon State University, using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

The ppc64 VM, in which I run qemu without hardware acceleration, is:
Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)

The qemu command I used for the test:
$ cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01
$ qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs tree.use_softirq=0 rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15 rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"

The console log is uploaded to:
http://154.223.142.244/logs/20220310/console.paul.log

The log shows that init is killed by an illegal instruction:
[ 4.246387][ T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
[ 4.251400][ T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
[ 4.253416][ T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438

Meanwhile, a vmlinux that I compiled myself runs smoothly.
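
For reading "code:" lines like the ones above, here is a minimal Python sketch that pulls out the word the kernel marks with <...> (the instruction at the faulting nip) and splits it into fields. The helper names faulting_word and split_fields are made up for this illustration, and the field split assumes the generic D-form layout, so only the primary opcode is reliable for every instruction. It is a reading aid, not a disassembler.

```python
import re

# Matches the word the kernel wraps in <...> on a "code:" line, i.e. the
# instruction at the faulting nip.
MARKED_WORD = re.compile(r"<([0-9a-fA-F]{8})>")


def faulting_word(log_text):
    """Return the <...>-marked instruction word as an int, or None."""
    match = MARKED_WORD.search(log_text)
    return int(match.group(1), 16) if match else None


def split_fields(word):
    """Split a 32-bit Power instruction word into generic fields.

    Only the primary opcode (top 6 bits) is meaningful for every
    instruction. The other fields assume the D-form layout and are
    an approximation for other instruction forms.
    """
    low16 = word & 0xFFFF
    return {
        "primary_opcode": word >> 26,
        "rt_rs": (word >> 21) & 0x1F,
        "ra": (word >> 16) & 0x1F,
        # D-form displacement, sign-extended from 16 bits.
        "simm16": low16 - 0x10000 if low16 & 0x8000 else low16,
    }


log = """
[ 4.251400][ T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
[ 4.253416][ T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438
"""

word = faulting_word(log)
fields = split_fields(word)
print(f"faulting word {word:#010x}, primary opcode {fields['primary_opcode']}")
```

For the log above this reports primary opcode 17, which is the opcode the Power ISA assigns to the system-call instructions. The same helper can be pointed at the other "code:" lines in this thread; for example, 0x91290000 splits into opcode 36 with both register fields equal to 9 and a zero displacement.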

Then I modified mkinitrd.sh to make init crash on purpose:
http://154.223.142.244/logs/20220310/mkinitrd.sh

With that change, the log shows a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I used gdb to debug init on the host:
ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program: /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x0000000010000b2c in ?? ()
(gdb) x/10i $pc
=> 0x10000b2c: stw r9,0(r9)
   0x10000b30: trap
   0x10000b34: .long 0x0
   0x10000b38: .long 0x0
   0x10000b3c: .long 0x0
   0x10000b40: lis r2,4110
   0x10000b44: addi r2,r2,31488
   0x10000b48: mr r9,r1
   0x10000b4c: rldicr r1,r1,0,59
   0x10000b50: li r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x10000afc: 0x38840040 0x387f0040 0xf8010040 0x48026919
0x10000b0c: 0x60000000 0xe8010040 0x7c0803a6 0x4bffff24
0x10000b1c: 0x00000000 0x01000000 0x00000180 0x39200000
0x10000b2c: 0x91290000 0x7fe00008 0x00000000 0x00000000

This matches the hex content of http://154.223.142.244/logs/20220310/console.zhouyi.log:
[ 5.077431][ T1] init[1]: segfault (11) at 0 nip 10000b2c lr 10001024 code 1 in init[10000000+d0000]
[ 5.087167][ T1] init[1]: code: 38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
[ 5.093987][ T1] init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000

Conclusion: something might be going wrong when init is packed into vmlinux in your environment.

I will continue to look into this interesting problem with you.

Thanks
Kind regards
Zhouyi

On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
>
> Dear Michael,
>
>
> Thank you for looking into this.
>
> On 08.02.22 at 11:09, Michael Ellerman wrote:
> > Paul Menzel writes:
>
> […]
>
> >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> >> 5.17-rc2+ with rcutorture tests
> >
> > I'm not sure if that's the host kernel version or the version you're
> > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>     $ more /proc/version
>     Linux version 5.17.0-rc1+ (pmenzel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>
> The Linux tree from which I run rcutorture is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>     $ git log --oneline -6
>     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>     a447541d925f ata: libata-sata: remove debounce delay by default
>     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
> >>      $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >>
> >> the built init
> >>
> >>      $ file tools/testing/selftests/rcutorture/initrd/init
> >>      tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
> >
> > Mine looks pretty much identical:
> >
> >   $ file tools/testing/selftests/rcutorture/initrd/init
> >   tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
> >
> >> segfaults in QEMU. From one of the log files
> >
> > But mine doesn't segfault, it runs fine and the test completes.
> >
> > What qemu version are you using?
> >
> > I tried 4.2.1 and 6.2.0, both worked.
>
>     $ qemu-system-ppc64le --version
>     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> >> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no
> output after:
>
>     Booting Linux via __start() @ 0x0000000000400000 ...
> )
>
> >> [ 1.119803][ T1] Run /init as init process
> >> [ 1.122011][ T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
> >> [ 1.124863][ T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
> >> [ 1.128823][ T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
> >
> > The disassembly from 3c40100f is:
> >   lis r2,4111
> >   addi r2,r2,31744
> >   mr r9,r1
> >   rldicr r1,r1,0,59
> >   li r0,0
> >   stdu r1,-128(r1) <- fault
> >   mtlr r0
> >   std r0,0(r1)
> >   ld r8,-32752(r2)
> >
> >
> > I think you'll find that's the code at the ELF entry point. You can
> > check with:
> >
> >   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
> >   Entry point address: 0x10000c0c
> >
> >   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
> >   10000c0c: 0e 10 40 3c lis r2,4110
> >   10000c10: 00 7b 42 38 addi r2,r2,31488
> >   10000c14: 78 0b 29 7c mr r9,r1
> >   10000c18: e4 06 21 78 rldicr r1,r1,0,59
> >   10000c1c: 00 00 00 38 li r0,0
> >   10000c20: 81 ff 21 f8 stdu r1,-128(r1)
> >   10000c24: a6 03 08 7c mtlr r0
> >   10000c28: 00 00 01 f8 std r0,0(r1)
> >   10000c2c: 10 80 02 e9 ld r8,-32752(r2)
> >
> > The fault you're seeing is the first store using the stack pointer (r1),
> > which is set up by the kernel.
> >
> > The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
> >
> > I'm not sure how we end up with a bad r1.
> >
> > Can you dump some info about the kernel that was built, something like:
> >
> >   $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
> >
> > And maybe paste/attach the full log, maybe there's a clue somewhere.
>
> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux -append "debug_boot_weak_hash panic=-1 console=ttyS0 torture.disable_onoff_at_boot locktorture.onoff_interval=3 locktorture.onoff_holdoff=30 locktorture.stat_interval=15 locktorture.shutdown_secs=60 locktorture.verbose=1"
>
>
> Kind regards,
>
> Paul
>
>
> [1]:
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz