Dear Michael,
Thank you for looking into this.
Am 08.02.22 um 11:09 schrieb Michael Ellerman:
Paul Menzel writes:
[…]
On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests
I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?
The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
I am unable to find the exact sha1.
$ more /proc/version
Linux version 5.17.0-rc1+
(pmenzel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) (Ubuntu
clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
17:13:04 CET 2022
The Linux tree, from where I run rcutorture from, is at commit
dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
$ git log --oneline -6
207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
with rcutorture on ppc64le: allmodconfig(2) and other failures
8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
a447541d925f ata: libata-sata: remove debounce delay by default
afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
$ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
the built init
$ file tools/testing/selftests/rcutorture/initrd/init
tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
Mine looks pretty much identical:
$ file tools/testing/selftests/rcutorture/initrd/init
tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
segfaults in QEMU. From one of the log files
But mine doesn't segfault, it runs fine and the test completes.
What qemu version are you using?
I tried 4.2.1 and 6.2.0, both worked.
$ qemu-system-ppc64le --version
QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
Sorry, that was the wrong path/test. The correct one for the excerpt
below is:
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
(For TREE03, QEMU does not start the Linux kernel at all, that means no
output after:
Booting Linux via __start() @ 0x0000000000400000 ...
)
[ 1.119803][ T1] Run /init as init process
[ 1.122011][ T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
[ 1.124863][ T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
[ 1.128823][ T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
The disassembly from 3c40100f is:
lis r2,4111
addi r2,r2,31744
mr r9,r1
rldicr r1,r1,0,59
li r0,0
stdu r1,-128(r1) <- fault
mtlr r0
std r0,0(r1)
ld r8,-32752(r2)
I think you'll find that's the code at the ELF entry point. You can
check with:
$ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
Entry point address: 0x10000c0c
$ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
10000c0c: 0e 10 40 3c lis r2,4110
10000c10: 00 7b 42 38 addi r2,r2,31488
10000c14: 78 0b 29 7c mr r9,r1
10000c18: e4 06 21 78 rldicr r1,r1,0,59
10000c1c: 00 00 00 38 li r0,0
10000c20: 81 ff 21 f8 stdu r1,-128(r1)
10000c24: a6 03 08 7c mtlr r0
10000c28: 00 00 01 f8 std r0,0(r1)
10000c2c: 10 80 02 e9 ld r8,-32752(r2)
The fault you're seeing is the first store using the stack pointer (r1),
which is setup by the kernel.
The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
I'm not sure how we end up with a bad r1.
Can you dump some info about the kernel that was built, something like:
$ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
And maybe paste/attach the full log, maybe there's a clue somewhere.
You can now download the content of
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
[1, 65 MB].
Can you reproduce the segmentation fault with the line below?
$ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
-net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
stdio -m 512 -kernel
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
-append "debug_boot_weak_hash panic=-1 console=ttyS0
torture.disable_onoff_at_boot locktorture.onoff_interval=3
locktorture.onoff_holdoff=30 locktorture.stat_interval=15
locktorture.shutdown_secs=60 locktorture.verbose=1"
Kind regards,
Paul
[1]:
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz