Re: rcutorture’s init segfaults in ppc64le VM

Dear Michael,


On 11.02.22 at 02:48, Michael Ellerman wrote:
Paul Menzel writes:
On 08.02.22 at 11:09, Michael Ellerman wrote:
Paul Menzel writes:

[…]

On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
5.17-rc2+ with rcutorture tests

I'm not sure whether that's the host kernel version or the version of
rcutorture you're using. Can you tell us the sha1 of your host kernel and
of the tree you're running rcutorture from?

The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
I am unable to find the exact sha1.

      $ more /proc/version
      Linux version 5.17.0-rc1+ (x@xxxxxxxxxxxxxxxxxx) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022

OK. In general, rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel,
just to rule out any issues on the host.

Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

    $ more /proc/version
    Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022

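I have to do more tests, but it could be related to LLVM/clang: the
5.17-rc1+ host kernel was built with clang 13, while the Ubuntu kernel
was built with GCC 11.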

The Linux tree I run rcutorture from is at commit dfd42facf1e4
(Linux 5.17-rc3) with the commits below on top:

      $ git log --oneline -6
      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
      a447541d925f ata: libata-sata: remove debounce delay by default
      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

       $ file tools/testing/selftests/rcutorture/initrd/init
       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped

Mine looks pretty much identical:

    $ file tools/testing/selftests/rcutorture/initrd/init
    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped

segfaults in QEMU. From one of the log files

But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0; both worked.

      $ qemu-system-ppc64le --version
      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups. I'd be surprised if it
explains this bug, but I guess anything's possible.

/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

Sorry, that was the wrong path/test. The correct one for the excerpt
below is:

/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log

For TREE03, QEMU does not start the Linux kernel at all; there is no
output after:

      Booting Linux via __start() @ 0x0000000000400000 ...

OK yeah I see that too.

Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.
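For anyone wanting to repeat this experiment, the change can be applied
with a small shell function. This is only a sketch: the function name
strip_threadirqs is hypothetical, it assumes GNU sed (for in-place
editing with -i and the \b word boundary), and it assumes it is run from
the top of the kernel tree. It does not assume whether "threadirqs" sits
on its own line in TREE03.boot or shares a line with other parameters.

```shell
#!/bin/sh
# Hypothetical helper (not part of rcutorture): remove the "threadirqs"
# kernel parameter from an rcutorture .boot file, keeping all other
# parameters. Assumes GNU sed for -i and \b.
strip_threadirqs() {
    boot_file=${1:-tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot}
    # 1. Delete the word "threadirqs" wherever it appears.
    # 2. Squeeze runs of spaces left behind into a single space.
    # 3. Trim a leading or trailing space on the line.
    # 4. Drop lines that became empty.
    sed -i \
        -e 's/\bthreadirqs\b//g' \
        -e 's/  */ /g' \
        -e 's/^ //; s/ $//' \
        -e '/^$/d' \
        "$boot_file"
}
```

Running strip_threadirqs without an argument edits TREE03.boot in place;
"git checkout -- tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot"
restores the original afterwards.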

Nice find. I have no idea what that means, though.

I still see some preempt related warnings, we clearly have some bugs
with preempt enabled.

You can now download the content of
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
[1, 65 MB].

Can you reproduce the segmentation fault with the line below?

      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
      -net none -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
      -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
      -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
      torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
      locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
      locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version, you might need
to build from source, but it's not that hard :)

Indeed. I needed to find a current Meson, but building QEMU from source
then did not make a difference. As found out above, the problem is
related to the host Linux kernel.


Kind regards,

Paul
