Re: 💥 PANICKED: Test report for kernel 5.14.0-rc5 (block, 4e9e1af5)

Bruno Goncalves <bgoncalv@xxxxxxxxxx> · Wed, 11 Aug 2021 13:43:36 +0200

We've been hitting the following panic in recent runs on the block tree
when running the storage/blk test suite [1]. So far we've hit it only on
ppc64le [2] and aarch64 [3].

The 3 commits on which we've reproduced it so far are:
Commit: 4e9e1af58800 - Merge branch 'for-5.15/block' into for-next
Commit: 39a7b1209b44 - Merge branch 'io_uring-bio-cache.3' into for-next
Commit: 9b1a1a00a51e - Merge branch 'for-5.15/io_uring' into for-next

[ 4703.622170] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff)
[ 4734.025241] restraintd[1155]: *** Current Time: Wed Aug 11 03:42:52 2021  Localwatchdog at: Wed Aug 11 04:38:52 2021
[ 4734.367901] XFS (nvme0n1): Unmounting Filesystem
[ 4734.448255] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
[ 4734.576623] BUG: Kernel NULL pointer dereference on read at 0x00000328
[ 4734.576646] Faulting instruction address: 0xc0000000009725dc
[ 4734.576657] Oops: Kernel access of bad area, sig: 11 [#1]
[ 4734.576665] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 4734.576676] Modules linked in: nvme_loop nvme_fabrics nvmet nvme_core loop dm_log_writes dm_flakey rfkill sunrpc tg3 ses enclosure scsi_transport_sas rtc_opal powernv_rng ipmi_powernv crct10dif_vpmsum i2c_opal ipmi_devintf ipmi_msghandler leds_powernv drm fuse drm_panel_orientation_quirks i2c_core zram ip_tables xfs vmx_crypto crc32c_vpmsum ipr [last unloaded: nvme_core]
[ 4734.576748] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 5.14.0-rc5 #1
[ 4734.576759] NIP:  c0000000009725dc LR: c000000000926c14 CTR: c000000000972570
[ 4734.576769] REGS: c0000000096db500 TRAP: 0380   Not tainted  (5.14.0-rc5)
[ 4734.576778] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004222  XER: 20000000
[ 4734.576800] CFAR: c000000000926c10 IRQMASK: 0
[ 4734.576800] GPR00: c000000000926c14 c0000000096db7a0 c000000002844e00 c0000000811b7c80
[ 4734.576800] GPR04: 0000000000000000 0000000000000800 0000000000000800 0000000000000000
[ 4734.576800] GPR08: ffffffffffffffff 0000000000000000 c0000000da89bfe8 0000000000000020
[ 4734.576800] GPR12: c000000000972570 c0000003fffb2280 c0000003fa35ff90 0000000000000100
[ 4734.576800] GPR16: 0000000000000001 0000000004200042 c000000002873a00 0000000000000001
[ 4734.576800] GPR20: c0000003fbe09568 c0000003fbe09528 c0000000096db910 0000000000000000
[ 4734.576800] GPR24: 0000000000000020 ffffffffffffffff c0000000811b7c90 0000000000000002
[ 4734.576800] GPR28: c00000000d82d260 0000000000000000 0000000000000000 c000000010e4ff00
[ 4734.576902] NIP [c0000000009725dc] wb_timer_fn+0x6c/0x5b0
[ 4734.576916] LR [c000000000926c14] blk_stat_timer_fn+0x1c4/0x200
[ 4734.576927] Call Trace:
[ 4734.576931] [c0000000096db7a0] [000000000000000e] 0xe (unreliable)
[ 4734.576943] [c0000000096db800] [c000000000926c14] blk_stat_timer_fn+0x1c4/0x200
[ 4734.576955] [c0000000096db860] [c00000000022f350] call_timer_fn+0x50/0x1c0
[ 4734.576968] [c0000000096db8f0] [c00000000022f7e4] __run_timers.part.0+0x324/0x450
[ 4734.576980] [c0000000096db9c0] [c00000000022f964] run_timer_softirq+0x54/0xa0
[ 4734.576993] [c0000000096db9f0] [c00000000113a780] __do_softirq+0x160/0x3e0
[ 4734.577007] [c0000000096dbae0] [c00000000015ade4] __irq_exit_rcu+0x1b4/0x1c0
[ 4734.577020] [c0000000096dbb10] [c00000000015afc0] irq_exit+0x20/0x40
[ 4734.577031] [c0000000096dbb30] [c0000000000251b4] timer_interrupt+0x184/0x400
[ 4734.577044] [c0000000096dbb90] [c0000000000163a4] replay_soft_interrupts+0x124/0x2c0
[ 4734.577057] [c0000000096dbd70] [c000000000016678] arch_local_irq_restore+0x138/0x170
[ 4734.577070] [c0000000096dbda0] [c000000000dbd2f4] cpuidle_enter_state+0x104/0x560
[ 4734.577084] [c0000000096dbe00] [c000000000dbd7ec] cpuidle_enter+0x4c/0x70
[ 4734.577096] [c0000000096dbe40] [c0000000001ab718] do_idle+0x368/0x470
[ 4734.577109] [c0000000096dbf00] [c0000000001aba88] cpu_startup_entry+0x38/0x50
[ 4734.577121] [c0000000096dbf30] [c00000000005b9ac] start_secondary+0x29c/0x2b0
[ 4734.577134] [c0000000096dbf90] [c00000000000d354] start_secondary_prolog+0x10/0x14
[ 4734.577146] Instruction dump:
[ 4734.577153] ebe30060 83df0098 813f00b8 7d3e4a14 83df00d8 e95f0060 ebbf0028 7fde4a14
[ 4734.577171] eb830050 7bde0020 e92a0090 2c3d0000 <e9290328> eb490098 418200ec e93f0030
[ 4734.577190] ---[ end trace 28f0fbead3bf79d0 ]---
[ 4734.577992]
[ 4735.578017] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 4735.579752] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
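
For anyone reading the oops, here is a minimal sketch that decodes the
faulting word from the instruction dump (the one in angle brackets) and
the load just before it. It assumes the standard Power ISA DS-form
layout for primary opcode 58 (ld/ldu/lwa). The helper name
decode_ds_form is made up for illustration and is not part of any
kernel tool.

```python
# Decode PowerPC DS-form loads (primary opcode 58: ld, ldu, lwa).
# decode_ds_form is a hypothetical helper written for this example only.

DS_FORM_MNEMONICS = {0: "ld", 1: "ldu", 2: "lwa"}


def decode_ds_form(word):
    """Return (mnemonic, rt, ra, displacement) for a 32-bit DS-form word."""
    opcode = (word >> 26) & 0x3F   # top 6 bits: primary opcode
    rt = (word >> 21) & 0x1F       # target register
    ra = (word >> 16) & 0x1F       # base register
    xo = word & 0x3                # low 2 bits select ld/ldu/lwa
    disp = word & 0xFFFC           # DS field with the two XO bits cleared
    if disp & 0x8000:              # sign-extend the 16-bit displacement
        disp -= 0x10000
    if opcode != 58 or xo not in DS_FORM_MNEMONICS:
        raise ValueError(f"not a DS-form load: {word:08x}")
    return DS_FORM_MNEMONICS[xo], rt, ra, disp


if __name__ == "__main__":
    # Words taken from the "Instruction dump" lines of the oops.
    for word in (0xE92A0090, 0xE9290328):
        mnemonic, rt, ra, disp = decode_ds_form(word)
        print(f"{word:08x}: {mnemonic} r{rt}, {disp:#x}(r{ra})")
```

Under those assumptions, the faulting instruction is a load from offset
0x328 off r9, and r9 was itself just loaded from 0x90(r10). GPR09 is 0
in the register dump, which matches the reported NULL pointer
dereference on read at 0x00000328.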

[1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/blktests/blk
[2] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/08/10/351063758/build_ppc64le_redhat%3A1492847161/tests/10473601_ppc64le_1_console.log
[3] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/08/10/351063758/build_aarch64_redhat%3A1492847160/tests/10473599_aarch64_1_console.log

Thank you,
Bruno

On Wed, Aug 11, 2021 at 1:35 PM CKI Project <cki-project@xxxxxxxxxx> wrote:
>
>
> Hello,
>
> We ran automated tests on a recent commit from this kernel tree:
>
>        Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git
>             Commit: 4e9e1af58800 - Merge branch 'for-5.15/block' into for-next
>
> The results of these automated tests are provided below.
>
>     Overall result: FAILED (see details below)
>              Merge: OK
>            Compile: OK
>              Tests: PANICKED
>
> All kernel binaries, config files, and logs are available for download here:
>
>   https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/08/10/351063758
>
> One or more kernel tests failed:
>
>     s390x:
>      ❌ LTP
>
>     ppc64le:
>      ❌ LTP
>      💥 POSIX pjd-fstest suites
>      ❌ Loopdev Sanity
>      ❌ Memory: fork_mem
>      💥 Storage blktests
>
>     aarch64:
>      💥 Storage blktests
>      ❌ LTP
>
> We hope that these logs can help you find the problem quickly. For the full
> detail on our testing procedures, please scroll to the bottom of this message.
>
> Please reply to this email if you have any questions about the tests that we
> ran or if you have any suggestions on how to make future tests more effective.
>
>         ,-.   ,-.
>        ( C ) ( K )  Continuous
>         `-',-.`-'   Kernel
>           ( I )     Integration
>            `-'
> ______________________________________________________________________________
>
> Compile testing
> ---------------
>
> We compiled the kernel for 4 architectures:
>
>     aarch64:
>       make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
>     ppc64le:
>       make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
>     s390x:
>       make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
>     x86_64:
>       make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
>
>
>
> Hardware testing
> ----------------
> We booted each kernel and ran the following tests:
>
>   aarch64:
>     Host 1:
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ xfstests - ext4
>        ✅ xfstests - xfs
>        ✅ storage: software RAID testing
>        ✅ Storage: swraid mdadm raid_module test
>        🚧 ❌ xfstests - btrfs
>        🚧 💥 Storage blktests
>        🚧 ⚡⚡⚡ Storage block - filesystem fio test
>        🚧 ⚡⚡⚡ Storage block - queue scheduler test
>        🚧 ⚡⚡⚡ Storage nvme - tcp
>        🚧 ⚡⚡⚡ stress: stress-ng
>
>     Host 2:
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ ACPI table test
>        ❌ LTP
>        ⚡⚡⚡ CIFS Connectathon
>        ⚡⚡⚡ POSIX pjd-fstest suites
>        ⚡⚡⚡ Loopdev Sanity
>        ⚡⚡⚡ Memory: fork_mem
>        ⚡⚡⚡ Memory function: memfd_create
>        ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
>        ⚡⚡⚡ Ethernet drivers sanity
>        ⚡⚡⚡ storage: SCSI VPD
>        🚧 ⚡⚡⚡ xarray-idr-radixtree-test
>        🚧 ⚡⚡⚡ NFS Connectathon
>        🚧 ⚡⚡⚡ lvm cache test
>        🚧 ⚡⚡⚡ lvm snapper test
>
>   ppc64le:
>     Host 1:
>        ✅ Boot test
>        ✅ Reboot test
>        ❌ LTP
>        ✅ CIFS Connectathon
>        💥 POSIX pjd-fstest suites
>        ❌ Loopdev Sanity
>        ❌ Memory: fork_mem
>        ⚡⚡⚡ Memory function: memfd_create
>        ⚡⚡⚡ AMTU (Abstract Machine Test Utility)
>        ⚡⚡⚡ Ethernet drivers sanity
>        🚧 ⚡⚡⚡ xarray-idr-radixtree-test
>        🚧 ⚡⚡⚡ NFS Connectathon
>        🚧 ⚡⚡⚡ lvm cache test
>        🚧 ⚡⚡⚡ lvm snapper test
>
>     Host 2:
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ xfstests - ext4
>        ✅ xfstests - xfs
>        ✅ storage: software RAID testing
>        ✅ Storage: swraid mdadm raid_module test
>        🚧 ✅ xfstests - btrfs
>        🚧 💥 Storage blktests
>        🚧 ⚡⚡⚡ Storage block - filesystem fio test
>        🚧 ⚡⚡⚡ Storage block - queue scheduler test
>        🚧 ⚡⚡⚡ Storage nvme - tcp
>        🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
>
>   s390x:
>     Host 1:
>        ✅ Boot test
>        ✅ Reboot test
>        ❌ LTP
>        ✅ CIFS Connectathon
>        ✅ POSIX pjd-fstest suites
>        ✅ Loopdev Sanity
>        ✅ Memory: fork_mem
>        ✅ Memory function: memfd_create
>        ✅ AMTU (Abstract Machine Test Utility)
>        ✅ Ethernet drivers sanity
>        🚧 ❌ xarray-idr-radixtree-test
>        🚧 ✅ NFS Connectathon
>        🚧 ✅ lvm cache test
>        🚧 ✅ lvm snapper test
>
>     Host 2:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ Storage: swraid mdadm raid_module test
>        🚧 ✅ Storage blktests
>        🚧 ✅ Storage nvme - tcp
>        🚧 ⚡⚡⚡ stress: stress-ng
>
>   x86_64:
>     Host 1:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ Storage SAN device stress - qla2xxx driver
>
>     Host 2:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ Storage SAN device stress - mpt3sas_gen1
>
>     Host 3:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ xfstests - ext4
>        ⚡⚡⚡ xfstests - xfs
>        ⚡⚡⚡ xfstests - nfsv4.2
>        ⚡⚡⚡ storage: software RAID testing
>        ⚡⚡⚡ Storage: swraid mdadm raid_module test
>        🚧 ⚡⚡⚡ xfstests - btrfs
>        🚧 ⚡⚡⚡ xfstests - cifsv3.11
>        🚧 ⚡⚡⚡ Storage blktests
>        🚧 ⚡⚡⚡ Storage block - filesystem fio test
>        🚧 ⚡⚡⚡ Storage block - queue scheduler test
>        🚧 ⚡⚡⚡ Storage nvme - tcp
>        🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
>        🚧 ⚡⚡⚡ stress: stress-ng
>
>     Host 4:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ Storage SAN device stress - lpfc driver
>
>     Host 5:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ ACPI table test
>        ✅ LTP
>        ✅ CIFS Connectathon
>        ✅ POSIX pjd-fstest suites
>        ✅ Loopdev Sanity
>        ✅ Memory: fork_mem
>        ✅ Memory function: memfd_create
>        ✅ AMTU (Abstract Machine Test Utility)
>        ✅ Ethernet drivers sanity
>        ✅ storage: SCSI VPD
>        🚧 ⚡⚡⚡ xarray-idr-radixtree-test
>        🚧 ⚡⚡⚡ NFS Connectathon
>        🚧 ⚡⚡⚡ lvm cache test
>        🚧 ⚡⚡⚡ lvm snapper test
>
>     Host 6:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ Storage SAN device stress - qedf driver
>
>     Host 7:
>        ✅ Boot test
>        ✅ Reboot test
>        ✅ Storage SAN device stress - qla2xxx driver
>
>     Host 8:
>
>        ⚡ Internal infrastructure issues prevented one or more tests (marked
>        with ⚡⚡⚡) from running on this architecture.
>        This is not the fault of the kernel that was tested.
>
>        ⚡⚡⚡ Boot test
>        ⚡⚡⚡ Reboot test
>        ⚡⚡⚡ Storage SAN device stress - qedf driver
>
>   Test sources: https://gitlab.com/cki-project/kernel-tests
>     💚 Pull requests are welcome for new tests or improvements to existing tests!
>
> Aborted tests
> -------------
> Tests that didn't complete running successfully are marked with ⚡⚡⚡.
> If this was caused by an infrastructure issue, we try to mark that
> explicitly in the report.
>
> Waived tests
> ------------
> If the test run included waived tests, they are marked with 🚧. Such tests are
> executed but their results are not taken into account. Tests are waived when
> their results are not reliable enough, e.g. when they're just introduced or are
> being fixed.
>
> Testing timeout
> ---------------
> We aim to provide a report within reasonable timeframe. Tests that haven't
> finished running yet are marked with ⏱.
>