We've been hitting the following panic on recent runs on block tree when running the "storage/blk suite" [1]. WE've hit it only on ppc64le [2] and aarch64[3]. The 3 commits that we've reproduced it so far are: Commit: 4e9e1af58800 - Merge branch 'for-5.15/block' into for-next Commit: 39a7b1209b44 - Merge branch 'io_uring-bio-cache.3' into for-next Commit: 9b1a1a00a51e - Merge branch 'for-5.15/io_uring' into for-next [ 4703.622170] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038 (0x7fffffff) [ 4734.025241] restraintd[1155]: *** Current Time: Wed Aug 11 03:42:52 2021 Localwatchdog at: Wed Aug 11 04:38:52 2021 [ 4734.367901] XFS (nvme0n1): Unmounting Filesystem [ 4734.448255] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1" [ 4734.576623] BUG: Kernel NULL pointer dereference on read at 0x00000328 [ 4734.576646] Faulting instruction address: 0xc0000000009725dc [ 4734.576657] Oops: Kernel access of bad area, sig: 11 [#1] [ 4734.576665] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV [ 4734.576676] Modules linked in: nvme_loop nvme_fabrics nvmet nvme_core loop dm_log_writes dm_flakey rfkill sunrpc tg3 ses enclosure scsi_transport_sas rtc_opal powernv_rng ipmi_powernv crct10dif_vpmsum i2c_opal ipmi_devintf ipmi_msghandler leds_powernv drm fuse drm_panel_orientation_quirks i2c_core zram ip_tables xfs vmx_crypto crc32c_vpmsum ipr [last unloaded: nvme_core] [ 4734.576748] CPU: 14 PID: 0 Comm: swapper/14 Not tainted 5.14.0-rc5 #1 [ 4734.576759] NIP: c0000000009725dc LR: c000000000926c14 CTR: c000000000972570 [ 4734.576769] REGS: c0000000096db500 TRAP: 0380 Not tainted (5.14.0-rc5) [ 4734.576778] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24004222 XER: 20000000 [ 4734.576800] CFAR: c000000000926c10 IRQMASK: 0 [ 4734.576800] GPR00: c000000000926c14 c0000000096db7a0 c000000002844e00 c0000000811b7c80 [ 4734.576800] GPR04: 0000000000000000 0000000000000800 0000000000000800 0000000000000000 [ 4734.576800] GPR08: ffffffffffffffff 0000000000000000 c0000000da89bfe8 0000000000000020 [ 4734.576800] GPR12: c000000000972570 c0000003fffb2280 c0000003fa35ff90 0000000000000100 [ 4734.576800] GPR16: 0000000000000001 0000000004200042 c000000002873a00 0000000000000001 [ 4734.576800] GPR20: c0000003fbe09568 c0000003fbe09528 c0000000096db910 0000000000000000 [ 4734.576800] GPR24: 0000000000000020 ffffffffffffffff c0000000811b7c90 0000000000000002 [ 4734.576800] GPR28: c00000000d82d260 0000000000000000 0000000000000000 c000000010e4ff00 [ 4734.576902] NIP [c0000000009725dc] wb_timer_fn+0x6c/0x5b0 [ 4734.576916] LR [c000000000926c14] blk_stat_timer_fn+0x1c4/0x200 [ 4734.576927] Call Trace: [ 4734.576931] [c0000000096db7a0] [000000000000000e] 0xe (unreliable) [ 4734.576943] [c0000000096db800] [c000000000926c14] blk_stat_timer_fn+0x1c4/0x200 [ 4734.576955] [c0000000096db860] [c00000000022f350] call_timer_fn+0x50/0x1c0 [ 4734.576968] [c0000000096db8f0] [c00000000022f7e4] __run_timers.part.0+0x324/0x450 [ 4734.576980] [c0000000096db9c0] [c00000000022f964] run_timer_softirq+0x54/0xa0 [ 4734.576993] [c0000000096db9f0] [c00000000113a780] __do_softirq+0x160/0x3e0 [ 4734.577007] [c0000000096dbae0] [c00000000015ade4] __irq_exit_rcu+0x1b4/0x1c0 [ 4734.577020] [c0000000096dbb10] [c00000000015afc0] irq_exit+0x20/0x40 [ 4734.577031] [c0000000096dbb30] [c0000000000251b4] timer_interrupt+0x184/0x400 [ 4734.577044] [c0000000096dbb90] [c0000000000163a4] replay_soft_interrupts+0x124/0x2c0 [ 4734.577057] [c0000000096dbd70] [c000000000016678] arch_local_irq_restore+0x138/0x170 [ 4734.577070] [c0000000096dbda0] [c000000000dbd2f4] cpuidle_enter_state+0x104/0x560 [ 4734.577084] [c0000000096dbe00] [c000000000dbd7ec] cpuidle_enter+0x4c/0x70 [ 4734.577096] [c0000000096dbe40] [c0000000001ab718] do_idle+0x368/0x470 [ 4734.577109] [c0000000096dbf00] [c0000000001aba88] cpu_startup_entry+0x38/0x50 [ 4734.577121] [c0000000096dbf30] [c00000000005b9ac] start_secondary+0x29c/0x2b0 [ 4734.577134] [c0000000096dbf90] [c00000000000d354] start_secondary_prolog+0x10/0x14 [ 4734.577146] Instruction dump: [ 4734.577153] ebe30060 83df0098 813f00b8 7d3e4a14 83df00d8 e95f0060 ebbf0028 7fde4a14 [ 4734.577171] eb830050 7bde0020 e92a0090 2c3d0000 <e9290328> eb490098 418200ec e93f0030 [ 4734.577190] ---[ end trace 28f0fbead3bf79d0 ]--- [ 4734.577992] [ 4735.578017] Kernel panic - not syncing: Aiee, killing interrupt handler! [ 4735.579752] ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]--- [1] https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/blktests/blk [2] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/08/10/351063758/build_ppc64le_redhat%3A1492847161/tests/10473601_ppc64le_1_console.log [3] https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/08/10/351063758/build_aarch64_redhat%3A1492847160/tests/10473599_aarch64_1_console.log Thank you, Bruno On Wed, Aug 11, 2021 at 1:35 PM CKI Project <cki-project@xxxxxxxxxx> wrote: > > > Hello, > > We ran automated tests on a recent commit from this kernel tree: > > Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git > Commit: 4e9e1af58800 - Merge branch 'for-5.15/block' into for-next > > The results of these automated tests are provided below. > > Overall result: FAILED (see details below) > Merge: OK > Compile: OK > Tests: PANICKED > > All kernel binaries, config files, and logs are available for download here: > > https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/08/10/351063758 > > One or more kernel tests failed: > > s390x: > ❌ LTP > > ppc64le: > ❌ LTP > 💥 POSIX pjd-fstest suites > ❌ Loopdev Sanity > ❌ Memory: fork_mem > 💥 Storage blktests > > aarch64: > 💥 Storage blktests > ❌ LTP > > We hope that these logs can help you find the problem quickly. For the full > detail on our testing procedures, please scroll to the bottom of this message. > > Please reply to this email if you have any questions about the tests that we > ran or if you have any suggestions on how to make future tests more effective. > > ,-. ,-. > ( C ) ( K ) Continuous > `-',-.`-' Kernel > ( I ) Integration > `-' > ______________________________________________________________________________ > > Compile testing > --------------- > > We compiled the kernel for 4 architectures: > > aarch64: > make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg > > ppc64le: > make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg > > s390x: > make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg > > x86_64: > make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg > > > > Hardware testing > ---------------- > We booted each kernel and ran the following tests: > > aarch64: > Host 1: > ✅ Boot test > ✅ Reboot test > ✅ xfstests - ext4 > ✅ xfstests - xfs > ✅ storage: software RAID testing > ✅ Storage: swraid mdadm raid_module test > 🚧 ❌ xfstests - btrfs > 🚧 💥 Storage blktests > 🚧 ⚡⚡⚡ Storage block - filesystem fio test > 🚧 ⚡⚡⚡ Storage block - queue scheduler test > 🚧 ⚡⚡⚡ Storage nvme - tcp > 🚧 ⚡⚡⚡ stress: stress-ng > > Host 2: > ✅ Boot test > ✅ Reboot test > ✅ ACPI table test > ❌ LTP > ⚡⚡⚡ CIFS Connectathon > ⚡⚡⚡ POSIX pjd-fstest suites > ⚡⚡⚡ Loopdev Sanity > ⚡⚡⚡ Memory: fork_mem > ⚡⚡⚡ Memory function: memfd_create > ⚡⚡⚡ AMTU (Abstract Machine Test Utility) > ⚡⚡⚡ Ethernet drivers sanity > ⚡⚡⚡ storage: SCSI VPD > 🚧 ⚡⚡⚡ xarray-idr-radixtree-test > 🚧 ⚡⚡⚡ NFS Connectathon > 🚧 ⚡⚡⚡ lvm cache test > 🚧 ⚡⚡⚡ lvm snapper test > > ppc64le: > Host 1: > ✅ Boot test > ✅ Reboot test > ❌ LTP > ✅ CIFS Connectathon > 💥 POSIX pjd-fstest suites > ❌ Loopdev Sanity > ❌ Memory: fork_mem > ⚡⚡⚡ Memory function: memfd_create > ⚡⚡⚡ AMTU (Abstract Machine Test Utility) > ⚡⚡⚡ Ethernet drivers sanity > 🚧 ⚡⚡⚡ xarray-idr-radixtree-test > 🚧 ⚡⚡⚡ NFS Connectathon > 🚧 ⚡⚡⚡ lvm cache test > 🚧 ⚡⚡⚡ lvm snapper test > > Host 2: > ✅ Boot test > ✅ Reboot test > ✅ xfstests - ext4 > ✅ xfstests - xfs > ✅ storage: software RAID testing > ✅ Storage: swraid mdadm raid_module test > 🚧 ✅ xfstests - btrfs > 🚧 💥 Storage blktests > 🚧 ⚡⚡⚡ Storage block - filesystem fio test > 🚧 ⚡⚡⚡ Storage block - queue scheduler test > 🚧 ⚡⚡⚡ Storage nvme - tcp > 🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream > > s390x: > Host 1: > ✅ Boot test > ✅ Reboot test > ❌ LTP > ✅ CIFS Connectathon > ✅ POSIX pjd-fstest suites > ✅ Loopdev Sanity > ✅ Memory: fork_mem > ✅ Memory function: memfd_create > ✅ AMTU (Abstract Machine Test Utility) > ✅ Ethernet drivers sanity > 🚧 ❌ xarray-idr-radixtree-test > 🚧 ✅ NFS Connectathon > 🚧 ✅ lvm cache test > 🚧 ✅ lvm snapper test > > Host 2: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ✅ Boot test > ✅ Reboot test > ✅ Storage: swraid mdadm raid_module test > 🚧 ✅ Storage blktests > 🚧 ✅ Storage nvme - tcp > 🚧 ⚡⚡⚡ stress: stress-ng > > x86_64: > Host 1: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ Storage SAN device stress - qla2xxx driver > > Host 2: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ Storage SAN device stress - mpt3sas_gen1 > > Host 3: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ xfstests - ext4 > ⚡⚡⚡ xfstests - xfs > ⚡⚡⚡ xfstests - nfsv4.2 > ⚡⚡⚡ storage: software RAID testing > ⚡⚡⚡ Storage: swraid mdadm raid_module test > 🚧 ⚡⚡⚡ xfstests - btrfs > 🚧 ⚡⚡⚡ xfstests - cifsv3.11 > 🚧 ⚡⚡⚡ Storage blktests > 🚧 ⚡⚡⚡ Storage block - filesystem fio test > 🚧 ⚡⚡⚡ Storage block - queue scheduler test > 🚧 ⚡⚡⚡ Storage nvme - tcp > 🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream > 🚧 ⚡⚡⚡ stress: stress-ng > > Host 4: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ Storage SAN device stress - lpfc driver > > Host 5: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ✅ Boot test > ✅ Reboot test > ✅ ACPI table test > ✅ LTP > ✅ CIFS Connectathon > ✅ POSIX pjd-fstest suites > ✅ Loopdev Sanity > ✅ Memory: fork_mem > ✅ Memory function: memfd_create > ✅ AMTU (Abstract Machine Test Utility) > ✅ Ethernet drivers sanity > ✅ storage: SCSI VPD > 🚧 ⚡⚡⚡ xarray-idr-radixtree-test > 🚧 ⚡⚡⚡ NFS Connectathon > 🚧 ⚡⚡⚡ lvm cache test > 🚧 ⚡⚡⚡ lvm snapper test > > Host 6: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ Storage SAN device stress - qedf driver > > Host 7: > ✅ Boot test > ✅ Reboot test > ✅ Storage SAN device stress - qla2xxx driver > > Host 8: > > ⚡ Internal infrastructure issues prevented one or more tests (marked > with ⚡⚡⚡) from running on this architecture. > This is not the fault of the kernel that was tested. > > ⚡⚡⚡ Boot test > ⚡⚡⚡ Reboot test > ⚡⚡⚡ Storage SAN device stress - qedf driver > > Test sources: https://gitlab.com/cki-project/kernel-tests > 💚 Pull requests are welcome for new tests or improvements to existing tests! > > Aborted tests > ------------- > Tests that didn't complete running successfully are marked with ⚡⚡⚡. > If this was caused by an infrastructure issue, we try to mark that > explicitly in the report. > > Waived tests > ------------ > If the test run included waived tests, they are marked with 🚧. Such tests are > executed but their results are not taken into account. Tests are waived when > their results are not reliable enough, e.g. when they're just introduced or are > being fixed. > > Testing timeout > --------------- > We aim to provide a report within reasonable timeframe. Tests that haven't > finished running yet are marked with ⏱. >