On Wed, Apr 13, 2022 at 06:32:44PM -0700, Luis Chamberlain wrote:
> On Wed, Apr 13, 2022 at 06:22:05PM -0700, Bart Van Assche wrote:
> > On 4/13/22 18:11, Luis Chamberlain wrote:
> > > My exclusion list one-liner is getting longer, but hey, no crashes yet.
> > >
> > > i=0; while true; do use_siw=1 ./check -q srp -x srp/001 -x srp/005 -x srp/006 -x srp/011 -x srp/012 -x srp/013 ; if [[ $? -ne 0 ]]; then echo "BAD at $i"; break; else echo GOOOD $i ; fi; let i=$i+1; done;
> >
> > An exclusion list? Why? The SRP tests are stable. I think that all test
> > failures indicate a kernel bug.
>
> Oh boy. OK. Well I get a failure on all tests unfortunately. I've only
> gotten a kernel splat for the other test I mentioned and test 002 for
> which I attach the respective dmesg. The other ones just eventually fail
> if run in a loop.

The prior email didn't mail it to the list so I'm trimming the kernel
log below to only the kernel warning so it at least gets archived and
others get it.

[ 171.959312] run blktests srp/002 at 2022-04-14 01:29:08
[ 172.177267] null_blk: module loaded
[ 172.257984] SoftiWARP attached

<-- snip -->

[ 195.215244] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260
[ 195.218424] sd 3:0:0:2: [sdc] Attached SCSI disk
[ 195.218783] ------------[ cut here ]------------
[ 195.221242] WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
[ 195.222838] Modules linked in: ib_srp(E) scsi_transport_srp(E) target_core_pscsi(E) target_core_file(E) ib_srpt(E) target_core_iblock(E) target_core_mod(E) rdma_cm(E) iw_cm(E) ib_cm(E) scsi_debug(E) siw(E) null_blk(E) ib_umad(E) ib_uverbs(E) sd_mod(E) sg(E) dm_service_time(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) dm_multipath(E) ib_core(E) dm_mod(E) nvme_fabrics(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) joydev(E) evdev(E) serio_raw(E) cirrus(E) drm_shmem_helper(E) drm_kms_helper(E) virtio_balloon(E) cec(E) i6300esb(E) button(E) drm(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) zstd_compress(E) libcrc32c(E) crc32c_generic(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) ata_generic(E) uhci_hcd(E) ehci_hcd(E) crc32_pclmul(E) crc32c_intel(E) ata_piix(E) psmouse(E) nvme(E) libata(E) virtio_pci(E)
[ 195.222986] virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) usbcore(E) virtio(E) usb_common(E) scsi_mod(E) nvme_core(E) i2c_piix4(E) virtio_ring(E) t10_pi(E) scsi_common(E) [last unloaded: null_blk]
[ 195.241036] sd 3:0:0:1: [sdd] Attached SCSI disk
[ 195.241188] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1
[ 195.246053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 195.249123] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 195.251274] RIP: 0010:siw_cep_put+0x125/0x130 [siw]
[ 195.253548] Code: bb c0 e8 ae 74 0f d7 48 89 ef 5d 41 5c 41 5d e9 b1 d6 ef d6 5d be 03 00 00 00 41 5c 41 5d e9 22 b7 0c d7 0f 0b e9 f3 fe ff ff <0f> 0b e9 1c ff ff ff 0f 1f 40 00 0f 1f 44 00 00 55 48 8d 6f 20 53
[ 195.258982] RSP: 0018:ffffbc53404ebc98 EFLAGS: 00010286
[ 195.261018] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[ 195.263569] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffa03d1102a924
[ 195.266151] RBP: ffffa03d1102a900 R08: ffffa03d1102a920 R09: ffffbc53404ebc50
[ 195.269150] R10: ffffffff98a060e0 R11: 0000000000000000 R12: ffffa03cc4297000
[ 195.272744] R13: ffffa03d2a48aea0 R14: ffffa03d2a48ae78 R15: ffffa03cc427ad58
[ 195.275575] FS: 0000000000000000(0000) GS:ffffa03df7c80000(0000) knlGS:0000000000000000
[ 195.278932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 195.280963] CR2: 00005590bc2e4fe8 CR3: 000000008500a004 CR4: 0000000000770ee0
[ 195.282803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 195.284650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 195.286522] PKRU: 55555554
[ 195.287998] Call Trace:
[ 195.289210] <TASK>
[ 195.290969] siw_reject+0xac/0x180 [siw]
[ 195.292679] iw_cm_reject+0x68/0xc0 [iw_cm]
[ 195.294136] cm_work_handler+0x59d/0xe20 [iw_cm]
[ 195.295588] process_one_work+0x1e2/0x3b0
[ 195.298338] worker_thread+0x50/0x3a0
[ 195.300330] ? rescuer_thread+0x390/0x390
[ 195.302269] kthread+0xe5/0x110
[ 195.304062] ? kthread_complete_and_exit+0x20/0x20
[ 195.307612] ret_from_fork+0x1f/0x30
[ 195.309585] </TASK>
[ 195.310674] ---[ end trace 0000000000000000 ]---
[ 195.313290] scsi host4: ib_srp: REJ received
[ 195.313293] scsi host4: REJ reason 0xffffff98
[ 195.315433] scsi host4: ib_srp: Connection 0/8 to 172.17.8.113 failed
[ 195.472718] ib_srp:srp_parse_in: ib_srp: 172.17.8.113 -> 172.17.8.113:0
[ 195.472739] ib_srp:srp_parse_in: ib_srp: 172.17.8.113:5555 -> 172.17.8.113:5555
[ 195.472807] ib_srp:srp_parse_in: ib_srp: [fe80::5054:ff:fe5b:90dc%3] -> [fe80::5054:ff:fe5b:90dc]:0/202442865%3

<-- snip -->

  Luis