Re: blktests srp failures with a guest with kdevops on v5.17-rc7 removal

On 4/14/22 13:49, Luis Chamberlain wrote:
On Wed, Apr 13, 2022 at 06:32:44PM -0700, Luis Chamberlain wrote:
On Wed, Apr 13, 2022 at 06:22:05PM -0700, Bart Van Assche wrote:
On 4/13/22 18:11, Luis Chamberlain wrote:
My exclusion list one-liner is getting longer, but hey, no crashes yet.

i=0; while true; do use_siw=1 ./check -q srp -x srp/001 -x srp/005 -x srp/006 -x srp/011 -x srp/012 -x srp/013 ; if [[ $? -ne 0 ]]; then echo "BAD at $i"; break; else echo GOOOD $i ; fi; let i=$i+1; done;
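
For readability, the same loop-until-failure idea can be written as a small shell function. This is only a sketch and not part of blktests: the name run_until_fail and its iteration cap are hypothetical, and the command to run is passed in as arguments.

```shell
# Hypothetical helper (not part of blktests): run a command repeatedly
# until it fails or until an optional iteration cap is reached.
# Usage: run_until_fail MAX command [args...]   (MAX=0 means no cap)
run_until_fail() {
    local max i
    max=$1
    shift
    i=0
    while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
        if ! "$@"; then
            # Report the iteration at which the first failure occurred.
            echo "BAD at $i"
            return 1
        fi
        echo "GOOD $i"
        i=$((i + 1))
    done
    return 0
}
```

Assuming it is sourced from the blktests directory, it could be invoked as: run_until_fail 0 env use_siw=1 ./check -q srp -x srp/001 -x srp/005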

An exclusion list? Why? The SRP tests are stable. I think that all test
failures indicate a kernel bug.

Oh boy. OK. Well I get a failure on all tests unfortunately. I've only
gotten a kernel splat for the other test I mentioned and test 002 for
which I attach the respective dmesg. The other ones just eventually fail
if run in a loop.

The prior email did not make it to the list, so I'm trimming the kernel
log below to just the kernel warning so that it at least gets archived
and others can see it.

[  171.959312] run blktests srp/002 at 2022-04-14 01:29:08
[  172.177267] null_blk: module loaded
[  172.257984] SoftiWARP attached

<-- snip -->
[  195.215244] ib_srp:srp_max_it_iu_len: ib_srp: max_iu_len = 8260
[  195.218424] sd 3:0:0:2: [sdc] Attached SCSI disk
[  195.218783] ------------[ cut here ]------------
[  195.221242] WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw]
[  195.222838] Modules linked in: ib_srp(E) scsi_transport_srp(E) target_core_pscsi(E) target_core_file(E) ib_srpt(E) target_core_iblock(E) target_core_mod(E) rdma_cm(E) iw_cm(E) ib_cm(E) scsi_debug(E) siw(E) null_blk(E) ib_umad(E) ib_uverbs(E) sd_mod(E) sg(E) dm_service_time(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) dm_multipath(E) ib_core(E) dm_mod(E) nvme_fabrics(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) cryptd(E) joydev(E) evdev(E) serio_raw(E) cirrus(E) drm_shmem_helper(E) drm_kms_helper(E) virtio_balloon(E) cec(E) i6300esb(E) button(E) drm(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) zstd_compress(E) libcrc32c(E) crc32c_generic(E) virtio_net(E) net_failover(E) failover(E) virtio_blk(E) ata_generic(E) uhci_hcd(E) ehci_hcd(E) crc32_pclmul(E) crc32c_intel(E) ata_piix(E) psmouse(E) nvme(E) libata(E) virtio_pci(E)
[  195.222986]  virtio_pci_legacy_dev(E) virtio_pci_modern_dev(E) usbcore(E) virtio(E) usb_common(E) scsi_mod(E) nvme_core(E) i2c_piix4(E) virtio_ring(E) t10_pi(E) scsi_common(E) [last unloaded: null_blk]
[  195.241036] sd 3:0:0:1: [sdd] Attached SCSI disk
[  195.241188] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G            E     5.17.0-rc7 #1
[  195.246053] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[  195.249123] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[  195.251274] RIP: 0010:siw_cep_put+0x125/0x130 [siw]
[  195.253548] Code: bb c0 e8 ae 74 0f d7 48 89 ef 5d 41 5c 41 5d e9 b1 d6 ef d6 5d be 03 00 00 00 41 5c 41 5d e9 22 b7 0c d7 0f 0b e9 f3 fe ff ff <0f> 0b e9 1c ff ff ff 0f 1f 40 00 0f 1f 44 00 00 55 48 8d 6f 20 53
[  195.258982] RSP: 0018:ffffbc53404ebc98 EFLAGS: 00010286
[  195.261018] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
[  195.263569] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffa03d1102a924
[  195.266151] RBP: ffffa03d1102a900 R08: ffffa03d1102a920 R09: ffffbc53404ebc50
[  195.269150] R10: ffffffff98a060e0 R11: 0000000000000000 R12: ffffa03cc4297000
[  195.272744] R13: ffffa03d2a48aea0 R14: ffffa03d2a48ae78 R15: ffffa03cc427ad58
[  195.275575] FS:  0000000000000000(0000) GS:ffffa03df7c80000(0000) knlGS:0000000000000000
[  195.278932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  195.280963] CR2: 00005590bc2e4fe8 CR3: 000000008500a004 CR4: 0000000000770ee0
[  195.282803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  195.284650] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  195.286522] PKRU: 55555554
[  195.287998] Call Trace:
[  195.289210]  <TASK>
[  195.290969]  siw_reject+0xac/0x180 [siw]
[  195.292679]  iw_cm_reject+0x68/0xc0 [iw_cm]
[  195.294136]  cm_work_handler+0x59d/0xe20 [iw_cm]
[  195.295588]  process_one_work+0x1e2/0x3b0
[  195.298338]  worker_thread+0x50/0x3a0
[  195.300330]  ? rescuer_thread+0x390/0x390
[  195.302269]  kthread+0xe5/0x110
[  195.304062]  ? kthread_complete_and_exit+0x20/0x20
[  195.307612]  ret_from_fork+0x1f/0x30
[  195.309585]  </TASK>
[  195.310674] ---[ end trace 0000000000000000 ]---
[  195.313290] scsi host4: ib_srp: REJ received
[  195.313293] scsi host4:   REJ reason 0xffffff98
[  195.315433] scsi host4: ib_srp: Connection 0/8 to 172.17.8.113 failed
[  195.472718] ib_srp:srp_parse_in: ib_srp: 172.17.8.113 -> 172.17.8.113:0
[  195.472739] ib_srp:srp_parse_in: ib_srp: 172.17.8.113:5555 -> 172.17.8.113:5555
[  195.472807] ib_srp:srp_parse_in: ib_srp: [fe80::5054:ff:fe5b:90dc%3] -> [fe80::5054:ff:fe5b:90dc]:0/202442865%3
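
Trimming a saved kernel log down to its warning blocks can be sketched with sed. This is an illustration only and not necessarily how the excerpt above was produced: the function name extract_warnings is hypothetical, and it assumes each warning is delimited by the usual "cut here" and "end trace" marker lines.

```shell
# Hypothetical helper: print every kernel warning block found in the
# given log files (or on stdin), from the "[ cut here ]" marker line
# through the matching "[ end trace ... ]" marker line.
extract_warnings() {
    sed -n '/\[ cut here ]/,/\[ end trace /p' "$@"
}
```

For example: dmesg | extract_warnings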

This is unexpected - I had not yet seen the above message while running the SRP tests. How about posting the above report on the linux-rdma mailing list and Cc-ing the Soft-iWARP maintainer?

PS: a fix for the v5.18-rc1 rdma_rxe driver is under development. See also https://lore.kernel.org/linux-rdma/20220414161846.GM64706@xxxxxxxx/T/#mb765c8708437d6e313766a89d6c6e9b103b8d546

Thanks,

Bart.



