On 9/21/21 1:08 PM, Bernard Metzler wrote:
I further investigated srp blktest with software rdma drivers, and I am still running into issues. These do not seem to be specific to the rxe or siw driver; they happen occasionally with both. Can we run tests using hardware rdma drivers with that blktest tool as well?

First, I see some WARNINGs which relate to resources that were not created or could not be destroyed (maybe because they were not created before):

...
[ 1437.197989] sd 11:0:0:1: [sde] Attached SCSI disk
[ 1437.845266] ------------[ cut here ]------------
[ 1437.845269] WARNING: CPU: 3 PID: 26257 at block/genhd.c:537 device_add_disk+0x1cb/0x3b0
...
[ 1437.845360] Call Trace:
[ 1437.845363] dm_setup_md_queue+0xc8/0x100
[ 1437.845368] table_load+0x1be/0x2d0
[ 1437.845371] ctl_ioctl+0x1d6/0x4c0
[ 1437.845373] ? retrieve_status+0x1d0/0x1d0
[ 1437.845377] dm_ctl_ioctl+0xe/0x20
[ 1437.845379] __x64_sys_ioctl+0x118/0x910
[ 1437.845384] ? switch_fpu_return+0x56/0xc0
[ 1437.845388] do_syscall_64+0x3a/0x80
[ 1437.845391] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1437.845395] RIP: 0033:0x7f81419dbb97
[ 1437.845398] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
[ 1437.845400] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1437.845402] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
[ 1437.845403] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
[ 1437.845403] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
[ 1437.845404] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
[ 1437.845405] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
[ 1437.845407] ---[ end trace c416dea93915334e ]---
...
[ 1437.845411] kobject_add_internal failed for dm (error: -2 parent: dm-2)
[ 1437.845451] ------------[ cut here ]------------
[ 1437.845451] WARNING: CPU: 3 PID: 26257 at block/genhd.c:564 del_gendisk+0x1a4/0x1e0
...
[ 1437.845516] Call Trace:
[ 1437.845517] dm_setup_md_queue+0xef/0x100
[ 1437.845520] table_load+0x1be/0x2d0
[ 1437.845522] ctl_ioctl+0x1d6/0x4c0
[ 1437.845523] ? retrieve_status+0x1d0/0x1d0
[ 1437.845527] dm_ctl_ioctl+0xe/0x20
[ 1437.845528] __x64_sys_ioctl+0x118/0x910
[ 1437.845531] ? switch_fpu_return+0x56/0xc0
[ 1437.845533] do_syscall_64+0x3a/0x80
[ 1437.845535] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1437.845537] RIP: 0033:0x7f81419dbb97
[ 1437.845538] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
[ 1437.845540] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1437.845542] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
[ 1437.845543] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
[ 1437.845544] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
[ 1437.845545] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
[ 1437.845546] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
[ 1437.845547] ---[ end trace c416dea93915334f ]---
...
[ 1437.845552] ------------[ cut here ]------------
[ 1437.845553] kernfs: can not remove 'sdc', no directory
[ 1437.845557] WARNING: CPU: 3 PID: 26257 at fs/kernfs/dir.c:1524 kernfs_remove_by_name_ns+0x88/0xa0
[ 1437.845562] Modules linked in:
...
[ 1437.845619] Call Trace:
[ 1437.845620] sysfs_remove_link+0x19/0x30
[ 1437.845623] bd_unlink_disk_holder+0x6d/0xd0
[ 1437.845627] dm_put_table_device+0x62/0xe0
[ 1437.845629] dm_put_device+0x88/0xe0
[ 1437.845631] ? dm_put_path_selector+0x40/0x50 [dm_multipath]
[ 1437.845635] free_priority_group+0x8e/0xc0 [dm_multipath]
[ 1437.845638] free_multipath+0x78/0xb0 [dm_multipath]
[ 1437.845640] multipath_dtr+0x2a/0x30 [dm_multipath]
[ 1437.845642] dm_table_destroy+0x67/0x130
[ 1437.845645] table_load+0x110/0x2d0
[ 1437.845647] ctl_ioctl+0x1d6/0x4c0
[ 1437.845648] ? retrieve_status+0x1d0/0x1d0
[ 1437.845651] dm_ctl_ioctl+0xe/0x20
[ 1437.845653] __x64_sys_ioctl+0x118/0x910
[ 1437.845655] ? switch_fpu_return+0x56/0xc0
[ 1437.845657] do_syscall_64+0x3a/0x80
[ 1437.845659] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1437.845662] RIP: 0033:0x7f81419dbb97
[ 1437.845663] Code: 00 00 90 48 8b 05 09 73 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 72 2c 00 f7 d8 64 89 01 48
[ 1437.845664] RSP: 002b:00007f814363b508 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 1437.845665] RAX: ffffffffffffffda RBX: 00007f81423b8d60 RCX: 00007f81419dbb97
[ 1437.845666] RDX: 00007f812c026c30 RSI: 00000000c138fd09 RDI: 0000000000000009
[ 1437.845667] RBP: 00007f81423f38b3 R08: 00007f8143639260 R09: 00007f81426018f8
[ 1437.845668] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f812c026c30
[ 1437.845669] R13: 0000000000000000 R14: 00007f812c026ce0 R15: 00007f812c00adc0
[ 1437.845670] ---[ end trace c416dea939153350 ]---

and a final Oops close to blk_mq_free_rqs:

[ 1438.976875] scsi 11:0:0:1: alua: Detached
[ 1438.980927] BUG: unable to handle page fault for address: ffffffffc0d83160
[ 1438.980960] #PF: supervisor read access in kernel mode
[ 1438.980978] #PF: error_code(0x0000) - not-present page
[ 1438.980995] PGD 15f60e067 P4D 15f60e067 PUD 15f610067 PMD 1bc2e3067 PTE 0
[ 1438.981019] Oops: 0000 [#1] SMP PTI
[ 1438.981033] CPU: 3 PID: 26257 Comm: multipathd Tainted: G W 5.15.0-rc1+ #1
[ 1438.981059] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[ 1438.981088] RIP: 0010:scsi_mq_exit_request+0x18/0x50
[ 1438.981107] Code: 00 00 e8 5b 14 76 00 5d c3 e8 e4 cb e1 ff 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 8b 7f 60 48 89 f3 48 8b 87 98 00 00 00 <48> 8b 40 40 48 85 c0 74 0c 48 8d b6 10 01 00 00 e8 23 14 76 00 48
[ 1438.981160] RSP: 0018:ffffa289c0447b38 EFLAGS: 00010286
[ 1438.981178] RAX: ffffffffc0d83120 RBX: ffff975354360000 RCX: 0000000000000000
[ 1438.981201] RDX: 0000000000000000 RSI: ffff975354360000 RDI: ffff97534cfd1000
[ 1438.981223] RBP: ffffa289c0447b40 R08: 0000000000009c6b R09: 0000000000009c6b
[ 1438.981245] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
[ 1438.981266] R13: ffff97534a34a240 R14: 0000000000000000 R15: 0000000000000000
[ 1438.981288] FS: 00007f814363d700(0000) GS:ffff975357780000(0000) knlGS:0000000000000000
[ 1438.981313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1438.981331] CR2: ffffffffc0d83160 CR3: 00000001b7fcc006 CR4: 00000000001706e0
[ 1438.981354] Call Trace:
[ 1438.981365] blk_mq_free_rqs+0x5f/0x1b0
[ 1438.981381] blk_mq_free_map_and_requests+0x37/0x70
[ 1438.981398] blk_mq_free_tag_set+0x27/0x90
[ 1438.981413] scsi_mq_destroy_tags+0x15/0x20
[ 1438.981429] scsi_host_dev_release+0x8b/0xf0
[ 1438.981445] device_release+0x38/0x90
[ 1438.981459] kobject_put+0x87/0x190
[ 1438.981475] put_device+0x13/0x20
[ 1438.981488] scsi_target_dev_release+0x1f/0x30
[ 1438.981504] device_release+0x38/0x90
[ 1438.981518] kobject_put+0x87/0x190
[ 1438.981532] put_device+0x13/0x20
[ 1438.981544] scsi_device_dev_release_usercontext+0x2a0/0x2b0
[ 1438.981565] execute_in_process_context+0x25/0x70
[ 1438.981583] scsi_device_dev_release+0x1c/0x20
[ 1438.981600] device_release+0x38/0x90
[ 1438.981613] kobject_put+0x87/0x190
[ 1438.981627] put_device+0x13/0x20
[ 1438.981639] scsi_device_put+0x2c/0x30
[ 1438.981653] scsi_disk_put+0x30/0x50
[ 1438.981668] sd_release+0x37/0xb0
[ 1438.981681] blkdev_put_whole+0x30/0x50
[ 1438.981696] blkdev_put+0x92/0x150
[ 1438.981710] blkdev_close+0x27/0x30
[ 1438.981723] __fput+0x8b/0x240
[ 1438.981736] ____fput+0xe/0x10
[ 1438.981748] task_work_run+0x74/0xb0
[ 1438.981762] exit_to_user_mode_prepare+0x14e/0x150
[ 1438.981782] syscall_exit_to_user_mode+0x16/0x30
[ 1438.981799] do_syscall_64+0x46/0x80
[ 1438.981813] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1438.981831] RIP: 0033:0x7f8142613c47
[ 1438.981845] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 3f c3 66 0f 1f 44 00 00 53 89 fb 48 83 ec 10 e8 c4 fb ff ff 89 df 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2b 89 d7 89 44 24 0c e8 06 fc ff ff 8b 44 24
[ 1438.981897] RSP: 002b:00007f814363b840 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 1438.981920] RAX: 0000000000000000 RBX: 000000000000000a RCX: 00007f8142613c47
[ 1438.981942] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000000a
[ 1438.981964] RBP: 0000000000000008 R08: 0000000000000001 R09: 0000000000000007
[ 1438.981986] R10: 0000000000000000 R11: 0000000000000293 R12: 0000564949b25700
[ 1438.982007] R13: 00007f81432a1ccf R14: 00007f812c02c710 R15: 00007f812c02c710
[ 1438.983180] Modules linked in: ib_srpt target_core_iblock target_core_mod scsi_debug rdma_rxe ip6_udp_tunnel udp_tunnel null_blk dm_service_time configs bridge stp llc nf_nat_ftp nf_conntrack_ftp xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ib_iser ip_set nfnetlink libiscsi ebtable_nat ebtable_broute scsi_transport_iscsi ip6table_mangle ip6table_raw ip6table_security iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6table_nat ip6_tables iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rpcrdma sunrpc ib_ipoib rdma_ucm ib_umad dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua iw_cxgb4 libcxgb intel_rapl_msr intel_rapl_common ib_uverbs x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel rdma_cm iw_cm kvm ib_cm ib_core snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg irqbypass snd_hda_codec crc32_pclmul rapl snd_hwdep snd_hda_core intel_cstate intel_uncore
[ 1438.983224] snd_pcm snd_timer iTCO_wdt mei_me snd iTCO_vendor_support mxm_wmi mei soundcore i2c_i801 i2c_smbus lpc_ich wmi xfs i915 i2c_algo_bit ttm drm_kms_helper firewire_ohci firewire_core syscopyarea sysfillrect cxgb4 crc_itu_t sysimgblt fb_sys_fops tg3 drm ptp crc32c_intel csiostor scsi_transport_fc pps_core video [last unloaded: scsi_transport_srp]
[ 1438.992637] CR2: ffffffffc0d83160
[ 1438.994057] ---[ end trace c416dea939153351 ]---
[ 1438.995476] RIP: 0010:scsi_mq_exit_request+0x18/0x50
[ 1438.996905] Code: 00 00 e8 5b 14 76 00 5d c3 e8 e4 cb e1 ff 5d c3 66 90 0f 1f 44 00 00 55 48 89 e5 53 48 8b 7f 60 48 89 f3 48 8b 87 98 00 00 00 <48> 8b 40 40 48 85 c0 74 0c 48 8d b6 10 01 00 00 e8 23 14 76 00 48
[ 1438.998414] RSP: 0018:ffffa289c0447b38 EFLAGS: 00010286
[ 1438.999954] RAX: ffffffffc0d83120 RBX: ffff975354360000 RCX: 0000000000000000
[ 1439.001513] RDX: 0000000000000000 RSI: ffff975354360000 RDI: ffff97534cfd1000
[ 1439.003079] RBP: ffffa289c0447b40 R08: 0000000000009c6b R09: 0000000000009c6b
[ 1439.004652] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
[ 1439.006218] R13: ffff97534a34a240 R14: 0000000000000000 R15: 0000000000000000
[ 1439.007777] FS: 00007f814363d700(0000) GS:ffff975357780000(0000) knlGS:0000000000000000
[ 1439.009340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1439.010906] CR2: ffffffffc0d83160 CR3: 00000001b7fcc006 CR4: 00000000001706e0
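
A note on reading the Oops above: the instruction bytes marked at the faulting RIP, `<48> 8b 40 40`, decode to a load from 0x40(%rax), and RAX holds ffffffffc0d83120, so the address being read should be the one reported in CR2. The following is only a minimal sketch of that arithmetic, with the values copied from the log; the variable names are illustrative and not taken from any tool.

```python
# Values copied from the Oops in the log above.
rax = 0xFFFFFFFFC0D83120  # RAX at the time of the fault
cr2 = 0xFFFFFFFFC0D83160  # CR2: the address that could not be read

# The bytes at the faulting RIP, "<48> 8b 40 40", encode
# mov 0x40(%rax),%rax: an 8-byte load with a one-byte displacement of 0x40.
displacement = 0x40

fault_address = rax + displacement
matches_cr2 = fault_address == cr2
print(hex(fault_address), matches_cr2)
```

The computed address matches CR2, so the fault is the dereference of the pointer held in RAX. That pointer lies in the range where x86-64 kernels typically map modules, and the log reports `[last unloaded: scsi_transport_srp]`; one possible reading, which is an assumption and not something the log proves, is that RAX refers to data belonging to a module that had already been unloaded.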

(+linux-block)

Hi Bernard,

If I remember correctly, all tests from the blktests suite pass on my test setup with kernel v5.13. I think the above call traces are regressions that were introduced in the block layer during the kernel v5.15 merge window.

Bart.