On May 31, 2024 / 13:46, Bart Van Assche wrote: > On 5/31/24 13:35, Zhu Yanjun wrote: > > On Fri, May 31, 2024 at 10:08 PM Bart Van Assche <bvanassche@xxxxxxx> wrote: > > > > > > On 5/31/24 13:06, Zhu Yanjun wrote: > > > > On Fri, May 31, 2024 at 10:01 PM Bart Van Assche <bvanassche@xxxxxxx> wrote: > > > > > > > > > > On 5/31/24 07:35, Zhu Yanjun wrote: > > > > > > IIRC, the problem with srp/002, 011 also occurs with siw driver, do you make > > > > > > tests with siw driver to verify whether the problem with srp/002, 011 is also > fixed or not? > > > > > > > > > > I have not yet seen any failures of any of the SRP tests when using the siw driver. > > > > > What am I missing? > > > > > > > > (left out a bunch of forwarded emails) > > > > > > Forwarding emails is not useful, especially if these emails do not answer the question > > > that I asked. > > > > Bob had made tests with siw. From his mail, it seems that the similar > > problem also occurs with SIW. > > I'm not aware of anyone other than Bob having reported failures of the SRP tests > in combination with the siw driver. I had the same understanding as Bart, and was not aware of the failure with the siw driver. To confirm it, I tried the repeated srp/002 test run with the siw driver and the kernel v6.10-rc2. I expected no failure, but alas, a different failure symptom was observed. It failed due to KASAN slab-use-after-free [1]. I'm not sure if this is the problem that Bob observed. I guess this is the same, old issue that I reported in the past [2]. I will send out a bug report to the linux-rdma list with some more details. [1] ... Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000006d1c31fe with status 5 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000916ce050 with status 5 Jun 04 09:23:11 testnode2 kernel: ================================================================== Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001770ef1b with status 5 Jun 04 09:23:11 testnode2 kernel: BUG: KASAN: slab-use-after-free in __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: Read of size 8 at addr ffff888131a3e418 by task kworker/u16:6/1345 Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: CPU: 1 PID: 1345 Comm: kworker/u16:6 Not tainted 6.10.0-rc2+ #288 Jun 04 09:23:11 testnode2 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Jun 04 09:23:11 testnode2 kernel: Workqueue: iw_cm_wq cm_work_handler [iw_cm] Jun 04 09:23:11 testnode2 kernel: Call Trace: Jun 04 09:23:11 testnode2 kernel: <TASK> Jun 04 09:23:11 testnode2 kernel: dump_stack_lvl+0x6a/0x90 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f727e5c2 with status 5 Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: print_report+0x174/0x505 Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: ? __virt_addr_valid+0x1b9/0x400 Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: kasan_report+0xa7/0x180 Jun 04 09:23:11 testnode2 kernel: ? __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: __mutex_lock+0x1110/0x13c0 Jun 04 09:23:11 testnode2 kernel: ? cma_iw_handler+0xac/0x500 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: ? __lock_acquire+0x139d/0x5d60 Jun 04 09:23:11 testnode2 kernel: ? __pfx___mutex_lock+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ? mark_lock+0xf5/0x1580 Jun 04 09:23:11 testnode2 kernel: ? __pfx_mark_lock+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000009bc71497 with status 5 Jun 04 09:23:11 testnode2 kernel: ? cma_iw_handler+0xac/0x500 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 0000000041c0fa4b with status 5 Jun 04 09:23:11 testnode2 kernel: cma_iw_handler+0xac/0x500 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: ? __pfx_cma_iw_handler+0x10/0x10 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: ? mark_held_locks+0x94/0xe0 Jun 04 09:23:11 testnode2 kernel: ? _raw_spin_unlock_irqrestore+0x4c/0x60 Jun 04 09:23:11 testnode2 kernel: cm_work_handler+0xb54/0x1c50 [iw_cm] Jun 04 09:23:11 testnode2 kernel: ? __pfx_cm_work_handler+0x10/0x10 [iw_cm] Jun 04 09:23:11 testnode2 kernel: ? __pfx_lock_release+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 00000000f48094cb with status 5 Jun 04 09:23:11 testnode2 kernel: process_one_work+0x865/0x1410 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001c3faa8a with status 5 Jun 04 09:23:11 testnode2 kernel: ? __pfx_lock_acquire+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ? __pfx_process_one_work+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ? assign_work+0x16c/0x240 Jun 04 09:23:11 testnode2 kernel: ? lock_is_held_type+0xd5/0x130 Jun 04 09:23:11 testnode2 kernel: worker_thread+0x5e2/0x1010 Jun 04 09:23:11 testnode2 kernel: ? __pfx_worker_thread+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: kthread+0x2d1/0x3a0 Jun 04 09:23:11 testnode2 kernel: ? _raw_spin_unlock_irq+0x24/0x50 Jun 04 09:23:11 testnode2 kernel: ? __pfx_kthread+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ret_from_fork+0x30/0x70 Jun 04 09:23:11 testnode2 kernel: ? __pfx_kthread+0x10/0x10 Jun 04 09:23:11 testnode2 kernel: ret_from_fork_asm+0x1a/0x30 Jun 04 09:23:11 testnode2 kernel: </TASK> Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: Allocated by task 75327: Jun 04 09:23:11 testnode2 kernel: kasan_save_stack+0x2c/0x50 Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000001bd9ea09 with status 5 Jun 04 09:23:11 testnode2 kernel: kasan_save_track+0x10/0x30 Jun 04 09:23:11 testnode2 kernel: __kasan_kmalloc+0xa6/0xb0 Jun 04 09:23:11 testnode2 kernel: __rdma_create_id+0x5b/0x5d0 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: __rdma_create_kernel_id+0x12/0x40 [rdma_cm] Jun 04 09:23:11 testnode2 kernel: srp_new_rdma_cm_id+0x7c/0x200 [ib_srp] Jun 04 09:23:11 testnode2 kernel: add_target_store+0x135e/0x29f0 [ib_srp] Jun 04 09:23:11 testnode2 kernel: ib_srpt receiving failed for ioctx 000000005afc8065 with status 5 Jun 04 09:23:11 testnode2 kernel: kernfs_fop_write_iter+0x3a4/0x5a0 Jun 04 09:23:11 testnode2 kernel: vfs_write+0x5e3/0xe70 Jun 04 09:23:11 testnode2 kernel: ksys_write+0xf7/0x1d0 Jun 04 09:23:11 testnode2 kernel: do_syscall_64+0x93/0x180 Jun 04 09:23:11 testnode2 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: Freed by task 66344: Jun 04 09:23:11 testnode2 kernel: kasan_save_stack+0x2c/0x50 Jun 04 09:23:11 testnode2 kernel: kasan_save_track+0x10/0x30 Jun 04 09:23:11 testnode2 kernel: kasan_save_free_info+0x37/0x60 Jun 04 09:23:11 testnode2 kernel: poison_slab_object+0x109/0x180 Jun 04 09:23:11 testnode2 kernel: __kasan_slab_free+0x2e/0x50 Jun 04 09:23:11 testnode2 kernel: kfree+0x11a/0x390 Jun 04 09:23:11 testnode2 kernel: srp_free_ch_ib+0x895/0xc80 [ib_srp] Jun 04 09:23:11 testnode2 kernel: srp_remove_work+0x309/0x6c0 [ib_srp] Jun 04 09:23:11 testnode2 kernel: process_one_work+0x865/0x1410 Jun 04 09:23:11 testnode2 kernel: worker_thread+0x5e2/0x1010 Jun 04 09:23:11 testnode2 kernel: kthread+0x2d1/0x3a0 Jun 04 09:23:11 testnode2 kernel: ret_from_fork+0x30/0x70 Jun 04 09:23:11 testnode2 kernel: ret_from_fork_asm+0x1a/0x30 Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the object at ffff888131a3e000 which belongs to the cache kmalloc-2k of size 2048 Jun 04 09:23:11 testnode2 kernel: The buggy address is located 1048 bytes inside of freed 2048-byte region [ffff888131a3e000, ffff888131a3e800) Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: The buggy address belongs to the physical page: Jun 04 09:23:11 testnode2 kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888131a38000 pfn:0x131a38 Jun 04 09:23:11 testnode2 kernel: head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 Jun 04 09:23:11 testnode2 kernel: flags: 0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff) Jun 04 09:23:11 testnode2 kernel: page_type: 0xffffefff(slab) Jun 04 09:23:11 testnode2 kernel: raw: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010 Jun 04 09:23:11 testnode2 kernel: raw: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000 Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000240 ffff888100042f00 ffffea0004c89610 ffffea0004a3c010 Jun 04 09:23:11 testnode2 kernel: head: ffff888131a38000 0000000000080006 00000001ffffefff 0000000000000000 Jun 04 09:23:11 testnode2 kernel: head: 0017ffffc0000003 ffffea0004c68e01 ffffffffffffffff 0000000000000000 Jun 04 09:23:11 testnode2 kernel: head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 Jun 04 09:23:11 testnode2 kernel: page dumped because: kasan: bad access detected Jun 04 09:23:11 testnode2 kernel: Jun 04 09:23:11 testnode2 kernel: Memory state around the buggy address: Jun 04 09:23:11 testnode2 kernel: ffff888131a3e300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Jun 04 09:23:11 testnode2 kernel: ffff888131a3e380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Jun 04 09:23:11 testnode2 kernel: >ffff888131a3e400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Jun 04 09:23:11 testnode2 kernel: ^ Jun 04 09:23:11 testnode2 kernel: ffff888131a3e480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Jun 04 09:23:11 testnode2 kernel: ffff888131a3e500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Jun 04 09:23:11 testnode2 kernel: ================================================================== Jun 04 09:23:11 testnode2 kernel: Disabling lock debugging due to kernel taint Jun 04 09:23:11 testnode2 kernel: device-mapper: multipath: 253:2: Failing path 8:80. Jun 04 09:23:11 testnode2 kernel: device-mapper: uevent: dm_send_uevents: skipping sending uevent for lost device ... [2] https://lore.kernel.org/linux-rdma/20230612054237.1855292-1-shinichiro.kawasaki@xxxxxxx/