On Fri, Nov 27, 2020 at 01:34:55PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-26 16:53:57 [-0400], Jason Gunthorpe wrote:
> > > +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> > > @@ -187,12 +187,8 @@ iser_initialize_task_headers(struct iscsi_task *task,
> > >  	struct iser_device *device = iser_conn->ib_conn.device;
> > >  	struct iscsi_iser_task *iser_task = task->dd_data;
> > >  	u64 dma_addr;
> > > -	const bool mgmt_task = !task->sc && !in_interrupt();
> > >  	int ret = 0;
> >
> > Why do you think the task->sc doesn't matter?
>
> Based on the call paths I checked, there was no evidence that
> state_mutex can be acquired. If I remove locking here then `mgmt_task'
> is no longer needed.

That only says there is no recursive deadlock.

> How should task->sc matter?

I was able to get the internal bug report that caused the
7414dde0a6c3a commit.

The issue here is that the state_mutex is protecting this:

	if (unlikely(iser_conn->state != ISER_CONN_UP)) {

Which indicates that this:

	dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,

won't crash because iser_conn->ib_conn is invalid.

The notes say that the iSCSI stack is in some state where data traffic
won't flow but management traffic is still possible.

I suppose this is some fast path so it was "optimized" to eliminate
the lock for data traffic.

A call chain of interest for the lock at least is:

Nov 3 12:24:37 rsws10 BUG: unable to handle kernel
Nov 3 12:24:37 NULL pointer dereference
Nov 3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
[..]
Nov 3 12:24:37 rsws10 [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
Nov 3 12:24:37 rsws10 [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffff813c2694>] ? scsi_send_eh_cmnd+0xd4/0x3a0
Nov 3 12:24:37 rsws10 [<ffffffff810c39df>] ? module_refcount+0x9f/0xc0
Nov 3 12:24:37 rsws10 [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]
Nov 3 12:24:37 rsws10 [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3cbe>] scsi_eh_ready_devs+0x5e/0x110
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3e5d>] scsi_unjam_host+0xed/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff813c40c8>] scsi_error_handler+0x168/0x1c0
Nov 3 12:24:37 rsws10 [<ffffffff813c3f60>] ? scsi_unjam_host+0x1f0/0x1f0
Nov 3 12:24:37 rsws10 [<ffffffff81082a6e>] kthread+0xce/0xe0
Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
Nov 3 12:24:37 rsws10 [<ffffffff8159b66c>] ret_from_fork+0x7c/0xb0
Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70

So, I think the usual 'pass in atomic context flag' is really needed
here

Jason
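
For illustration only, the 'pass in atomic context flag' idea could look
roughly like the user-space model below. This is a sketch under loud
assumptions, not the driver code and not a proposed patch: every
model_* name, the may_sleep parameter and the lock_count field are
hypothetical, and a pthread mutex stands in for state_mutex. The point
it tries to show is that the caller states whether it may sleep, so the
callee no longer has to guess from in_interrupt().

```c
/* User-space model only; none of the model_* names exist in the kernel. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_conn_state { MODEL_CONN_DOWN, MODEL_CONN_UP };

struct model_conn {
	pthread_mutex_t state_mutex;	/* stands in for state_mutex */
	enum model_conn_state state;	/* stands in for iser_conn->state */
	int lock_count;			/* times the mutex was taken (illustration) */
};

static void model_conn_init(struct model_conn *conn,
			    enum model_conn_state state)
{
	pthread_mutex_init(&conn->state_mutex, NULL);
	conn->state = state;
	conn->lock_count = 0;
}

/*
 * may_sleep is supplied by the caller: true for a management task issued
 * from a sleepable context, false for the data path in atomic context.
 */
static int model_init_task_headers(struct model_conn *conn, bool may_sleep)
{
	int ret = 0;

	assert(conn != NULL);

	/* Only a caller that may sleep is allowed to take the mutex. */
	if (may_sleep) {
		pthread_mutex_lock(&conn->state_mutex);
		conn->lock_count++;
	}

	/* Refuse to continue when the connection is not up. */
	if (conn->state != MODEL_CONN_UP)
		ret = -ENODEV;
	/* A real driver would map the header for DMA at this point. */

	if (may_sleep)
		pthread_mutex_unlock(&conn->state_mutex);

	return ret;
}
```

In this model a management caller would pass true and a data-path caller
false; whether the unlocked read on the data path is actually safe in the
real driver is exactly the question raised in the mail above.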