On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote: > I was able to get the internal bug report that caused the > 7414dde0a6c3a commit. > > The issue here is that the state_mutex is protecting > > This: > > if (unlikely(iser_conn->state != ISER_CONN_UP)) { > > Which indicates that this: > > dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc, > > Won't crash because iser_con->ib_con is invalid. The notes say that > the iSCSI stack is in some state where data traffic won't flow but > management traffic is still possible. I suppose this is some fast path > so it was "optimized" to eliminate the lock for data traffic. > > A call chain of interest for the lock at least is: > > Nov 3 12:24:37 rsws10 BUG: unable to handle kernel > Nov 3 12:24:37 NULL pointer dereference > Nov 3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784 > [..] > Nov 3 12:24:37 rsws10 [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser] > Nov 3 12:24:37 rsws10 [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi] > Nov 3 12:24:37 rsws10 [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi] > Nov 3 12:24:37 rsws10 [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi] > Nov 3 12:24:37 rsws10 [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi] preemptible until here and this function has: | mutex_lock(&session->eh_mutex); | spin_lock_bh(&session->frwd_lock); I don't see the lock dropped between here and iscsi_iser_task_init(). > Nov 3 12:24:37 rsws10 [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0 > Nov 3 12:24:37 rsws10 [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70 > > So, I think the usual 'pass in atomic context flag' is really needed > here Sure, I would do that but as noted above, it the `frwd_lock' is acquired so you can't acquire the mutex here. > Jason Sebastian