Re: [PATCH] IB/iser: Remove in_interrupt() usage.

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Fri, 27 Nov 2020 15:14:32 +0100

On 2020-11-27 09:03:14 [-0400], Jason Gunthorpe wrote:
> I was able to get the internal bug report that caused the
> 7414dde0a6c3a commit.
> 
> The issue here is that the state_mutex is protecting 
> 
> This:
> 
> 	if (unlikely(iser_conn->state != ISER_CONN_UP)) {
> 
> Which indicates that this:
> 
>         dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
> 
> Won't crash because iser_con->ib_con is invalid. The notes say that
> the iSCSI stack is in some state where data traffic won't flow but
> management traffic is still possible. I suppose this is some fast path
> so it was "optimized" to eliminate the lock for data traffic.
> 
> A call chain of interest for the lock at least is:
> 
> Nov  3 12:24:37 rsws10 BUG: unable to handle kernel 
> Nov  3 12:24:37 NULL pointer dereference
> Nov  3 12:24:37 rsws10 Pid: 5245, comm: scsi_eh_5 Tainted: GF          O 3.8.13-16.2.1.el6uek.x86_64 #1 IBM System x3550 M3 -[7944KEG]-/90Y4784
> [..]
> Nov  3 12:24:37 rsws10  [<ffffffffa069d628>] iscsi_iser_task_init+0x28/0x70 [ib_iser]
> Nov  3 12:24:37 rsws10  [<ffffffffa0610029>] iscsi_prep_mgmt_task+0x129/0x150 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa061354c>] __iscsi_conn_send_pdu+0x23c/0x310 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa0614277>] iscsi_exec_task_mgmt_fn+0x37/0x290 [libiscsi]
> Nov  3 12:24:37 rsws10  [<ffffffffa061497b>] iscsi_eh_device_reset+0x1bb/0x2d0 [libiscsi]

preemptible until here and this function has:

|	mutex_lock(&session->eh_mutex);
|	spin_lock_bh(&session->frwd_lock);

I don't see the lock dropped between here and iscsi_iser_task_init().

> Nov  3 12:24:37 rsws10  [<ffffffff813c3119>] scsi_eh_bus_device_reset+0xb9/0x1e0

> Nov  3 12:24:37 rsws10  [<ffffffff810829a0>] ? kthread_freezable_should_stop+0x70/0x70
> 
> So, I think the usual 'pass in atomic context flag' is really needed
> here

Sure, I would do that but as noted above, it the `frwd_lock' is acquired
so you can't acquire the mutex here.

> Jason

Sebastian