Re: [PATCH] libiscsi: ensure session spin lock usage consistent

Lee Duncan <lduncan@xxxxxxxx> · Tue, 20 Feb 2018 10:23:59 -0800

On 02/07/2018 08:36 PM, Chris Leech wrote:
> On Mon, Feb 05, 2018 at 11:13:23AM -0800, Lee Duncan wrote:
>> The libiscsi code was using both spin_lock()/spin_unlock()
>> and spin_lock_bh()/spin_unlock_bh() on its session lock.
>> In addition, lock validation found that libiscsi.c was
>> taking a HARDIRQ-unsafe lock while holding an HARDIRQ-
>> irq-safe lock:
> 
> Lee and I have exchanged a few mails off-list, but I don't think these
> changes are the right thing to address the lockdep warning here.
> 
> I've managed to reproduce this by exporting a scsi_debug lun over iSCSI
> and running fio traffic on a lockdep enabled initiator.
> 
> The deadlock that lockdep is worried about is:
> 
>>   Possible interrupt unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&(&session->frwd_lock)->rlock);
>>                                local_irq_disable();
>>                                lock(&(&q->__queue_lock)->rlock);
>>                                lock(&(&session->frwd_lock)->rlock);
>>   <Interrupt>
>>     lock(&(&q->__queue_lock)->rlock);
>>
>>  *** DEADLOCK ***
> 
> Which I don't think can actually happen unless the queue_lock is being
> taken in hardirq context by any of the iSCSI drivers, but lockdep can't
> know that.  But it does make me wonder if we can do less in
> iscsi_eh_cmd_timed_out while interrupts are blocked.
> 
> Lee, I'm going to test with your patch applied and see what lockdep
> kicks out.

Did you get a chance to run lockdep with the patch set?

I have been looking at this in *much* more depth, to try to really
understand the problem if any. And I believe you are right, that this
deadlock could not happen because the block timmeout routine where the
queue is taken is not called in hard-irq context, so this cannot happen.
Which would explain why it's been there for so long and has not been seen.

And it looks like the fix for this would be to use spin_lock_irq()
instead of spin_lock or spin_lock_bh in this case for the session
forward lock (for some paths?).
> 
> - Chris
>