Deadlock between qla24xx interrupt handler and qlt_abort_work()

Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx> · Mon, 30 Jan 2017 03:54:50 +0000

Hello Himanshu,

While testing the v4.10-rc5 qla2xxx driver I ran into the following:
* qla24xx_msix_default() grabs ha->hardware_lock before it calls
  qla2x00_async_event(). The latter function calls
  qlt_schedule_sess_for_deletion_lock() indirectly, and that function
  grabs ha->tgt.sess_lock.
* qlt_abort_work() grabs ha->tgt.sess_lock first and next it grabs
  ha->hardware_lock.

As the lockdep complaint below illustrates, this lock order inversion
leads to a deadlock every now and then. Do you agree that this was introduced by patch
"qla2xxx: Remove dependency on hardware_lock to reduce lock contention"
(2015-12-17)? Anyway, whether or not that patch introduced this bug,
please analyze this and propose a fix.
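
For illustration only, the conflict between the two lock orders described
above can be modeled in user space with a tiny lock-order tracker, loosely in
the spirit of what lockdep does. This is a sketch, not driver code: the enum
names, record_order() and qla_orders_conflict() are hypothetical and only
mirror the two acquisition orders listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for ha->hardware_lock and ha->tgt.sess_lock. */
enum { HARDWARE_LOCK, SESS_LOCK, NUM_LOCKS };

/* edge[a][b] is true when some path acquired b while holding a. */
static bool edge[NUM_LOCKS][NUM_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "inner was taken while outer was held". Returns false when the
 * new dependency would close a cycle, i.e. a possible ABBA deadlock.
 */
static bool record_order(int outer, int inner)
{
	bool seen[NUM_LOCKS] = { false };

	assert(outer >= 0 && outer < NUM_LOCKS);
	assert(inner >= 0 && inner < NUM_LOCKS);
	if (reachable(inner, outer, seen))
		return false;
	edge[outer][inner] = true;
	return true;
}

/* Replays the two orders from the report; true if they conflict. */
static bool qla_orders_conflict(void)
{
	bool irq_ok, work_ok;

	memset(edge, 0, sizeof(edge));
	/* Interrupt path: hardware_lock first, then tgt.sess_lock. */
	irq_ok = record_order(HARDWARE_LOCK, SESS_LOCK);
	/* Work path: tgt.sess_lock first, then hardware_lock. */
	work_ok = record_order(SESS_LOCK, HARDWARE_LOCK);
	return irq_ok && !work_ok;
}
```

The first order is accepted, and the second is rejected because it would
close the cycle hardware_lock -> sess_lock -> hardware_lock. Making both
paths take the two locks in the same order would avoid that cycle.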

Thank you,

Bart.

======================================================
[ INFO: possible circular locking dependency detected ]
4.10.0-rc5-dbg+ #6 Not tainted
-------------------------------------------------------
kworker/0:2/112 is trying to acquire lock:
 (&(&ha->hardware_lock)->rlock){-.-...}, at: [<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
but task is already holding lock:
 (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]
which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #1 (&(&ha->tgt.sess_lock)->rlock){-.-...}:
[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
[<ffffffffa04b22bb>] qlt_schedule_sess_for_deletion_lock+0x2b/0x50 [qla2xxx]
[<ffffffffa0454d11>] qla2x00_mark_all_devices_lost+0x71/0x200 [qla2xxx]
[<ffffffffa0477644>] qla2x00_async_event+0xb64/0x1980 [qla2xxx]
[<ffffffffa0478d71>] qla24xx_msix_default+0x261/0x2b0 [qla2xxx]
[<ffffffff810c7bbc>] __handle_irq_event_percpu+0x5c/0x380
[<ffffffff810c7f03>] handle_irq_event_percpu+0x23/0x60
[<ffffffff810c7f79>] handle_irq_event+0x39/0x60
[<ffffffff810cb423>] handle_edge_irq+0x93/0x170
[<ffffffff8102063a>] handle_irq+0x1a/0x30
[<ffffffff81569d38>] do_IRQ+0x68/0x130
[<ffffffff81567e53>] ret_from_intr+0x0/0x20
[<ffffffff81566356>] native_safe_halt+0x6/0x10
[<ffffffff81565ef0>] default_idle+0x20/0x1a0
[<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
[<ffffffff81566583>] default_idle_call+0x23/0x40
[<ffffffff810b5248>] do_idle+0x188/0x1e0
[<ffffffff810b558d>] cpu_startup_entry+0x1d/0x20
[<ffffffff81042368>] start_secondary+0x108/0x130
[<ffffffff810001c4>] verify_cpu+0x0/0xfc
-> #0 (&(&ha->hardware_lock)->rlock){-.-...}:
[<ffffffff810bd135>] __lock_acquire+0x1425/0x1620
[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
[<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
[<ffffffff81086024>] process_one_work+0x1f4/0x6e0
[<ffffffff8108655e>] worker_thread+0x4e/0x4a0
[<ffffffff8108d6bc>] kthread+0x10c/0x140
[<ffffffff81567721>] ret_from_fork+0x31/0x40

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&ha->tgt.sess_lock)->rlock);
                               lock(&(&ha->hardware_lock)->rlock);
                               lock(&(&ha->tgt.sess_lock)->rlock);
  lock(&(&ha->hardware_lock)->rlock);

 *** DEADLOCK ***

3 locks held by kworker/0:2/112:
 #0:  ("events"){.+.+.+}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
 #1:  ((&tgt->sess_work)){+.+.+.}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
 #2:  (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]

stack backtrace:
CPU: 0 PID: 112 Comm: kworker/0:2 Not tainted 4.10.0-rc5-dbg+ #6
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: events qlt_sess_work_fn [qla2xxx]
Call Trace:
 dump_stack+0x85/0xc2
 print_circular_bug+0x1e3/0x250
 __lock_acquire+0x1425/0x1620
 lock_acquire+0xbf/0x210
 _raw_spin_lock_irqsave+0x53/0x70
 qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
 process_one_work+0x1f4/0x6e0
 worker_thread+0x4e/0x4a0
 kthread+0x10c/0x140
 ret_from_fork+0x31/0x40

(gdb) list *(qlt_sess_work_fn+0x21d)

0x6531d is in qlt_sess_work_fn (drivers/scsi/qla2xxx/qla_target.c:5698).
5693                    }
5694            }
5695
5696            spin_lock_irqsave(&ha->hardware_lock, flags);
5697
5698            if (tgt->tgt_stop)
5699                    goto out_term;
5700
5701            rc = __qlt_24xx_handle_abts(vha, &prm->abts, sess);
5702            if (rc != 0)