Re: Deadlock between qla24xx interrupt handler and qlt_abort_work()

Hi Bart, 

On 1/29/17, 7:54 PM, "Bart Van Assche" <Bart.VanAssche@xxxxxxxxxxx> wrote:



>Hello Himanshu,
>
>While testing the v4.10-rc5 qla2xxx driver I ran into the following:
>* qla24xx_msix_default() grabs ha->hardware_lock before it calls
>  qla2x00_async_event(). The latter function calls
>  qlt_schedule_sess_for_deletion_lock() indirectly, and that function
>  grabs ha->tgt.sess_lock.
>* qlt_abort_work() grabs ha->tgt.sess_lock first and next it grabs
>  ha->hardware_lock.
>
>As the below lockdep complaint illustrates, this leads to a deadlock
>every now and then. Do you agree that this was introduced by patch
>"qla2xxx: Remove dependency on hardware_lock to reduce lock contention"
>(2015-12-17)? Anyway, whether or not that patch introduced this bug,
>please analyze this and propose a fix.
>
>Thank you,
>
>Bart.
>
>======================================================
>[ INFO: possible circular locking dependency detected ]
>4.10.0-rc5-dbg+ #6 Not tainted
>-------------------------------------------------------
>kworker/0:2/112 is trying to acquire lock:
> (&(&ha->hardware_lock)->rlock){-.-...}, at: [<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
>but task is already holding lock:
> (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>-> #1 (&(&ha->tgt.sess_lock)->rlock){-.-...}:
>[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
>[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
>[<ffffffffa04b22bb>] qlt_schedule_sess_for_deletion_lock+0x2b/0x50 [qla2xxx]
>[<ffffffffa0454d11>] qla2x00_mark_all_devices_lost+0x71/0x200 [qla2xxx]
>[<ffffffffa0477644>] qla2x00_async_event+0xb64/0x1980 [qla2xxx]
>[<ffffffffa0478d71>] qla24xx_msix_default+0x261/0x2b0 [qla2xxx]
>[<ffffffff810c7bbc>] __handle_irq_event_percpu+0x5c/0x380
>[<ffffffff810c7f03>] handle_irq_event_percpu+0x23/0x60
>[<ffffffff810c7f79>] handle_irq_event+0x39/0x60
>[<ffffffff810cb423>] handle_edge_irq+0x93/0x170
>[<ffffffff8102063a>] handle_irq+0x1a/0x30
>[<ffffffff81569d38>] do_IRQ+0x68/0x130
>[<ffffffff81567e53>] ret_from_intr+0x0/0x20
>[<ffffffff81566356>] native_safe_halt+0x6/0x10
>[<ffffffff81565ef0>] default_idle+0x20/0x1a0
>[<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
>[<ffffffff81566583>] default_idle_call+0x23/0x40
>[<ffffffff810b5248>] do_idle+0x188/0x1e0
>[<ffffffff810b558d>] cpu_startup_entry+0x1d/0x20
>[<ffffffff81042368>] start_secondary+0x108/0x130
>[<ffffffff810001c4>] verify_cpu+0x0/0xfc
>-> #0 (&(&ha->hardware_lock)->rlock){-.-...}:
>[<ffffffff810bd135>] __lock_acquire+0x1425/0x1620
>[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
>[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
>[<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
>[<ffffffff81086024>] process_one_work+0x1f4/0x6e0
>[<ffffffff8108655e>] worker_thread+0x4e/0x4a0
>[<ffffffff8108d6bc>] kthread+0x10c/0x140
>[<ffffffff81567721>] ret_from_fork+0x31/0x40
>
>other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(&(&ha->tgt.sess_lock)->rlock);
>                               lock(&(&ha->hardware_lock)->rlock);
>                               lock(&(&ha->tgt.sess_lock)->rlock);
>  lock(&(&ha->hardware_lock)->rlock);
>
> *** DEADLOCK ***
>
>3 locks held by kworker/0:2/112:
> #0:  ("events"){.+.+.+}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
> #1:  ((&tgt->sess_work)){+.+.+.}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
> #2:  (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]
>
>stack backtrace:
>CPU: 0 PID: 112 Comm: kworker/0:2 Not tainted 4.10.0-rc5-dbg+ #6
>Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>Workqueue: events qlt_sess_work_fn [qla2xxx]
>Call Trace:
> dump_stack+0x85/0xc2
> print_circular_bug+0x1e3/0x250
> __lock_acquire+0x1425/0x1620
> lock_acquire+0xbf/0x210
> _raw_spin_lock_irqsave+0x53/0x70
> qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
> process_one_work+0x1f4/0x6e0
> worker_thread+0x4e/0x4a0
> kthread+0x10c/0x140
> ret_from_fork+0x31/0x40
>
>
>(gdb) list *(qlt_sess_work_fn+0x21d)
>
>0x6531d is in qlt_sess_work_fn (drivers/scsi/qla2xxx/qla_target.c:5698).
>5693                    }
>5694            }
>5695
>5696            spin_lock_irqsave(&ha->hardware_lock, flags);
>5697
>5698            if (tgt->tgt_stop)
>5699                    goto out_term;
>5700
>5701            rc = __qlt_24xx_handle_abts(vha, &prm->abts, sess);
>5702            if (rc != 0)

Thanks for reporting. We'll analyze the code and post a patch.
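For reference, the inverted ordering you describe can be sketched as
follows. This is only a simplified illustration of the two paths using
the usual spinlock API (hypothetical function names, arguments and all
real work omitted), not the actual driver code:

/* Interrupt path: hardware_lock first, then tgt.sess_lock. */
static irqreturn_t irq_path(struct qla_hw_data *ha)
{
        unsigned long flags, flags2;

        spin_lock_irqsave(&ha->hardware_lock, flags);
        /* qla2x00_async_event() -> qla2x00_mark_all_devices_lost()
         * -> qlt_schedule_sess_for_deletion_lock(): */
        spin_lock_irqsave(&ha->tgt.sess_lock, flags2);
        /* ... */
        spin_unlock_irqrestore(&ha->tgt.sess_lock, flags2);
        spin_unlock_irqrestore(&ha->hardware_lock, flags);
        return IRQ_HANDLED;
}

/* Worker path: tgt.sess_lock first, then hardware_lock -- the
 * opposite order, so the two paths can deadlock against each other. */
static void work_path(struct qla_hw_data *ha)
{
        unsigned long flags, flags2;

        spin_lock_irqsave(&ha->tgt.sess_lock, flags);
        /* qlt_abort_work() looks up the session here ... */
        spin_lock_irqsave(&ha->hardware_lock, flags2);
        /* ... __qlt_24xx_handle_abts() ... */
        spin_unlock_irqrestore(&ha->hardware_lock, flags2);
        spin_unlock_irqrestore(&ha->tgt.sess_lock, flags);
}

Any fix will need the two paths to take the locks in a consistent
order, or to drop one lock before acquiring the other.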

- Himanshu



