Hi Bart,

On 1/29/17, 7:54 PM, "Bart Van Assche" <Bart.VanAssche@xxxxxxxxxxx> wrote:

>Hello Himanshu,
>
>While testing the v4.10-rc5 qla2xxx driver I ran into the following:
>* qla24xx_msix_default() grabs ha->hardware_lock before it calls
>  qla2x00_async_event(). The latter function calls
>  qlt_schedule_sess_for_deletion_lock() indirectly, and that function
>  grabs ha->tgt.sess_lock.
>* qlt_abort_work() grabs ha->tgt.sess_lock first and next it grabs
>  ha->hardware_lock.
>
>As the below lockdep complaint illustrates this leads to a deadlock
>every now and then. Do you agree that this was introduced by patch
>"qla2xxx: Remove dependency on hardware_lock to reduce lock contention"
>(2015-12-17)? Anyway, whether or not that patch introduced this bug,
>please analyze this and propose a fix.
>
>Thank you,
>
>Bart.
>
>======================================================
>[ INFO: possible circular locking dependency detected ]
>4.10.0-rc5-dbg+ #6 Not tainted
>-------------------------------------------------------
>kworker/0:2/112 is trying to acquire lock:
> (&(&ha->hardware_lock)->rlock){-.-...}, at: [<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
>but task is already holding lock:
> (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]
>which lock already depends on the new lock.
>
>the existing dependency chain (in reverse order) is:
>-> #1 (&(&ha->tgt.sess_lock)->rlock){-.-...}:
>[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
>[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
>[<ffffffffa04b22bb>] qlt_schedule_sess_for_deletion_lock+0x2b/0x50 [qla2xxx]
>[<ffffffffa0454d11>] qla2x00_mark_all_devices_lost+0x71/0x200 [qla2xxx]
>[<ffffffffa0477644>] qla2x00_async_event+0xb64/0x1980 [qla2xxx]
>[<ffffffffa0478d71>] qla24xx_msix_default+0x261/0x2b0 [qla2xxx]
>[<ffffffff810c7bbc>] __handle_irq_event_percpu+0x5c/0x380
>[<ffffffff810c7f03>] handle_irq_event_percpu+0x23/0x60
>[<ffffffff810c7f79>] handle_irq_event+0x39/0x60
>[<ffffffff810cb423>] handle_edge_irq+0x93/0x170
>[<ffffffff8102063a>] handle_irq+0x1a/0x30
>[<ffffffff81569d38>] do_IRQ+0x68/0x130
>[<ffffffff81567e53>] ret_from_intr+0x0/0x20
>[<ffffffff81566356>] native_safe_halt+0x6/0x10
>[<ffffffff81565ef0>] default_idle+0x20/0x1a0
>[<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
>[<ffffffff81566583>] default_idle_call+0x23/0x40
>[<ffffffff810b5248>] do_idle+0x188/0x1e0
>[<ffffffff810b558d>] cpu_startup_entry+0x1d/0x20
>[<ffffffff81042368>] start_secondary+0x108/0x130
>[<ffffffff810001c4>] verify_cpu+0x0/0xfc
>-> #0 (&(&ha->hardware_lock)->rlock){-.-...}:
>[<ffffffff810bd135>] __lock_acquire+0x1425/0x1620
>[<ffffffff810bd76f>] lock_acquire+0xbf/0x210
>[<ffffffff815672d3>] _raw_spin_lock_irqsave+0x53/0x70
>[<ffffffffa04af2ed>] qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
>[<ffffffff81086024>] process_one_work+0x1f4/0x6e0
>[<ffffffff8108655e>] worker_thread+0x4e/0x4a0
>[<ffffffff8108d6bc>] kthread+0x10c/0x140
>[<ffffffff81567721>] ret_from_fork+0x31/0x40
>
>other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>       CPU0                    CPU1
>       ----                    ----
>  lock(&(&ha->tgt.sess_lock)->rlock);
>                               lock(&(&ha->hardware_lock)->rlock);
>                               lock(&(&ha->tgt.sess_lock)->rlock);
>  lock(&(&ha->hardware_lock)->rlock);
>
> *** DEADLOCK ***
>
>3 locks held by kworker/0:2/112:
> #0: ("events"){.+.+.+}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
> #1: ((&tgt->sess_work)){+.+.+.}, at: [<ffffffff81085fa5>] process_one_work+0x175/0x6e0
> #2: (&(&ha->tgt.sess_lock)->rlock){-.-...}, at: [<ffffffffa04af539>] qlt_sess_work_fn+0x469/0x480 [qla2xxx]
>
>stack backtrace:
>CPU: 0 PID: 112 Comm: kworker/0:2 Not tainted 4.10.0-rc5-dbg+ #6
>Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>Workqueue: events qlt_sess_work_fn [qla2xxx]
>Call Trace:
> dump_stack+0x85/0xc2
> print_circular_bug+0x1e3/0x250
> __lock_acquire+0x1425/0x1620
> lock_acquire+0xbf/0x210
> _raw_spin_lock_irqsave+0x53/0x70
> qlt_sess_work_fn+0x21d/0x480 [qla2xxx]
> process_one_work+0x1f4/0x6e0
> worker_thread+0x4e/0x4a0
> kthread+0x10c/0x140
> ret_from_fork+0x31/0x40
>
>
>(gdb) list *(qlt_sess_work_fn+0x21d)
>
>0x6531d is in qlt_sess_work_fn (drivers/scsi/qla2xxx/qla_target.c:5698).
>5693                    }
>5694            }
>5695
>5696            spin_lock_irqsave(&ha->hardware_lock, flags);
>5697
>5698            if (tgt->tgt_stop)
>5699                    goto out_term;
>5700
>5701            rc = __qlt_24xx_handle_abts(vha, &prm->abts, sess);
>5702            if (rc != 0)

Thanks for reporting. We'll analyze code and will post patch.

- Himanshu
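
For anyone following the thread, the inversion in the report can be modelled in user space without the driver. The sketch below is a toy lock-order tracker, loosely in the spirit of what lockdep checks; it is not lockdep and not qla2xxx code. The enum names mirror the two locks in the report. Everything else (track_acquire(), irq_path(), work_path_as_reported(), work_path_reordered()) is a hypothetical helper made up for illustration. The reordered path only shows one conventional remedy, a single agreed lock order, and says nothing about what the eventual patch will do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the two locks named in the report. */
enum lock_id { HARDWARE_LOCK, SESS_LOCK, NUM_LOCKS };

/* order[a][b] is true once b has been taken while a was held. */
static bool order[NUM_LOCKS][NUM_LOCKS];
static bool held[NUM_LOCKS];

static void tracker_reset(void)
{
    memset(order, 0, sizeof(order));
    memset(held, 0, sizeof(held));
}

/*
 * Record that lock l is being taken. Returns false if some lock h is
 * currently held and the opposite order (l held, then h taken) has been
 * seen before, i.e. the two orders together form an ABBA cycle.
 */
static bool track_acquire(enum lock_id l)
{
    bool ok = true;

    assert(l < NUM_LOCKS);
    for (int h = 0; h < NUM_LOCKS; h++) {
        if (!held[h] || h == (int)l)
            continue;
        if (order[l][h])
            ok = false;          /* l -> h is known, now h -> l */
        order[h][l] = true;
    }
    held[l] = true;
    return ok;
}

static void track_release(enum lock_id l)
{
    held[l] = false;
}

/* Take first, then second; release in reverse order. */
static bool take_pair(enum lock_id first, enum lock_id second)
{
    bool a = track_acquire(first);
    bool b = track_acquire(second);

    track_release(second);
    track_release(first);
    return a && b;
}

/* Interrupt path from the report: hardware_lock, then tgt.sess_lock. */
static bool irq_path(void)
{
    return take_pair(HARDWARE_LOCK, SESS_LOCK);
}

/* Work-queue path from the report: tgt.sess_lock, then hardware_lock. */
static bool work_path_as_reported(void)
{
    return take_pair(SESS_LOCK, HARDWARE_LOCK);
}

/* Hypothetical variant that follows the same order as the interrupt path. */
static bool work_path_reordered(void)
{
    return take_pair(HARDWARE_LOCK, SESS_LOCK);
}
```

Calling irq_path() followed by work_path_as_reported() makes the second call return false, which corresponds to the circular dependency in the splat. After tracker_reset(), irq_path() followed by work_path_reordered() returns true for both, since only one order is ever recorded.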