Re: deadlock

On Mon, 2012-03-19 at 11:51 -0600, Marcus Sorensen wrote:
> In trying out the new ib_srpt in kernel 3.3 I ran into the following:
> 
> INFO: task targetcli:5327 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> targetcli       D ffffffff816144c0     0  5327   4215 0x00000080
>  ffff8804282f5c18 0000000000000086 ffff8804282f5fd8 0000000000013240
>  ffff8804282f4010 0000000000013240 0000000000013240 0000000000013240
>  ffff8804282f5fd8 0000000000013240 ffff88023486f010 ffff88042bc0e5e0
> Call Trace:
>  [<ffffffff814f74bf>] schedule+0x3f/0x60
>  [<ffffffff814f57dd>] schedule_timeout+0x1fd/0x2e0
>  [<ffffffff81089172>] ? enqueue_entity+0x112/0x270
>  [<ffffffff81089334>] ? enqueue_task_fair+0x64/0x130
>  [<ffffffff814f6b66>] wait_for_common+0x116/0x180
>  [<ffffffff8107f800>] ? try_to_wake_up+0x2b0/0x2b0
>  [<ffffffff814f6cad>] wait_for_completion+0x1d/0x20
>  [<ffffffffa0437c06>] transport_clear_lun_from_sessions+0x56/0x80 [target_core_mod]
>  [<ffffffffa043560f>] core_tpg_post_dellun+0x2f/0x70 [target_core_mod]
>  [<ffffffffa0426fc2>] core_dev_del_lun+0x32/0xa0 [target_core_mod]
>  [<ffffffffa042a000>] target_fabric_port_unlink+0x50/0x60 [target_core_mod]
>  [<ffffffffa030c594>] configfs_unlink+0x104/0x1c0 [configfs]
>  [<ffffffff81179cff>] vfs_unlink+0x9f/0x110
>  [<ffffffff8117d86b>] do_unlinkat+0x19b/0x1d0
>  [<ffffffff810c7b0c>] ? __audit_syscall_entry+0xcc/0x210
>  [<ffffffff810c79e6>] ? __audit_syscall_exit+0x3d6/0x430
>  [<ffffffff8117d8b6>] sys_unlink+0x16/0x20
>  [<ffffffff81500429>] system_call_fastpath+0x16/0x1b
> INFO: task tcm_cl_0:5415 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> tcm_cl_0        D ffffffff816144c0     0  5415      2 0x00000080
>  ffff88043338dce0 0000000000000046 ffff88043338dfd8 0000000000013240
>  ffff88043338c010 0000000000013240 0000000000013240 0000000000013240
>  ffff88043338dfd8 0000000000013240 ffff880234a21460 ffff880428d57010
> Call Trace:
>  [<ffffffff814f74bf>] schedule+0x3f/0x60
>  [<ffffffff814f57dd>] schedule_timeout+0x1fd/0x2e0
>  [<ffffffff814f6b8d>] ? wait_for_common+0x13d/0x180
>  [<ffffffff8107f800>] ? try_to_wake_up+0x2b0/0x2b0
>  [<ffffffffa031229a>] ? srpt_build_cmd_rsp+0x10a/0x190 [ib_srpt]
>  [<ffffffff814f6b66>] wait_for_common+0x116/0x180
>  [<ffffffff8107f800>] ? try_to_wake_up+0x2b0/0x2b0
>  [<ffffffff814f6cad>] wait_for_completion+0x1d/0x20
>  [<ffffffffa04383e3>] transport_lun_wait_for_tasks+0xa3/0x1b0 [target_core_mod]
>  [<ffffffffa04386c3>] __transport_clear_lun_from_sessions+0xb3/0x330 [target_core_mod]
>  [<ffffffffa0438940>] ? __transport_clear_lun_from_sessions+0x330/0x330 [target_core_mod]
>  [<ffffffffa0438956>] transport_clear_lun_thread+0x16/0x30 [target_core_mod]
>  [<ffffffff8106f0fe>] kthread+0x9e/0xb0
>  [<ffffffff815017e4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff8106f060>] ? kthread_freezable_should_stop+0x70/0x70
>  [<ffffffff815017e0>] ? gs_change+0x13/0x13
> 
> 
> In searching around I found this thread:
> http://www.spinics.net/lists/target-devel/msg00606.html
> 
> Which looks to be pretty much the same thing, except I can reproduce
> nearly every time if I try to remove a lun that has an active
> initiator. Note that I'm simply using targetcli 'delete lun=0'; I
> suppose this could be a bug in targetcli if it's supposed to do its
> work in a particular order. I don't know enough about the configfs
> structure to try doing things manually to see.
> 
> Kernel code is from 3.3-rc3 (been waiting for it to go stable before
> upgrading). lio-utils is master v3.1 from git, targetcli is 2.0rc1.

I believe this particular shutdown bug has been addressed with the
current ib_srpt code in lio-core.git/master @ 3.3-rc6.  It has been
converted to use v3.4 reference counting via se_cmd->cmd_kref +
target_submit_cmd(), and active I/O shutdown should be working better
now.
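
For reference, here is a minimal sketch of the kref lifetime pattern
that the cmd_kref conversion relies on. This is illustrative only, with
made-up names (example_cmd, example_cmd_release, etc.), using the
generic kref API rather than the actual se_cmd / target_core internals;
the point is just that every user of a command holds a reference and
the command is only freed on the final put, so shutdown paths can wait
for outstanding references instead of blocking forever:

#include <linux/kref.h>
#include <linux/slab.h>

/* Illustrative only: 'example_cmd' stands in for a fabric command
 * descriptor; in the target core, se_cmd->cmd_kref plays this role. */
struct example_cmd {
	struct kref	kref;
	/* ... per-command state ... */
};

static void example_cmd_release(struct kref *kref)
{
	struct example_cmd *cmd =
		container_of(kref, struct example_cmd, kref);

	kfree(cmd);
}

static struct example_cmd *example_cmd_alloc(void)
{
	struct example_cmd *cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);

	if (!cmd)
		return NULL;
	/* initial reference held by the submission path */
	kref_init(&cmd->kref);
	return cmd;
}

/* each additional user (e.g. a LUN shutdown thread) takes its own ref */
static void example_cmd_get(struct example_cmd *cmd)
{
	kref_get(&cmd->kref);
}

/* the last put, from whichever context finishes last, frees the command */
static void example_cmd_put(struct example_cmd *cmd)
{
	kref_put(&cmd->kref, example_cmd_release);
}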

Note that these ib_srpt changes have not made it into the
target-pending/for-next mainline queue just yet, as they need more
testing, and there is one remaining v3.4 conversion that I need to
finish up here.

That said, would you mind verifying that lio-core.git addresses this
bug at your earliest convenience?

Thanks,

--nab


