Re: target crash with latest 4.5.0-rc7

Hi Himanshu,

(Adding the target-devel CC again)

On Mon, 2016-03-14 at 21:25 +0000, Himanshu Madhani wrote:
> Hi Nic, 
> 
> 
> Running latest upstream kernel 4.5.0-rc7 + your
> patch 5643d9c6664beaa171c88dd0a4e99a7420ac50cb (“target: Drop
> incorrect ABORT_TASK put for completed commands”)
> 
> 
> I ran into the following stack trace after 14 hours of runtime, using
> my script that triggers host/bus/device resets in a loop.
> 
> 
> [52431.733950] qla2xxx [0000:06:00.0]-385e:13: Building additional status packet 0xffff88041d452280.
> [52431.734055] qla2xxx [0000:06:00.0]-385e:13: Building additional status packet 0xffff88041d452340.
> [52557.733014] INFO: task kworker/10:23:11750 blocked for more than 120 seconds.
> [52557.733021]       Tainted: G           OE   4.5.0-rc7+ #47
> [52557.733022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [52557.733024] kworker/10:23   D ffff8800530ef7d8     0 11750      2 0x00000080
> [52557.733050] Workqueue: events qlt_free_session_done [qla2xxx]
> [52557.733053]  ffff8800530ef7d8 0000000000000001 ffff8804276d6870 ffff88042c4b0540
> [52557.733057]  ffff8800b57546c0 0000000000000000 ffff88025f7af740 ffff8800530ef778
> [52557.733060]  ffffffff812ea754 ffff88043f9bf850 0000000000000000 ffff88043f9bf850
> [52557.733064] Call Trace:
> [52557.733072]  [<ffffffff812ea754>] ? queue_unplugged+0x84/0x190
> [52557.733079]  [<ffffffff810a6560>] ? enqueue_sleeper+0xf0/0x580
> [52557.733084]  [<ffffffff810bf3ad>] ? trace_hardirqs_on+0xd/0x10
> [52557.733091]  [<ffffffff8168d887>] schedule+0x47/0xc0
> [52557.733094]  [<ffffffff81691df0>] schedule_timeout+0x1f0/0x300
> [52557.733096]  [<ffffffff810bc73e>] ? __lock_acquired+0x3be/0x400
> [52557.733099]  [<ffffffff8168e992>] ? wait_for_completion+0xe2/0x120
> [52557.733101]  [<ffffffff810bc73e>] ? __lock_acquired+0x3be/0x400
> [52557.733104]  [<ffffffff8168e99a>] wait_for_completion+0xea/0x120
> [52557.733109]  [<ffffffff8109f510>] ? try_to_wake_up+0x410/0x410
> [52557.733133]  [<ffffffffa05f2e1d>] target_wait_for_sess_cmds+0x4d/0x1c0 [target_core_mod]
> [52557.733141]  [<ffffffffa0676f80>] ? qla2xxx_wake_dpc+0x30/0x40 [qla2xxx]
> [52557.733148]  [<ffffffffa0676fe8>] ? qla2x00_post_work+0x58/0x70 [qla2xxx]
> [52557.733152]  [<ffffffffa072b109>] tcm_qla2xxx_free_session+0x49/0x90 [tcm_qla2xxx]
> [52557.733161]  [<ffffffffa06d4009>] qlt_free_session_done+0xf9/0x3d0 [qla2xxx]
> [52557.733164]  [<ffffffff810bc73e>] ? __lock_acquired+0x3be/0x400
> [52557.733169]  [<ffffffff810865e1>] process_one_work+0x231/0x760
> [52557.733172]  [<ffffffff8108653a>] ? process_one_work+0x18a/0x760
> [52557.733174]  [<ffffffff810bc73e>] ? __lock_acquired+0x3be/0x400
> [52557.733177]  [<ffffffff81086d13>] ? worker_thread+0x203/0x530
> [52557.733180]  [<ffffffff81086c7d>] worker_thread+0x16d/0x530
> [52557.733183]  [<ffffffff8109f522>] ? default_wake_function+0x12/0x20
> [52557.733185]  [<ffffffff810b1fc6>] ? __wake_up_common+0x56/0x90
> [52557.733187]  [<ffffffff81086b10>] ? process_one_work+0x760/0x760
> [52557.733190]  [<ffffffff8168d887>] ? schedule+0x47/0xc0
> [52557.733192]  [<ffffffff81086b10>] ? process_one_work+0x760/0x760
> [52557.733195]  [<ffffffff8108ccff>] kthread+0xef/0x110
> [52557.733198]  [<ffffffff8109508d>] ? finish_task_switch+0x8d/0x230
> [52557.733201]  [<ffffffff810970be>] ? schedule_tail+0x1e/0xd0
> [52557.733203]  [<ffffffff8108cc10>] ? __init_kthread_worker+0x70/0x70
> [52557.733205]  [<ffffffff81693dbf>] ret_from_fork+0x3f/0x70
> [52557.733208]  [<ffffffff8108cc10>] ? __init_kthread_worker+0x70/0x70
> [52557.733209] INFO: lockdep is turned off.
> [52557.733216] Sending NMI to all CPUs:
> [52557.736028] NMI backtrace for cpu 0
> 
> 
> After analyzing the source of target_wait_for_sess_cmds(), we made the
> following change, and the test has now been running for 48+ hours.
> 

Thank you for tracking this down.

> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 867bc6d..43d8b42 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2596,8 +2596,6 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  
>         list_for_each_entry_safe(se_cmd, tmp_cmd,
>                                 &se_sess->sess_wait_list, se_cmd_list) {
> -               list_del_init(&se_cmd->se_cmd_list);
> -
>                 pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
>                         " %d\n", se_cmd, se_cmd->t_state,
>                         se_cmd->se_tfo->get_cmd_state(se_cmd));
> 
> 
> Let me know if this fix looks okay to you. 
> 
> 

Applying the following patch, with your authorship and a stable CC,
to target-pending/for-next.

Thank you,

--nab

From 484dfe2e26f7c6c1ab463926de4cef5f036043a9 Mon Sep 17 00:00:00 2001
From: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
Date: Mon, 14 Mar 2016 22:47:37 -0700
Subject: [PATCH] target: Fix target_release_cmd_kref shutdown comp leak

This patch fixes an active I/O shutdown bug for fabric
drivers using target_wait_for_sess_cmds(), where se_cmd
descriptor shutdown would result in hung tasks waiting
indefinitely for se_cmd->cmd_wait_comp to complete().

To address this bug, drop the incorrect list_del_init()
usage in target_wait_for_sess_cmds() and always complete()
during se_cmd target_release_cmd_kref() put, in order to
let the caller invoke the final fabric release callback
into se_cmd->se_tfo->release_cmd() code.

Reported-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
Tested-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
Signed-off-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
---
 drivers/target/target_core_transport.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index df01997..734c79e 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2669,8 +2669,6 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
 
        list_for_each_entry_safe(se_cmd, tmp_cmd,
                                &se_sess->sess_wait_list, se_cmd_list) {
-               list_del_init(&se_cmd->se_cmd_list);
-
                pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
                        " %d\n", se_cmd, se_cmd->t_state,
                        se_cmd->se_tfo->get_cmd_state(se_cmd));
-- 
1.9.1
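
For readers following the fix, here is a minimal userspace sketch of why
the early list_del_init() can strand the waiter. It is not kernel code:
struct model_cmd, model_release() and model_shutdown() are hypothetical
names, and the release-side logic (an empty se_cmd_list being treated as
"not tracked by the session", so complete() is skipped) is an assumption
about the 4.5-era target_release_cmd_kref(), which this thread does not
quote.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical model of the se_cmd fields involved in the hang. */
struct model_cmd {
	struct list_head node; /* models se_cmd->se_cmd_list */
	bool wait_set;         /* models se_cmd->cmd_wait_set */
	bool completed;        /* models complete(&se_cmd->cmd_wait_comp) */
};

/*
 * Models the final kref put. ASSUMPTION: a command whose list node is
 * empty is treated as untracked and released without waking anyone.
 */
static void model_release(struct model_cmd *cmd)
{
	if (list_empty(&cmd->node))
		return;                     /* no complete(): waiter stays asleep */

	if (cmd->wait_set) {
		list_del_init(&cmd->node);
		cmd->completed = true;      /* waiter is woken */
	}
}

/* Returns true if the shutdown waiter would be woken. */
static bool model_shutdown(bool waiter_unlinks)
{
	struct list_head wait_list;
	struct model_cmd cmd = { .wait_set = true, .completed = false };

	list_init(&wait_list);
	list_init(&cmd.node);
	list_add_tail(&cmd.node, &wait_list);
	assert(!list_empty(&wait_list));

	if (waiter_unlinks)
		list_del_init(&cmd.node);   /* the line the patch removes */

	/* The last reference is dropped while the waiter is still pending. */
	model_release(&cmd);
	return cmd.completed;
}
```

Under these assumptions, model_shutdown(false) reports that the waiter is
woken, while model_shutdown(true) reports that it is not, which is
consistent with the kworker blocked in target_wait_for_sess_cmds() in the
trace above.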

