Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted

"Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx> · Mon, 20 Feb 2017 13:38:56 -0800

On Tue, 2017-02-14 at 16:25 -0800, Bart Van Assche wrote:
> For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
> transport_generic_free_cmd() causes RDMA completion processing to stall.
> Hence only sleep inside this function if the (iSCSI) target driver
> requires this.
> 
> This patch avoids that messages similar to the following appear in the
> kernel log:
> 
> INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
>       Tainted: G        W       4.10.0-rc7-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u25:0   D    0  1013      2 0x00000000
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> Call Trace:
>  __schedule+0x2da/0xb00
>  schedule+0x38/0x90
>  schedule_timeout+0x2fe/0x640
>  wait_for_completion+0xfe/0x160
>  transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
>  srpt_send_done+0x59/0x9f [ib_srpt]
>  __ib_process_cq+0x4b/0xd0 [ib_core]
>  ib_cq_poll_work+0x1b/0x60 [ib_core]
>  process_one_work+0x208/0x6a0
>  worker_thread+0x49/0x4a0
>  kthread+0x107/0x140
>  ret_from_fork+0x2e/0x40
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
>  drivers/target/target_core_transport.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 2ed9721a7202..ab1c493a9a16 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2504,15 +2504,13 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	int ret = 0;
>  	bool aborted = false, tas = false;
>  
> -	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
> -		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
> -			target_wait_free_cmd(cmd, &aborted, &tas);
> +	if (wait_for_tasks)
> +		target_wait_free_cmd(cmd, &aborted, &tas);
>  
> +	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (!aborted || tas)
>  			ret = transport_put_cmd(cmd);
>  	} else {
> -		if (wait_for_tasks)
> -			target_wait_free_cmd(cmd, &aborted, &tas);
>  		/*
>  		 * Handle WRITE failure case where transport_generic_new_cmd()
>  		 * has already added se_cmd to state_list, but fabric has
> @@ -2535,7 +2533,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> -		wait_for_completion(&cmd->cmd_wait_comp);
>  		cmd->se_tfo->release_cmd(cmd);
>  		ret = 1;
>  	}

Removing the aborted wait_for_completion breaks the second order
scenario described for iscsi-target previously here:

http://www.spinics.net/lists/target-devel/msg14456.html

where a concurrent CMD_T_ABORTED occurs while an iscsi connection is
shutting down, and transport_generic_free_cmd() must not return until
cmd->cmd_kref reaches zero and does the complete.

However, looking at transport_generic_free_cmd() in the context of
ib_srpt and ib_isert, I don't see how this scenario can ever happen in
as-is in upstream code.

Namely, all of the transport_generic_free_cmd() callers in ib_srpt and
ib_isert pass wait_for_tasks = false.  And the only time
transport_generic_free_cmd() will block is when wait_for_tasks = true,
or when wait_for_tasks = true && abort = true.

So as-is in upstream, how can transport_generic_free_cmd() ever block
when wait_for_tasks = false..?