Re: [PATCH v4 18/37] target: Make ABORT and LUN RESET handling synchronous

On Wed, 2017-02-08 at 14:24 -0800, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
> 
> Notes:
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> - A new reference count is introduced that tracks the number of
>   references held by the target driver, namely se_cmd.tgt_ref.
> 

To reiterate how the existing code handles the first and second order
cases, here is the text from the earlier -v2 review to keep the context
fresh.

The quiesce of in-flight se_cmd can happen in three ways (see the
target_execute_cmd() sketch just below for the first case):

  - Early submission in target_execute_cmd() to catch CMD_T_STOP
    before se_cmd is submitted to the backend, or
  - waiting for outstanding backend I/O to complete via
    target_complete_cmd() to catch CMD_T_STOP (the typical case when a
    backend device is not responding), or
  - waiting for transport_cmd_check_stop() hand-off of se_cmd from
    target-core to the fabric driver response to catch CMD_T_STOP.
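
For reference, the first case boils down to the check at the top of
target_execute_cmd(), roughly as follows (paraphrasing mainline, not a
verbatim quote):

    void target_execute_cmd(struct se_cmd *cmd)
    {
            /*
             * Catch a frontend stop/abort request before the command is
             * ever handed to the backend ->execute_cmd().
             */
            spin_lock_irq(&cmd->t_state_lock);
            if (__transport_check_aborted_status(cmd, 1)) {
                    spin_unlock_irq(&cmd->t_state_lock);
                    return;
            }
            if (cmd->transport_state & CMD_T_STOP) {
                    spin_unlock_irq(&cmd->t_state_lock);
                    complete_all(&cmd->t_transport_stop_comp);
                    return;
            }
            cmd->t_state = TRANSPORT_PROCESSING;
            cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
            spin_unlock_irq(&cmd->t_state_lock);
            /* ... backend dispatch only happens below this point ... */
    }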

This quiesce can happen for se_cmd with ABORT_TASK in
core_tmr_abort_task(); with LUN_RESET it happens for TMRs in
core_tmr_drain_tmr_list() and for se_cmd descriptors in
core_tmr_drain_state_list() as part of se_device->state_list.

Once CMD_T_STOP is cleared, target_core_tmr.c will drop both the
outstanding reference to se_cmd->cmd_kref and its own reference
obtained by __target_check_io_state().
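
For reference, __target_check_io_state() does roughly the following
before the TMR path will touch a command (again paraphrasing mainline,
not a verbatim quote):

    /* Called under se_sess->sess_cmd_lock from target_core_tmr.c */
    spin_lock(&se_cmd->t_state_lock);
    if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
            /*
             * The response has already been queued to the fabric, or
             * session shutdown owns the descriptor, so the TMR must
             * leave it alone.
             */
            spin_unlock(&se_cmd->t_state_lock);
            return false;
    }
    se_cmd->transport_state |= CMD_T_ABORTED;
    spin_unlock(&se_cmd->t_state_lock);

    /* The TMR's own reference, dropped later via __target_put_sess_cmd() */
    return kref_get_unless_zero(&se_cmd->cmd_kref);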

The second order issue, which is what complicates things with the
existing code, is that any of the above first order cases can occur and
block waiting for CMD_T_STOP to be cleared at the _same time_ that the
associated fabric driver se_session is being shut down, which sets
CMD_T_FABRIC_STOP on all of the se_session's se_cmd descriptors.

The session shutdown can be triggered explicitly by the fabric driver,
or by a session reinstatement event (with iscsi-target for example)
when a host doesn't get a TMR response because backend I/O has not
completed for an extended amount of time.
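
(The shutdown side is target_sess_cmd_list_set_waiting(), which --
roughly, from memory of the code at the time -- walks the session's
command list and marks each outstanding descriptor:)

    list_for_each_entry(se_cmd, &se_sess->sess_cmd_list, se_cmd_list) {
            if (!kref_get_unless_zero(&se_cmd->cmd_kref))
                    continue;
            se_cmd->cmd_wait_set = 1;
            spin_lock(&se_cmd->t_state_lock);
            se_cmd->transport_state |= CMD_T_FABRIC_STOP;
            spin_unlock(&se_cmd->t_state_lock);
    }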

The dynamic interaction of these first and second order issues is what
has taken us a long time to get right in upstream code, and it has been
stable for end-users for the past year, since the advent of commit
0f4a94316.

So with that additional context about how/why things work as they do
today in mainline code, my comments are inline below. 

> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Andy Grover <agrover@xxxxxxxxxx>
> Cc: David Disseldorp <ddiss@xxxxxxx>
> ---
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 161c8b8482db..980d94277c8d 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -192,13 +173,13 @@ void core_tmr_abort_task(
>  			continue;
>  		}
>  
> +		se_cmd->send_abort_response = false;
>  		list_del_init(&se_cmd->se_cmd_list);
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>  
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		while (!wait_for_completion_timeout(&se_cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, se_cmd);
>  

This is still fundamentally wrong, because it allows se_cmd to be
dispatched into the backend driver and then merely waits for the
backend completion after the fact.

Keep in mind that the existing transport_wait_for_tasks() code sets
CMD_T_STOP and blocks on se_cmd->t_transport_stop_comp, so
target_execute_cmd() catches CMD_T_STOP and prevents the backend
se_cmd->execute_cmd() execution from actually occurring.

Whatever changes you are proposing need to maintain this logic.
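
That is, roughly (again paraphrasing the existing
__transport_wait_for_tasks(), not quoting it verbatim):

    spin_lock_irqsave(&cmd->t_state_lock, flags);
    cmd->transport_state |= CMD_T_STOP;
    spin_unlock_irqrestore(&cmd->t_state_lock, flags);

    /*
     * Woken once the command has quiesced: before backend dispatch in
     * target_execute_cmd(), after backend completion in
     * target_complete_cmd(), or at the hand-off back to the fabric in
     * transport_cmd_check_stop().
     */
    wait_for_completion(&cmd->t_transport_stop_comp);

    spin_lock_irqsave(&cmd->t_state_lock, flags);
    cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);
    spin_unlock_irqrestore(&cmd->t_state_lock, flags);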

> -		transport_cmd_finish_abort(se_cmd, true);
>  		__target_put_sess_cmd(se_cmd);
>  
>  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
> @@ -295,14 +276,30 @@ static void core_tmr_drain_tmr_list(
>  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
>  			tmr_p->function, tmr_p->response, cmd->t_state);
>  
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		transport_cmd_finish_abort(cmd, 1);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);
>  	}
>  }

Likewise here: this doesn't actually prevent TMR execution; it just
waits for it to complete after the fact.

>  static void core_tmr_drain_state_list(
>  	struct se_device *dev,
>  	struct se_cmd *prout_cmd,
> @@ -311,6 +308,7 @@ static void core_tmr_drain_state_list(
>  	struct list_head *preempt_and_abort_list)
>  {
>  	LIST_HEAD(drain_task_list);
> +	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
>  	struct se_session *sess;
>  	struct se_cmd *cmd, *next;
>  	unsigned long flags;
> @@ -365,6 +363,8 @@ static void core_tmr_drain_state_list(
>  
>  		list_move_tail(&cmd->state_list, &drain_task_list);
>  		cmd->state_active = false;
> +		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
> +			cmd->send_abort_response = true;
>  	}
>  	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
>  
> @@ -387,17 +387,8 @@ static void core_tmr_drain_state_list(
>  			(cmd->transport_state & CMD_T_STOP) != 0,
>  			(cmd->transport_state & CMD_T_SENT) != 0);
>  
> -		/*
> -		 * If the command may be queued onto a workqueue cancel it now.
> -		 *
> -		 * This is equivalent to removal from the execute queue in the
> -		 * loop above, but we do it down here given that
> -		 * cancel_work_sync may block.
> -		 */
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		core_tmr_handle_tas_abort(cmd, tas);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);

And here again too.  You can't just dispatch aborted commands to the
backend and wait until they complete after the fact.

If the command is being aborted (CMD_T_ABORTED), target_execute_cmd()
must catch the command's CMD_T_STOP before it is dispatched into the
backend.

What if the backend has not been responding for a long time in the
first place, which is what caused the host to issue the LUN_RESET..?

Why would it make sense to keep flooding the device with new commands..?

>  	}
>  }
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 533b584c2806..aac1fc1b2a7c 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -714,16 +739,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  			success = 1;
>  	}
>  
> -	/*
> -	 * Check for case where an explicit ABORT_TASK has been received
> -	 * and transport_wait_for_tasks() will be waiting for completion..
> -	 */
> -	if (cmd->transport_state & CMD_T_ABORTED ||
> -	    cmd->transport_state & CMD_T_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete_all(&cmd->t_transport_stop_comp);
> -		return;
> -	} else if (!success) {

This can't be removed, because __transport_wait_for_tasks() needs to
catch the CMD_T_STOP or CMD_T_ABORTED quiesce here.

Otherwise, for !success, target_complete_failure_work() executes and
actually queues the response, while only CMD_T_STOP (but not
CMD_T_ABORTED !) is checked in transport_cmd_check_stop_to_fabric()
after the response has been dispatched.

> +	if (!success) {
>  		INIT_WORK(&cmd->work, target_complete_failure_work);
>  	} else {
>  		INIT_WORK(&cmd->work, target_complete_ok_work);
> @@ -733,6 +749,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
>  	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>  
> +queue_work:
>  	if (cmd->se_cmd_flags & SCF_USE_CPUID)
>  		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
>  	else
> @@ -1215,6 +1232,7 @@ void transport_init_se_cmd(
>  	INIT_LIST_HEAD(&cmd->state_list);
>  	init_completion(&cmd->t_transport_stop_comp);
>  	init_completion(&cmd->cmd_wait_comp);
> +	init_completion(&cmd->complete);
>  	spin_lock_init(&cmd->t_state_lock);
>  	kref_init(&cmd->cmd_kref);
>  	atomic_set(&cmd->tgt_ref, 1);
> @@ -1886,6 +1904,7 @@ void target_execute_cmd(struct se_cmd *cmd)
>  	spin_lock_irq(&cmd->t_state_lock);
>  	if (__transport_check_aborted_status(cmd, 1)) {
>  		spin_unlock_irq(&cmd->t_state_lock);
> +		transport_handle_abort(cmd);
>  		return;
>  	}

__transport_check_aborted_status() must honor delayed TAS, and allow
the fabric to complete the WRITE transfer before proceeding with the
abort.

There are assumptions about this all over driver code.
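
For context, the existing delayed-TAS handshake in
__transport_check_aborted_status() looks roughly like this (a
paraphrase, not verbatim; see also the transport_send_task_abort() hunk
quoted further down):

    /* Called with cmd->t_state_lock held */
    if (!(cmd->transport_state & CMD_T_ABORTED))
            return 0;
    if (!send_status || !(cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS)) {
            /*
             * First pass: remember that a TASK_ABORTED response is
             * still owed, but let the fabric finish the WRITE transfer
             * first.
             */
            if (send_status)
                    cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
            return 1;
    }
    /*
     * Second pass: the WRITE data has arrived, so queue the delayed
     * TASK_ABORTED status back to the fabric now.
     */
    cmd->scsi_status = SAM_STAT_TASK_ABORTED;
    spin_unlock_irq(&cmd->t_state_lock);
    cmd->se_tfo->queue_status(cmd);
    spin_lock_irq(&cmd->t_state_lock);
    return 1;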

>  	if (cmd->transport_state & CMD_T_STOP) {
> @@ -2494,15 +2513,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
>  
>  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  {
> -	int ret = 0;
>  	bool aborted = false, tas = false;
>  
>  	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	} else {
>  		if (wait_for_tasks)
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> @@ -2516,23 +2531,12 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  
>  		if (cmd->se_lun)
>  			transport_lun_remove_cmd(cmd);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	}
> -	/*
> -	 * If the task has been internally aborted due to TMR ABORT_TASK
> -	 * or LUN_RESET, target_core_tmr.c is responsible for performing
> -	 * the remaining calls to target_put_sess_cmd(), and not the
> -	 * callers of this function.
> -	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>  		wait_for_completion(&cmd->cmd_wait_comp);
> -		transport_put_cmd(cmd);
> -		ret = 1;
>  	}
> -	return ret;
> +	return transport_put_cmd(cmd);

As mentioned in patch #14, there is no way this can possibly work.

The wait_for_completion(&cmd->cmd_wait_comp) is woken up when
transport_put_cmd() -> kref_put() of se_cmd->cmd_kref reaches zero, so
there is no way the wait_for_completion during CMD_T_ABORTED can ever
make forward progress.
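
Annotating the hunk above to make the circular wait explicit:

    if (aborted) {
            pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
            /* (1) blocks until cmd_wait_comp fires ... */
            wait_for_completion(&cmd->cmd_wait_comp);
    }
    /*
     * (2) ... but cmd_wait_comp only fires from the final kref_put()
     * of cmd_kref, i.e. from the transport_put_cmd() below, which is
     * never reached while (1) is still sleeping.
     */
    return transport_put_cmd(cmd);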

>  }
>  EXPORT_SYMBOL(transport_generic_free_cmd);
>  
> @@ -2644,6 +2648,43 @@ int target_put_sess_cmd(struct se_cmd *se_cmd)
>  }
>  EXPORT_SYMBOL(target_put_sess_cmd);
>  
> +static const char *data_dir_name(enum dma_data_direction d)
> +{
> +	switch (d) {
> +	case DMA_BIDIRECTIONAL:	return "BIDI";
> +	case DMA_TO_DEVICE:	return "WRITE";
> +	case DMA_FROM_DEVICE:	return "READ";
> +	case DMA_NONE:		return "NONE";
> +	default:		return "(?)";
> +	}
> +}
> +
> +static const char *cmd_state_name(enum transport_state_table t)
> +{
> +	switch (t) {
> +	case TRANSPORT_NO_STATE:	return "NO_STATE";
> +	case TRANSPORT_NEW_CMD:		return "NEW_CMD";
> +	case TRANSPORT_WRITE_PENDING:	return "WRITE_PENDING";
> +	case TRANSPORT_PROCESSING:	return "PROCESSING";
> +	case TRANSPORT_COMPLETE:	return "COMPLETE";
> +	case TRANSPORT_ISTATE_PROCESSING:
> +					return "ISTATE_PROCESSING";
> +	case TRANSPORT_COMPLETE_QF_WP:	return "COMPLETE_QF_WP";
> +	case TRANSPORT_COMPLETE_QF_OK:	return "COMPLETE_QF_OK";
> +	default:			return "?";
> +	}
> +}
> +
> +void target_show_cmd(const char *ctxt, struct se_cmd *cmd)
> +{
> +	pr_debug("%s: still waiting for %s with tag %#llx; dir %s i_state %d t_tate %s len %d tgt_ref %d refcnt %d\n",
> +		 ctxt, cmd->se_cmd_flags & SCF_SCSI_TMR_CDB ? "tmf" : "cmd",
> +		 cmd->tag, data_dir_name(cmd->data_direction),
> +		 cmd->se_tfo->get_cmd_state(cmd), cmd_state_name(cmd->t_state),
> +		 cmd->data_length, atomic_read(&cmd->tgt_ref),
> +		 atomic_read(&cmd->cmd_kref.refcount));
> +}
> +

Actually, this should be a standalone patch, used by
__transport_wait_for_tasks() while CMD_T_STOP is set and it is waiting
for se_cmd->t_transport_stop_comp to complete.

I'd be happy to pick that up for existing code.
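
That is, something along these lines in __transport_wait_for_tasks()
(just a sketch of the idea, not a tested patch):

    cmd->transport_state |= CMD_T_STOP;
    spin_unlock_irqrestore(&cmd->t_state_lock, *flags);

    /*
     * Instead of sleeping silently forever, periodically dump the
     * state of the command we are still waiting on.
     */
    while (!wait_for_completion_timeout(&cmd->t_transport_stop_comp,
                                        180 * HZ))
            target_show_cmd(__func__, cmd);

    spin_lock_irqsave(&cmd->t_state_lock, *flags);
    cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);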

> @@ -3043,47 +3075,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  }
>  EXPORT_SYMBOL(transport_check_aborted_status);
>  
> -void transport_send_task_abort(struct se_cmd *cmd)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	/*
> -	 * If there are still expected incoming fabric WRITEs, we wait
> -	 * until until they have completed before sending a TASK_ABORTED
> -	 * response.  This response with TASK_ABORTED status will be
> -	 * queued back to fabric module by transport_check_aborted_status().
> -	 */
> -	if (cmd->data_direction == DMA_TO_DEVICE) {
> -		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
> -			spin_lock_irqsave(&cmd->t_state_lock, flags);
> -			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
> -				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -				goto send_abort;
> -			}
> -			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
> -			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -			return;
> -		}
> -	}

Getting rid of the ->write_pending_status() check and DELAYED_TAS here
completely breaks WRITEs during CMD_T_ABORTED for iscsi-target and
qla2xxx, where the outstanding WRITE transfer _must be_ allowed to
complete before proceeding with the abort.

See the comments in patch #23 for why this can't be removed.
