Re: [PATCH 11/21] target: Make ABORT and LUN RESET handling synchronous

On Thu, 2015-10-22 at 15:58 -0700, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
> 
> Notes:
> - The target_remove_from_state_list() call in transport_cmd_check_stop()
>   has been moved up such that it is called without t_state_lock held.
>   This is necessary to avoid triggering lock inversion against
>   core_tmr_drain_state_list(), which now locks execute_task_lock and
>   t_state_lock in that order.
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Reviewed-by: Andy Grover <agrover@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxx>
> Cc: Sagi Grimberg <sagig@xxxxxxxxxxxx>
> ---
>  drivers/target/target_core_internal.h  |   3 -
>  drivers/target/target_core_tmr.c       |  83 +++++++++---------
>  drivers/target/target_core_transport.c | 155 ++++++++++-----------------------
>  include/target/target_core_base.h      |   5 +-
>  4 files changed, 88 insertions(+), 158 deletions(-)

<SNIP>

> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index fcdcb11..df20630 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -148,16 +131,13 @@ void core_tmr_abort_task(
>  			goto out;
>  		}
>  		se_cmd->transport_state |= CMD_T_ABORTED;
> +		se_cmd->send_abort_response = false;
>  		spin_unlock(&se_cmd->t_state_lock);
> -
> -		list_del_init(&se_cmd->se_cmd_list);
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>  
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		wait_for_completion(&se_cmd->finished);
>  
>  		target_put_sess_cmd(se_cmd);
> -		transport_cmd_finish_abort(se_cmd, true);
>  
>  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
>  				" ref_tag: %llu\n", ref_tag);
> @@ -217,7 +197,13 @@ static void core_tmr_drain_tmr_list(
>  		}
>  		spin_unlock(&cmd->t_state_lock);
>  
> -		list_move_tail(&tmr_p->tmr_list, &drain_tmr_list);
> +		/*
> +		 * Don't try to move a command with a zero refcount; it
> +		 * has been completed already and will be removed from
> +		 * sess_cmd_list ASAP, so there is no need to abort it.
> +		 */
> +		if (kref_get_unless_zero(&cmd->cmd_kref))
> +			list_move_tail(&tmr_p->tmr_list, &drain_tmr_list);
>  	}
>  	spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
>  
> @@ -230,10 +216,29 @@ static void core_tmr_drain_tmr_list(
>  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
>  			tmr_p->function, tmr_p->response, cmd->t_state);
>  
> -		transport_cmd_finish_abort(cmd, 1);
> +		wait_for_completion(&cmd->finished);
> +		target_put_sess_cmd(cmd);
>  	}
>  }
>  

I like the reference counting changes to address the possible cmd_kref =
-1 bug when aborting other TMRs during LUN_RESET, but this should not be
mixed in with a new single-use cmd->finished struct completion that is
supposed to address all types of active I/O shutdown scenarios.

See below.

> +/**
> + * core_tmr_drain_state_list() - Abort SCSI commands associated with a device.
> + *
> + * @dev:       Device for which to abort outstanding SCSI commands.
> + * @prout_cmd: Pointer to the SCSI PREEMPT AND ABORT if this function is called
> + *             to realize the PREEMPT AND ABORT functionality.
> + * @tmr_nacl:  Initiator port through which the LUN RESET has been received.
> + * @tas:       Task Aborted Status (TAS) bit from the SCSI control mode page.
> + *             A quote from SPC-4, paragraph "7.5.10 Control mode page":
> + *             "A task aborted status (TAS) bit set to zero specifies that
> + *             aborted commands shall be terminated by the device server
> + *             without any response to the application client. A TAS bit set
> + *             to one specifies that commands aborted by the actions of an I_T
> + *             nexus other than the I_T nexus on which the command was
> + *             received shall be completed with TASK ABORTED status."
> + * @preempt_and_abort_list: For the PREEMPT AND ABORT functionality, a list
> + *             with registrations that will be preempted.
> + */
>  static void core_tmr_drain_state_list(
>  	struct se_device *dev,
>  	struct se_cmd *prout_cmd,
> @@ -282,8 +287,19 @@ static void core_tmr_drain_state_list(
>  		if (prout_cmd == cmd)
>  			continue;
>  
> +		/*
> +		 * Don't try to move a command with a zero refcount; it
> +		 * has been completed already and will be removed from
> +		 * sess_cmd_list ASAP, so there is no need to abort it.
> +		 */
> +		if (!kref_get_unless_zero(&cmd->cmd_kref))
> +			continue;
> +
>  		list_move_tail(&cmd->state_list, &drain_task_list);
>  		cmd->state_active = false;
> +		cmd->transport_state |= CMD_T_ABORTED;
> +		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
> +			cmd->send_abort_response = true;
>  	}
>  	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
>  
> @@ -306,23 +322,8 @@ static void core_tmr_drain_state_list(
>  			(cmd->transport_state & CMD_T_STOP) != 0,
>  			(cmd->transport_state & CMD_T_SENT) != 0);
>  
> -		/*
> -		 * If the command may be queued onto a workqueue cancel it now.
> -		 *
> -		 * This is equivalent to removal from the execute queue in the
> -		 * loop above, but we do it down here given that
> -		 * cancel_work_sync may block.
> -		 */
> -		if (cmd->t_state == TRANSPORT_COMPLETE)
> -			cancel_work_sync(&cmd->work);
> -
> -		spin_lock_irqsave(&cmd->t_state_lock, flags);
> -		target_stop_cmd(cmd, &flags);
> -
> -		cmd->transport_state |= CMD_T_ABORTED;
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -		core_tmr_handle_tas_abort(tmr_nacl, cmd, tas);
> +		wait_for_completion(&cmd->finished);
> +		target_put_sess_cmd(cmd);
>  	}
>  }

Ditto on the reference counting here to address a possible cmd_kref = -1
scenario, but the same incorrect assumptions about cmd->finished usage
across multiple active I/O shutdown cases exist here.

So what I'd like to see is addressing the existing TMR and state list
drain reference counting bugs on their own, separate from replacing all
existing methods beyond TMR for active I/O shutdown handling.

>  
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 4d7b1bc..c90a672 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -562,10 +562,6 @@ static int transport_cmd_check_stop(struct se_cmd *cmd, bool remove_from_lists,
>  {
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (write_pending)
> -		cmd->t_state = TRANSPORT_WRITE_PENDING;
> -
>  	if (remove_from_lists) {
>  		target_remove_from_state_list(cmd);
>  
> @@ -575,6 +571,10 @@ static int transport_cmd_check_stop(struct se_cmd *cmd, bool remove_from_lists,
>  		cmd->se_lun = NULL;
>  	}
>  
> +	spin_lock_irqsave(&cmd->t_state_lock, flags);
> +	if (write_pending)
> +		cmd->t_state = TRANSPORT_WRITE_PENDING;
> +
>  	/*
>  	 * Determine if frontend context caller is requesting the stopping of
>  	 * this command for frontend exceptions.
> @@ -585,7 +585,6 @@ static int transport_cmd_check_stop(struct se_cmd *cmd, bool remove_from_lists,
>  
>  		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>  
> -		complete_all(&cmd->t_transport_stop_comp);
>  		return 1;
>  	}
>  

This breaks active I/O session shutdown when TMR LUN_RESET happens
at the same time.

>  static void target_complete_failure_work(struct work_struct *work)
>  {
>  	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> @@ -674,14 +656,47 @@ static unsigned char *transport_get_sense_buffer(struct se_cmd *cmd)
>  	return cmd->sense_buffer;
>  }
>  
> +static void transport_handle_abort(struct se_cmd *cmd)
> +{
> +	transport_lun_remove_cmd(cmd);
> +
> +	if (cmd->send_abort_response) {
> +		cmd->scsi_status = SAM_STAT_TASK_ABORTED;
> +		pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
> +			 cmd->t_task_cdb[0], cmd->tag);
> +		trace_target_cmd_complete(cmd);
> +		cmd->se_tfo->queue_status(cmd);
> +		transport_cmd_check_stop_to_fabric(cmd);
> +	} else {
> +		/*
> +		 * Allow the fabric driver to unmap any resources before
> +		 * releasing the descriptor via TFO->release_cmd()
> +		 */
> +		cmd->se_tfo->aborted_task(cmd);
> +		/*
> +		 * To do: establish a unit attention condition on the I_T
> +		 * nexus associated with cmd. See also the paragraph "Aborting
> +		 * commands" in SAM.
> +		 */
> +		if (transport_cmd_check_stop_to_fabric(cmd) == 0)
> +			transport_put_cmd(cmd);
> +	}
> +}
> +
>  void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  {
>  	struct se_device *dev = cmd->se_dev;
>  	int success = scsi_status == GOOD;
>  	unsigned long flags;
>  
> -	cmd->scsi_status = scsi_status;
> +	if (cmd->transport_state & CMD_T_ABORTED) {
> +		transport_handle_abort(cmd);
> +		return;
> +	} else if (cmd->transport_state & (CMD_T_REQUEST_STOP | CMD_T_STOP)) {
> +		return;
> +	}
>  

Backend drivers like IBLOCK can call bio_endio -> target_complete_cmd()
from interrupt context.  

Invoking transport_handle_abort() -> TFO->queue_status() with any type
of real struct block_device based backend storage will immediately
explode.  That is why there is a queue_work() right below this code to
invoke target_complete_ok_work() from process context.

Also, checking ->transport_state without holding ->t_state_lock in the
completion path is incorrect.

> +	cmd->scsi_status = scsi_status;
>  
>  	spin_lock_irqsave(&cmd->t_state_lock, flags);
>  	cmd->transport_state &= ~CMD_T_BUSY;
> @@ -694,25 +709,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  			success = 1;
>  	}
>  
> -	/*
> -	 * See if we are waiting to complete for an exception condition.
> -	 */
> -	if (cmd->transport_state & CMD_T_REQUEST_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete(&cmd->task_stop_comp);
> -		return;
> -	}
> -
> -	/*
> -	 * Check for case where an explicit ABORT_TASK has been received
> -	 * and transport_wait_for_tasks() will be waiting for completion..
> -	 */
> -	if (cmd->transport_state & CMD_T_ABORTED &&
> -	    cmd->transport_state & CMD_T_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete_all(&cmd->t_transport_stop_comp);
> -		return;

This breaks active I/O shutdown.

What happens when active I/O session shutdown, se_lun shutdown via
configfs, and TMR ABORT_TASK happen at the same time..?

> @@ -1633,33 +1629,6 @@ int target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
>  EXPORT_SYMBOL(target_submit_tmr);
>  
>  /*
> - * If the cmd is active, request it to be stopped and sleep until it
> - * has completed.
> - */
> -bool target_stop_cmd(struct se_cmd *cmd, unsigned long *flags)
> -	__releases(&cmd->t_state_lock)
> -	__acquires(&cmd->t_state_lock)
> -{
> -	bool was_active = false;
> -
> -	if (cmd->transport_state & CMD_T_BUSY) {
> -		cmd->transport_state |= CMD_T_REQUEST_STOP;
> -		spin_unlock_irqrestore(&cmd->t_state_lock, *flags);
> -
> -		pr_debug("cmd %p waiting to complete\n", cmd);
> -		wait_for_completion(&cmd->task_stop_comp);
> -		pr_debug("cmd %p stopped successfully\n", cmd);
> -
> -		spin_lock_irqsave(&cmd->t_state_lock, *flags);
> -		cmd->transport_state &= ~CMD_T_REQUEST_STOP;
> -		cmd->transport_state &= ~CMD_T_BUSY;
> -		was_active = true;
> -	}
> -
> -	return was_active;
> -}
> -

The reason se_cmd->task_stop_comp and se_cmd->t_transport_stop_comp
exist separately is so the former can handle TMR I/O shutdown, while the
latter handles active I/O session shutdown.

> @@ -1869,16 +1838,13 @@ void target_execute_cmd(struct se_cmd *cmd)
>  	 * Determine if frontend context caller is requesting the stopping of
>  	 * this command for frontend exceptions.
>  	 */
> -	spin_lock_irq(&cmd->t_state_lock);
>  	if (cmd->transport_state & CMD_T_STOP) {
>  		pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n",
>  			__func__, __LINE__, cmd->tag);
> -
> -		spin_unlock_irq(&cmd->t_state_lock);
> -		complete_all(&cmd->t_transport_stop_comp);
>  		return;
>  	}
>  

Also breaks active I/O session shutdown.

What happens when active I/O session shutdown, se_lun shutdown via
configfs, and TMR ABORT_TASK happen at the same time..?

> +	spin_lock_irq(&cmd->t_state_lock);
>  	cmd->t_state = TRANSPORT_PROCESSING;
>  	cmd->transport_state |= CMD_T_ACTIVE|CMD_T_BUSY|CMD_T_SENT;
>  	spin_unlock_irq(&cmd->t_state_lock);
> @@ -2228,6 +2194,8 @@ static int transport_release_cmd(struct se_cmd *cmd)
>  {
>  	BUG_ON(!cmd->se_tfo);
>  
> +	complete(&cmd->finished);
> +

This means every fast-path completion is doing an unnecessary complete().

No thanks.

>  	if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
>  		core_tmr_release_req(cmd->se_tmr_req);
>  	if (cmd->t_task_cdb != cmd->__t_task_cdb)
> @@ -2651,7 +2619,7 @@ bool transport_wait_for_tasks(struct se_cmd *cmd)
>  
>  	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>  
> -	wait_for_completion(&cmd->t_transport_stop_comp);
> +	wait_for_completion(&cmd->finished);
>  

Same here.

>  	spin_lock_irqsave(&cmd->t_state_lock, flags);
>  	cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);
> @@ -2867,41 +2835,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  }
>  EXPORT_SYMBOL(transport_check_aborted_status);
>  
> -void transport_send_task_abort(struct se_cmd *cmd)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	/*
> -	 * If there are still expected incoming fabric WRITEs, we wait
> -	 * until until they have completed before sending a TASK_ABORTED
> -	 * response.  This response with TASK_ABORTED status will be
> -	 * queued back to fabric module by transport_check_aborted_status().
> -	 */
> -	if (cmd->data_direction == DMA_TO_DEVICE) {
> -		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
> -			cmd->transport_state |= CMD_T_ABORTED;
> -			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
> -			return;
> -		}
> -	}
> -	cmd->scsi_status = SAM_STAT_TASK_ABORTED;
> -
> -	transport_lun_remove_cmd(cmd);
> -
> -	pr_debug("Setting SAM_STAT_TASK_ABORTED status for CDB: 0x%02x, ITT: 0x%08llx\n",
> -		 cmd->t_task_cdb[0], cmd->tag);
> -
> -	trace_target_cmd_complete(cmd);
> -	cmd->se_tfo->queue_status(cmd);
> -}
> -

To summarize:

The se_cmd->cmd_kref reference count bugs in core_tmr_drain_tmr_list() +
core_tmr_drain_state_list() need to be addressed.  They should be in a
standalone patch, separate from the other changes because they will
eventually need to go to stable.

Trying to move all shutdown cases to a single new per se_cmd struct
completion does not work as-is, because the scenario when active I/O
session shutdown and TMR LUN_RESET happen at the same time is ignored by
se_cmd->finished.

These two events can happen at the same time, mixed in with explicit
configfs group shutdown, and while a single struct completion might seem
easy to implement, these changes break existing logic in the attempt to
improve TMR.

That said, I'm not totally opposed to the idea of making TMR handling
synchronous, but it needs to happen in a separate patch from other
points mentioned here.

Also, please consider using a HW storage backend with IBLOCK with
iscsi-target or tcm_qla2xxx exports when testing these types of changes.
Nobody cares about drivers/target/tcm_fc/ in production at this point,
and using it as the primary vehicle to drive larger changes as
complicated as these is not especially helpful.

Thanks,

--nab



