On Wed, 2017-02-08 at 14:24 -0800, Bart Van Assche wrote:
> Instead of invoking target driver callback functions from the
> context that handles an abort or LUN RESET task management function,
> only set the abort flag from that context and perform the actual
> abort handling from the context of the regular command processing
> flow. This approach has the following advantages:
> - The task management function code becomes much easier to read and
>   to verify since the number of potential race conditions against
>   the command processing flow is strongly reduced.
> - It is no longer needed to store the command state into the command
>   itself since that information is no longer needed from the context
>   where a task management function is processed.
>
> Notes:
> - With this patch applied the CMD_T_ABORTED flag is checked at two points
>   by the target core: just before local execution of a command starts
>   (see also target_execute_cmd()) and also just before the response is
>   sent (see also target_complete_cmd()).
> - A new reference count is introduced that tracks the number of
>   references held by the target driver, namely se_cmd.tgt_ref.

To reiterate how the existing code handles the first- and second-order
cases, here is the text from the earlier -v2 review to keep the context
fresh.

The quiesce of an in-flight se_cmd can happen in three ways:

 - early in submission, in target_execute_cmd(), to catch CMD_T_STOP
   before the se_cmd is submitted to the backend, or
 - waiting for outstanding backend I/O to complete via
   target_complete_cmd() to catch CMD_T_STOP (the typical case when a
   backend device is not responding), or
 - waiting in transport_cmd_check_stop() for the hand-off of the se_cmd
   from target-core to the fabric driver response, to catch CMD_T_STOP.

This quiesce can happen for a se_cmd with ABORT_TASK in
core_tmr_abort_task(); with LUN_RESET it happens for TMRs in
core_tmr_drain_tmr_list() and for the se_cmd descriptors on
se_device->state_list in core_tmr_drain_state_list().

Once CMD_T_STOP is cleared, target_core_tmr.c drops the outstanding
reference to se_cmd->cmd_kref as well as its own reference obtained by
__target_check_io_state().

The second-order issue, which is what complicates things in the existing
code, is that any of the above first-order cases can occur and block
waiting for CMD_T_STOP to be cleared at the _same time_ that the
associated fabric driver se_session is being shut down, which sets
CMD_T_FABRIC_STOP on all of the se_session's se_cmd descriptors.

The session shutdown can happen explicitly from the fabric driver, or via
a session reinstatement event (with iscsi-target for example) when a host
doesn't get a TMR response because backend I/O has not completed for an
extended amount of time.

The dynamic interaction of these first- and second-order issues is what
has taken us a long time to get right in upstream code, and which has
been stable for end-users for the past year since the advent of commit
0f4a94316.

So with that additional context about how/why things work as they do
today in mainline code, my comments are inline below.
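To make the first-order quiesce concrete, the existing target_execute_cmd()
check looks roughly like this today (a simplified sketch from memory, not a
verbatim copy of target_core_transport.c):

void target_execute_cmd(struct se_cmd *cmd)
{
	/*
	 * If the command was already aborted before reaching the backend,
	 * stop processing it here.
	 */
	spin_lock_irq(&cmd->t_state_lock);
	if (__transport_check_aborted_status(cmd, 1)) {
		spin_unlock_irq(&cmd->t_state_lock);
		return;
	}
	/*
	 * If a TMR or session shutdown context has already set CMD_T_STOP,
	 * wake it up via t_transport_stop_comp and never invoke the
	 * backend's ->execute_cmd().
	 */
	if (cmd->transport_state & CMD_T_STOP) {
		spin_unlock_irq(&cmd->t_state_lock);
		complete_all(&cmd->t_transport_stop_comp);
		return;
	}
	cmd->t_state = TRANSPORT_PROCESSING;
	cmd->transport_state |= CMD_T_ACTIVE | CMD_T_SENT;
	spin_unlock_irq(&cmd->t_state_lock);

	/* Only after the checks above is the command handed to the backend. */
	...
}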
> Signed-off-by: Bart Van Assche <bart.vanassche@xxxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Andy Grover <agrover@xxxxxxxxxx>
> Cc: David Disseldorp <ddiss@xxxxxxx>
> ---
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 161c8b8482db..980d94277c8d 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -192,13 +173,13 @@ void core_tmr_abort_task(
>  			continue;
>  		}
>
> +		se_cmd->send_abort_response = false;
>  		list_del_init(&se_cmd->se_cmd_list);
>  		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
>
> -		cancel_work_sync(&se_cmd->work);
> -		transport_wait_for_tasks(se_cmd);
> +		while (!wait_for_completion_timeout(&se_cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, se_cmd);
>

This is still fundamentally wrong, because it allows the se_cmd to be
dispatched into the backend driver, and only waits for completion from
the backend after the fact.

Keep in mind the existing transport_wait_for_tasks() code sets CMD_T_STOP
and blocks on se_cmd->t_transport_stop_comp, so target_execute_cmd()
catches CMD_T_STOP and prevents the backend se_cmd->execute_cmd()
execution from actually occurring. Whatever changes you are proposing
need to maintain this logic.

> -		transport_cmd_finish_abort(se_cmd, true);
>  		__target_put_sess_cmd(se_cmd);
>
>  		printk("ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for"
> @@ -295,14 +276,30 @@ static void core_tmr_drain_tmr_list(
>  			(preempt_and_abort_list) ? "Preempt" : "", tmr_p,
>  			tmr_p->function, tmr_p->response, cmd->t_state);
>
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		transport_cmd_finish_abort(cmd, 1);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);
>  	}
> }

Likewise here, this doesn't actually prevent TMR execution, it just waits
for it to complete after the fact.

>  static void core_tmr_drain_state_list(
>  	struct se_device *dev,
>  	struct se_cmd *prout_cmd,
> @@ -311,6 +308,7 @@ static void core_tmr_drain_state_list(
>  	struct list_head *preempt_and_abort_list)
>  {
>  	LIST_HEAD(drain_task_list);
> +	struct se_node_acl *tmr_nacl = tmr_sess ? tmr_sess->se_node_acl : NULL;
>  	struct se_session *sess;
>  	struct se_cmd *cmd, *next;
>  	unsigned long flags;
> @@ -365,6 +363,8 @@ static void core_tmr_drain_state_list(
>
>  		list_move_tail(&cmd->state_list, &drain_task_list);
>  		cmd->state_active = false;
> +		if (tmr_nacl && tmr_nacl != cmd->se_sess->se_node_acl && tas)
> +			cmd->send_abort_response = true;
>  	}
>  	spin_unlock_irqrestore(&dev->execute_task_lock, flags);
>
> @@ -387,17 +387,8 @@ static void core_tmr_drain_state_list(
>  			(cmd->transport_state & CMD_T_STOP) != 0,
>  			(cmd->transport_state & CMD_T_SENT) != 0);
>
> -		/*
> -		 * If the command may be queued onto a workqueue cancel it now.
> -		 *
> -		 * This is equivalent to removal from the execute queue in the
> -		 * loop above, but we do it down here given that
> -		 * cancel_work_sync may block.
> -		 */
> -		cancel_work_sync(&cmd->work);
> -		transport_wait_for_tasks(cmd);
> -
> -		core_tmr_handle_tas_abort(cmd, tas);
> +		while (!wait_for_completion_timeout(&cmd->complete, 180 * HZ))
> +			target_show_cmd(__func__, cmd);
>  		__target_put_sess_cmd(cmd);

And here again too. You can't just dispatch aborted commands to the
backend and wait until they complete after the fact.
If the command is being aborted (CMD_T_ABORTED), target_execute_cmd()
must catch the command's CMD_T_STOP before it's dispatched into the
backend. What if the backend not responding for a long time is what
caused the host to issue LUN_RESET in the first place..? Why would it
make sense to keep flooding the device with new commands..?

>  	}
> }
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 533b584c2806..aac1fc1b2a7c 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -714,16 +739,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  		success = 1;
>  	}
>
> -	/*
> -	 * Check for case where an explicit ABORT_TASK has been received
> -	 * and transport_wait_for_tasks() will be waiting for completion..
> -	 */
> -	if (cmd->transport_state & CMD_T_ABORTED ||
> -	    cmd->transport_state & CMD_T_STOP) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		complete_all(&cmd->t_transport_stop_comp);
> -		return;
> -	} else if (!success) {

This can't be removed, because __transport_wait_for_tasks() needs to
catch the CMD_T_STOP or CMD_T_ABORTED quiesce here. Otherwise for
!success, target_complete_failure_work() executes and actually queues the
response, but only CMD_T_STOP (and not CMD_T_ABORTED!) is checked by
transport_cmd_check_stop_to_fabric() after the response has been
dispatched.

> +	if (!success) {
>  		INIT_WORK(&cmd->work, target_complete_failure_work);
>  	} else {
>  		INIT_WORK(&cmd->work, target_complete_ok_work);
> @@ -733,6 +749,7 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
>  	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
>  	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
>
> +queue_work:
>  	if (cmd->se_cmd_flags & SCF_USE_CPUID)
>  		queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
>  	else
> @@ -1215,6 +1232,7 @@ void transport_init_se_cmd(
>  	INIT_LIST_HEAD(&cmd->state_list);
>  	init_completion(&cmd->t_transport_stop_comp);
>  	init_completion(&cmd->cmd_wait_comp);
> +	init_completion(&cmd->complete);
>  	spin_lock_init(&cmd->t_state_lock);
>  	kref_init(&cmd->cmd_kref);
>  	atomic_set(&cmd->tgt_ref, 1);
> @@ -1886,6 +1904,7 @@ void target_execute_cmd(struct se_cmd *cmd)
>  	spin_lock_irq(&cmd->t_state_lock);
>  	if (__transport_check_aborted_status(cmd, 1)) {
>  		spin_unlock_irq(&cmd->t_state_lock);
> +		transport_handle_abort(cmd);
>  		return;
>  	}

__transport_check_aborted_status() must honor delayed TAS, and allow the
fabric to complete the WRITE transfer before proceeding with the abort.
There are assumptions about this all over driver code.
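For reference, the delayed-TAS side currently looks roughly like the
sketch below (again simplified and from memory, not verbatim mainline
code): the abort is only reported once the fabric has finished the
outstanding WRITE.

/*
 * Simplified sketch of the existing delayed-TAS handling, called with
 * cmd->t_state_lock held.
 */
static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
{
	if (!(cmd->transport_state & CMD_T_ABORTED))
		return 0;
	/*
	 * If no status is to be sent yet, or a delayed TASK_ABORTED has not
	 * been armed, only record that one is owed and bail out, so the
	 * fabric can finish the outstanding WRITE transfer first.
	 */
	if (!send_status || !(cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS)) {
		if (send_status)
			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
		return 1;
	}
	/*
	 * The WRITE has completed, so queue the delayed SAM_STAT_TASK_ABORTED
	 * response back to the fabric now.
	 */
	cmd->se_cmd_flags &= ~SCF_SEND_DELAYED_TAS;
	cmd->scsi_status = SAM_STAT_TASK_ABORTED;

	spin_unlock_irq(&cmd->t_state_lock);
	cmd->se_tfo->queue_status(cmd);
	spin_lock_irq(&cmd->t_state_lock);

	return 1;
}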
>  	if (cmd->transport_state & CMD_T_STOP) {
> @@ -2494,15 +2513,11 @@ static void target_wait_free_cmd(struct se_cmd *cmd, bool *aborted, bool *tas)
>
>  int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  {
> -	int ret = 0;
>  	bool aborted = false, tas = false;
>
>  	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	} else {
>  		if (wait_for_tasks)
>  			target_wait_free_cmd(cmd, &aborted, &tas);
> @@ -2516,23 +2531,12 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>
>  		if (cmd->se_lun)
>  			transport_lun_remove_cmd(cmd);
> -
> -		if (!aborted || tas)
> -			ret = transport_put_cmd(cmd);
>  	}
> -	/*
> -	 * If the task has been internally aborted due to TMR ABORT_TASK
> -	 * or LUN_RESET, target_core_tmr.c is responsible for performing
> -	 * the remaining calls to target_put_sess_cmd(), and not the
> -	 * callers of this function.
> -	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>  		wait_for_completion(&cmd->cmd_wait_comp);
> -		transport_put_cmd(cmd);
> -		ret = 1;
>  	}
> -	return ret;
> +	return transport_put_cmd(cmd);

As mentioned in patch #14, there is no way this can possibly work. The
wait_for_completion(&cmd->cmd_wait_comp) is only woken up when
transport_put_cmd() -> kref_put() of se_cmd->cmd_kref reaches zero, so
there is no way the wait_for_completion() during CMD_T_ABORTED can ever
make forward progress.

> }
> EXPORT_SYMBOL(transport_generic_free_cmd);
>
> @@ -2644,6 +2648,43 @@ int target_put_sess_cmd(struct se_cmd *se_cmd)
>  }
>  EXPORT_SYMBOL(target_put_sess_cmd);
>
> +static const char *data_dir_name(enum dma_data_direction d)
> +{
> +	switch (d) {
> +	case DMA_BIDIRECTIONAL:	return "BIDI";
> +	case DMA_TO_DEVICE:	return "WRITE";
> +	case DMA_FROM_DEVICE:	return "READ";
> +	case DMA_NONE:		return "NONE";
> +	default:		return "(?)";
> +	}
> +}
> +
> +static const char *cmd_state_name(enum transport_state_table t)
> +{
> +	switch (t) {
> +	case TRANSPORT_NO_STATE:	return "NO_STATE";
> +	case TRANSPORT_NEW_CMD:		return "NEW_CMD";
> +	case TRANSPORT_WRITE_PENDING:	return "WRITE_PENDING";
> +	case TRANSPORT_PROCESSING:	return "PROCESSING";
> +	case TRANSPORT_COMPLETE:	return "COMPLETE";
> +	case TRANSPORT_ISTATE_PROCESSING:
> +					return "ISTATE_PROCESSING";
> +	case TRANSPORT_COMPLETE_QF_WP:	return "COMPLETE_QF_WP";
> +	case TRANSPORT_COMPLETE_QF_OK:	return "COMPLETE_QF_OK";
> +	default:			return "?";
> +	}
> +}
> +
> +void target_show_cmd(const char *ctxt, struct se_cmd *cmd)
> +{
> +	pr_debug("%s: still waiting for %s with tag %#llx; dir %s i_state %d t_tate %s len %d tgt_ref %d refcnt %d\n",
> +		 ctxt, cmd->se_cmd_flags & SCF_SCSI_TMR_CDB ? "tmf" : "cmd",
> +		 cmd->tag, data_dir_name(cmd->data_direction),
> +		 cmd->se_tfo->get_cmd_state(cmd), cmd_state_name(cmd->t_state),
> +		 cmd->data_length, atomic_read(&cmd->tgt_ref),
> +		 atomic_read(&cmd->cmd_kref.refcount));
> +}
> +

Actually, this should be a standalone patch that is used by
__transport_wait_for_tasks() during CMD_T_STOP, while waiting for
se_cmd->t_transport_stop_comp to complete. I'd be happy to pick that up
for existing code.
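If it helps, the hookup could look something like the fragment below,
reusing the 180 second retry loop from this patch inside the existing
__transport_wait_for_tasks() wait (sketch only, surrounding code
abbreviated):

	/*
	 * Sketch: inside __transport_wait_for_tasks(), after CMD_T_STOP has
	 * been set, periodically log the stuck command via target_show_cmd()
	 * instead of waiting silently on t_transport_stop_comp.
	 */
	cmd->transport_state |= CMD_T_STOP;
	spin_unlock_irqrestore(&cmd->t_state_lock, *flags);

	while (!wait_for_completion_timeout(&cmd->t_transport_stop_comp,
					    180 * HZ))
		target_show_cmd(__func__, cmd);

	spin_lock_irqsave(&cmd->t_state_lock, *flags);
	cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);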
> @@ -3043,47 +3075,6 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
>  }
>  EXPORT_SYMBOL(transport_check_aborted_status);
>
> -void transport_send_task_abort(struct se_cmd *cmd)
> -{
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&cmd->t_state_lock, flags);
> -	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
> -		spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -
> -	/*
> -	 * If there are still expected incoming fabric WRITEs, we wait
> -	 * until until they have completed before sending a TASK_ABORTED
> -	 * response. This response with TASK_ABORTED status will be
> -	 * queued back to fabric module by transport_check_aborted_status().
> -	 */
> -	if (cmd->data_direction == DMA_TO_DEVICE) {
> -		if (cmd->se_tfo->write_pending_status(cmd) != 0) {
> -			spin_lock_irqsave(&cmd->t_state_lock, flags);
> -			if (cmd->se_cmd_flags & SCF_SEND_DELAYED_TAS) {
> -				spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -				goto send_abort;
> -			}
> -			cmd->se_cmd_flags |= SCF_SEND_DELAYED_TAS;
> -			spin_unlock_irqrestore(&cmd->t_state_lock, flags);
> -			return;
> -		}
> -	}

Getting rid of the ->write_pending_status() check here and of DELAYED_TAS
completely breaks WRITEs during CMD_T_ABORTED for iscsi-target and
qla2xxx, where the outstanding WRITE transfer _must be_ allowed to
complete before proceeding with the abort.

See the comments on patch #23 for why this can't be removed.

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html