On Tue, 2017-02-21 at 21:42 +0000, Bart Van Assche wrote: > On 02/20/2017 03:52 PM, Nicholas A. Bellinger wrote: > > As-is just ignoring CMD_T_COMPLETE in core_tmr_drain_state_list() is not > > enough to address this bug and not cause other issues, because once a > > se_cmd descriptor is handed back to the fabric driver after > > transport_cmd_check_stop_to_fabric() is called, > > __target_check_io_state() must not attempt to abort the descriptor. > > Please note that my patch is not ignoring CMD_T_COMPLETE - what it does > is to postpone sending back the LUN RESET response to the initiator > until the responses for all commands affected by the LUN RESET have been > sent. Not exactly. Here are the relevant parts of the patch again for reference: @@ -127,7 +127,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, * long as se_cmd->cmd_kref is still active unless zero. */ spin_lock(&se_cmd->t_state_lock); - if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) { + if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) { pr_debug("Attempted to abort io tag: %llu already complete or" " fabric stop, skipping\n", se_cmd->tag); spin_unlock(&se_cmd->t_state_lock); @@ -354,7 +355,7 @@ static void core_tmr_drain_state_list( continue; spin_lock(&sess->sess_cmd_lock); - rc = __target_check_io_state(cmd, tmr_sess, tas); + rc = __target_check_io_state(cmd, 0, tmr_sess, tas); spin_unlock(&sess->sess_cmd_lock); if (!rc) continue; Your patch ignores CMD_T_COMPLETE in __target_check_io_state() for se_cmd descriptors in se_device->state_list, in order to ensure target_complete_ok_work() -> TFO->queue_status() gets called and transport_cmd_check_stop_to_fabric() catches CMD_T_STOP to quiesce the se_cmd, before the final LUN_RESET response is sent. My concern was not that this didn't patch address the original bug, but that it introduced other regressions as a consequence. To my original concern if ignoring CMD_T_COMPLETE in __target_check_io_state() after the hand-off to fabric code via transport_cmd_check_stop_to_fabric() is a problem, this should not be an issue considering target_remove_from_state_list() is called to remove se_cmd from se_device->state_list before checking CMD_T_STOP, and clearing CMD_T_ACTIVE in transport_cmd_check_stop_to_fabric(). However, there is still a race between when core_tmr_drain_state_list() calls __target_check_io_state() to set CMD_T_ABORTED and invokes transport_wait_for_tasks() to set CMD_T_STOP, and when transport_cmd_check_stop_to_fabric() does the final CMD_T_STOP check after the se_cmd hand-off back to fabric driver code. Which means a se_cmd w/ CMD_T_ABORTED could be handed back to fabric driver code and miss the last CMD_T_STOP check within transport_cmd_check_stop_to_fabric() because transport_wait_for_tasks() had not been called yet, which would leave transport_wait_for_tasks() blocked on cmd->t_transport_stop_comp waiting for the last CMD_T_STOP check to complete after fabric hand-off, that never comes. The other issue with this patch is that it still sends the actual status for the original se_cmd, even after the se_cmd has been CMD_T_ABORTED. > I can move this patch after the patch that makes TMF handling > synchronous because that patch makes it safe to set the CMD_T_ABORTED > flag at any time. To reiterate again. Any patch intended to address a bug in existing upstream code needs to be tested as a stand-alone patch, and not depend upon other non bug-fix related changes. > > > That said, here is how I'd like to address this particular bug. > > > > 1) Allow CMD_T_COMPLETE to occur, but still ignore se_cmds that have > > already called transport_cmd_check_stop_to_fabric(). Eg: CMD_T_ACTIVE > > is not set, but CMD_T_SENT is set. > > My understanding of your patch is that it will cause the LUN RESET > implementation to ignore those commands for which CMD_T_FABRIC_STOP has > been set and that commands for which CMD_T_ACTIVE has been set but > CMD_T_SENT not will also be ignored. No. As per your original problem statement, the issue is when a se_cmd has been marked as CMD_T_COMPLETE before TFO->queue_status() has been called, which gets skipped by __target_check_io_state() and CMD_T_ABORTED, causing LUN_RESET response to get sent before target_complete_ok_work() completes. Like your patch, it waits for outstanding se_cmds to complete from backend driver core before sending the LUN_RESET response, by also ignoring CMD_T_COMPLETE state check in __target_check_io_state() for se_cmds specifically within core_tmr_drain_state_list(). It was different because it also checked to make ensure the hand-off back to fabric driver code didn't already happen, which in retrospect is unnecessary given se_cmd->active_list is already getting removed within transport_cmd_check_stop_to_fabric(). The other thing it did was avoid invoking TFO->queue_status(), et al. in target_complete_ok_work(), once a se_cmd had been CMD_T_ABORTED. > Sorry but I don't think this > approach is sufficient to fix the data corruption issue I observed. > Sure it does. Alas, I was hoping that you'd actually prove it by testing it, but that hasn't happened yet. :) Anyways, here is the updated version based on your original that I'd like you to verify, before pushing to mainline. It contains: 1) Closes the race between CMD_T_ABORTED + CMD_T_STOP assignment mentioned above, by adding a CMD_T_ABORTED check in transport_cmd_check_stop_to_fabric() to match what's already in target_complete_cmd(). Note given your previous patch in target-pending/for-next to re-factor transport_cmd_check_stop_to_fabric(), this will require manual patching for stable. 2) Avoids sending the TFO->queue_status(), et al. once a se_cmd has been CMD_T_ABORTED within target_complete_ok_work(). Why don't you give it a spin with ib_srpt as a stand-alone patch..? diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index dce1e1b..5ec2215 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -109,7 +109,7 @@ static int target_check_cdb_and_preempt(struct list_head *list, return 1; } -static bool __target_check_io_state(struct se_cmd *se_cmd, +static bool __target_check_io_state(struct se_cmd *se_cmd, u32 ignore_state, struct se_session *tmr_sess, int tas) { struct se_session *sess = se_cmd->se_sess; @@ -117,19 +117,32 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, assert_spin_locked(&sess->sess_cmd_lock); WARN_ON_ONCE(!irqs_disabled()); /* - * If command already reached CMD_T_COMPLETE state within - * target_complete_cmd() or CMD_T_FABRIC_STOP due to shutdown, - * this se_cmd has been passed to fabric driver and will - * not be aborted. + * For ABORT_TASK, if command already reached CMD_T_COMPLETE + * state within target_complete_cmd() or CMD_T_FABRIC_STOP + * due to shutdown, this se_cmd has been passed to fabric + * driver and will not be aborted. + * + * For LUN_RESET, check for CMD_T_FABRIC_STOP to determine + * if se_cmd is being stopped due to session shutdown and + * abort should be avoided. Otherwise, se_cmd->state_list + * will have already been removed from se_device->state_list + * via target_remove_from_state_list() from within + * transport_cmd_check_stop_to_fabric() code. + * + * Once fabric hand-off is complete and CMD_T_ACTIVE is + * cleared by transport_cmd_check_stop_to_fabric(), there + * no need to check additional transport_state when called + * for LUN_RESET via core_tmr_drain_state_list(). * * Otherwise, obtain a local se_cmd->cmd_kref now for TMR * ABORT_TASK + LUN_RESET for CMD_T_ABORTED processing as * long as se_cmd->cmd_kref is still active unless zero. */ spin_lock(&se_cmd->t_state_lock); - if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) { + if (se_cmd->transport_state & (ignore_state)) { pr_debug("Attempted to abort io tag: %llu already complete or" - " fabric stop, skipping\n", se_cmd->tag); + " fabric stop 0x%08x, skipping\n", se_cmd->tag, + ignore_state); spin_unlock(&se_cmd->t_state_lock); return false; } @@ -175,7 +188,8 @@ void core_tmr_abort_task( printk("ABORT_TASK: Found referenced %s task_tag: %llu\n", se_cmd->se_tfo->get_fabric_name(), ref_tag); - if (!__target_check_io_state(se_cmd, se_sess, 0)) + if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE|CMD_T_FABRIC_STOP, + se_sess, 0)) continue; list_del_init(&se_cmd->se_cmd_list); @@ -341,7 +355,7 @@ static void core_tmr_drain_state_list( continue; spin_lock(&sess->sess_cmd_lock); - rc = __target_check_io_state(cmd, tmr_sess, tas); + rc = __target_check_io_state(cmd, CMD_T_FABRIC_STOP, tmr_sess, tas); spin_unlock(&sess->sess_cmd_lock); if (!rc) continue; diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 434d9d6..65030ce 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -617,8 +617,9 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd) * Determine if frontend context caller is requesting the stopping of * this command for frontend exceptions. */ - if (cmd->transport_state & CMD_T_STOP) { - pr_debug("%s:%d CMD_T_STOP for ITT: 0x%08llx\n", + if (cmd->transport_state & (CMD_T_ABORTED | CMD_T_STOP)) { + + pr_debug("%s:%d CMD_T_ABORTED|CMD_T_STOP for ITT: 0x%08llx\n", __func__, __LINE__, cmd->tag); spin_unlock_irqrestore(&cmd->t_state_lock, flags); @@ -2082,6 +2083,10 @@ static void target_complete_ok_work(struct work_struct *work) */ if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) { WARN_ON(!cmd->scsi_status); + + if (cmd->transport_state & CMD_T_ABORTED) + goto queue_out; + ret = transport_send_check_condition_and_sense( cmd, 0, 1); if (ret == -EAGAIN || ret == -ENOMEM) @@ -2108,6 +2113,9 @@ static void target_complete_ok_work(struct work_struct *work) return; } else if (rc) { + if (cmd->transport_state & CMD_T_ABORTED) + goto queue_out; + ret = transport_send_check_condition_and_sense(cmd, rc, 0); if (ret == -EAGAIN || ret == -ENOMEM) @@ -2120,6 +2128,9 @@ static void target_complete_ok_work(struct work_struct *work) } queue_rsp: + if (cmd->transport_state & CMD_T_ABORTED) + goto queue_out; + switch (cmd->data_direction) { case DMA_FROM_DEVICE: if (cmd->scsi_status) @@ -2174,6 +2185,7 @@ static void target_complete_ok_work(struct work_struct *work) break; } +queue_out: transport_lun_remove_cmd(cmd); transport_cmd_check_stop_to_fabric(cmd); return; -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html