Patch "scsi: target: Fix multiple LUN_RESET handling" has been added to the 6.2-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    scsi: target: Fix multiple LUN_RESET handling

to the 6.2-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     scsi-target-fix-multiple-lun_reset-handling.patch
and it can be found in the queue-6.2 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit ef35bb22bfb25bb68a782b78d635e4abcb9ab287
Author: Mike Christie <michael.christie@xxxxxxxxxx>
Date:   Sat Mar 18 20:56:18 2023 -0500

    scsi: target: Fix multiple LUN_RESET handling
    
    [ Upstream commit 673db054d7a2b5a470d7a25baf65956d005ad729 ]
    
    This fixes a bug where an initiator thinks a LUN_RESET has cleaned up
    running commands when it hasn't. The bug was added in commit 51ec502a3266
    ("target: Delete tmr from list before processing").
    
    The problem occurs when:
    
     1. We have N I/O cmds running in the target layer spread over 2 sessions.
    
     2. The initiator sends a LUN_RESET for each session.
    
     3. session1's LUN_RESET loops over all the running commands from both
        sessions and moves them to its local drain_task_list.
    
     4. session2's LUN_RESET does not see the LUN_RESET from session1 because
        the commit above has it remove itself. session2 also does not see any
        commands since the other reset moved them off the state lists.
    
     5. sessions2's LUN_RESET will then complete with a successful response.
    
     6. sessions2's inititor believes the running commands on its session are
        now cleaned up due to the successful response and cleans up the running
        commands from its side. It then restarts them.
    
     7. The commands do eventually complete on the backend and the target
        starts to return aborted task statuses for them. The initiator will
        either throw a invalid ITT error or might accidentally lookup a new
        task if the ITT has been reallocated already.
    
    Fix the bug by reverting the patch, and serialize the execution of
    LUN_RESETs and Preempt and Aborts.
    
    Also prevent us from waiting on LUN_RESETs in core_tmr_drain_tmr_list,
    because it turns out the original patch fixed a bug that was not
    mentioned. For LUN_RESET1 core_tmr_drain_tmr_list can see a second
    LUN_RESET and wait on it. Then the second reset will run
    core_tmr_drain_tmr_list and see the first reset and wait on it resulting in
    a deadlock.
    
    Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
    Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20230319015620.96006-8-michael.christie@xxxxxxxxxx
    Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index f6e58410ec3f9..aeb03136773d5 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -782,6 +782,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 	spin_lock_init(&dev->t10_alua.lba_map_lock);
 
 	INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
+	mutex_init(&dev->lun_reset_mutex);
 
 	dev->t10_wwn.t10_dev = dev;
 	/*
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 2b95b4550a637..4718db628222b 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -188,14 +188,23 @@ static void core_tmr_drain_tmr_list(
 	 * LUN_RESET tmr..
 	 */
 	spin_lock_irqsave(&dev->se_tmr_lock, flags);
-	if (tmr)
-		list_del_init(&tmr->tmr_list);
 	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
+		if (tmr_p == tmr)
+			continue;
+
 		cmd = tmr_p->task_cmd;
 		if (!cmd) {
 			pr_err("Unable to locate struct se_cmd for TMR\n");
 			continue;
 		}
+
+		/*
+		 * We only execute one LUN_RESET at a time so we can't wait
+		 * on them below.
+		 */
+		if (tmr_p->function == TMR_LUN_RESET)
+			continue;
+
 		/*
 		 * If this function was called with a valid pr_res_key
 		 * parameter (eg: for PROUT PREEMPT_AND_ABORT service action
@@ -379,14 +388,25 @@ int core_tmr_lun_reset(
 				tmr_nacl->initiatorname);
 		}
 	}
+
+
+	/*
+	 * We only allow one reset or preempt and abort to execute at a time
+	 * to prevent one call from claiming all the cmds causing a second
+	 * call from returning while cmds it should have waited on are still
+	 * running.
+	 */
+	mutex_lock(&dev->lun_reset_mutex);
+
 	pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
 		(preempt_and_abort_list) ? "Preempt" : "TMR",
 		dev->transport->name, tas);
-
 	core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
 	core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
 				preempt_and_abort_list);
 
+	mutex_unlock(&dev->lun_reset_mutex);
+
 	/*
 	 * Clear any legacy SPC-2 reservation when called during
 	 * LOGICAL UNIT RESET
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index bd299790e99c3..8cc42ad65c925 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -872,6 +872,7 @@ struct se_device {
 	struct rcu_head		rcu_head;
 	int			queue_cnt;
 	struct se_device_queue	*queues;
+	struct mutex		lun_reset_mutex;
 };
 
 struct target_opcode_descriptor {



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux