On Thu, Mar 02, 2023 at 11:29:50AM -0600, Mike Christie wrote: > > On 3/2/23 3:43 AM, Dmitry Bogdanov wrote: > > On Sat, Feb 11, 2023 at 11:59:22AM +0300, Dmitry Bogdanov wrote: > >> On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote: > >>> > >>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned > >>> up running commands when it hasn't. The bug was added in: > >>> > >>> commit 51ec502a3266 ("target: Delete tmr from list before processing") > >>> > >>> The problem occurs when: > >>> > >>> 1. We have N IO cmds running in the target layer spread over 2 sessions. > >>> 2. The initiator sends a LUN_RESET for each session. > >>> 3. session1's LUN_RESET loops over all the running commands from both > >>> sessions and moves them to its local drain_task_list. > >>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because > >>> the commit above has it remove itself. session2 also does not see any > >>> commands since the other reset moved them off the state lists. > >>> 5. sessions2's LUN_RESET will then complete with a successful response. > >>> 6. sessions2's inititor believes the running commands on its session are > >>> now cleaned up due to the successful response and cleans up the running > >>> commands from its side. It then restarts them. > >>> 7. The commands do eventually complete on the backend and the target > >>> starts to return aborted task statuses for them. The initiator will > >>> either throw a invalid ITT error or might accidentally lookup a new task > >>> if the ITT has been reallocated already. > >>> > >>> This fixes the bug by reverting the patch. > >>> > >>> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing") > >>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> > >>> Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > >> > >> Actually, this patch even fixes a crash that we've just faced. > >> The second LUN_RESET moves the first LUN_RESET from tmr_list to its > >> drain_tmr_list, then the first LUN_RESET removes itself from second`s > >> drain_tmr_list, then the second LUN_RESET tries to remove the first from > >> the list and crashes because it was deleted already. > >> So, > >> > >> Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > > > Unfortunately, I am revoking my tags. This patch leads to deadlock of two > > LUN_RESETs waiting for each other in its drain_tmr_list. > > > > To keep LUN_RESETs ignoring each other something like that is needed: > >> --- > >> drivers/target/target_core_tmr.c | 5 +++-- > >> 1 file changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c > >> index 2b95b4550a63..a60802b4c5a3 100644 > >> --- a/drivers/target/target_core_tmr.c > >> +++ b/drivers/target/target_core_tmr.c > >> @@ -188,9 +188,10 @@ static void core_tmr_drain_tmr_list( > >> * LUN_RESET tmr.. > >> */ > >> spin_lock_irqsave(&dev->se_tmr_lock, flags); > >> - if (tmr) > >> - list_del_init(&tmr->tmr_list); > >> list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) { > > - > + if (tmr_p == tmr) > > - > + continue; > > - > + > > > > + /* Ignore LUN_RESETs to avoid deadlocks */ > > + if (tmr_p->function == TMR_LUN_RESET) > > + continue; > > + > > Shoot, that adds back the bug I was hitting. Haha, exactly. > I have an idea for how to fix both issues, but let me do some more testing > and will repost the set. Thanks for testing and reviewing this. scst serializes LUN_RESET execution, may be we should do the same.