On 3/2/23 3:43 AM, Dmitry Bogdanov wrote: > On Sat, Feb 11, 2023 at 11:59:22AM +0300, Dmitry Bogdanov wrote: >> On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote: >>> >>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned >>> up running commands when it hasn't. The bug was added in: >>> >>> commit 51ec502a3266 ("target: Delete tmr from list before processing") >>> >>> The problem occurs when: >>> >>> 1. We have N IO cmds running in the target layer spread over 2 sessions. >>> 2. The initiator sends a LUN_RESET for each session. >>> 3. session1's LUN_RESET loops over all the running commands from both >>> sessions and moves them to its local drain_task_list. >>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because >>> the commit above has it remove itself. session2 also does not see any >>> commands since the other reset moved them off the state lists. >>> 5. sessions2's LUN_RESET will then complete with a successful response. >>> 6. sessions2's inititor believes the running commands on its session are >>> now cleaned up due to the successful response and cleans up the running >>> commands from its side. It then restarts them. >>> 7. The commands do eventually complete on the backend and the target >>> starts to return aborted task statuses for them. The initiator will >>> either throw a invalid ITT error or might accidentally lookup a new task >>> if the ITT has been reallocated already. >>> >>> This fixes the bug by reverting the patch. >>> >>> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing") >>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> >>> Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> >> >> Actually, this patch even fixes a crash that we've just faced. >> The second LUN_RESET moves the first LUN_RESET from tmr_list to its >> drain_tmr_list, then the first LUN_RESET removes itself from second`s >> drain_tmr_list, then the second LUN_RESET tries to remove the first from >> the list and crashes because it was deleted already. >> So, >> >> Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > Unfortunately, I am revoking my tags. This patch leads to deadlock of two > LUN_RESETs waiting for each other in its drain_tmr_list. > > To keep LUN_RESETs ignoring each other something like that is needed: >> --- >> drivers/target/target_core_tmr.c | 5 +++-- >> 1 file changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c >> index 2b95b4550a63..a60802b4c5a3 100644 >> --- a/drivers/target/target_core_tmr.c >> +++ b/drivers/target/target_core_tmr.c >> @@ -188,9 +188,10 @@ static void core_tmr_drain_tmr_list( >> * LUN_RESET tmr.. >> */ >> spin_lock_irqsave(&dev->se_tmr_lock, flags); >> - if (tmr) >> - list_del_init(&tmr->tmr_list); >> list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) { > - > + if (tmr_p == tmr) > - > + continue; > - > + > > + /* Ignore LUN_RESETs to avoid deadlocks */ > + if (tmr_p->function == TMR_LUN_RESET) > + continue; > + Shoot, that adds back the bug I was hitting. I have an idea for how to fix both issues, but let me do some more testing and will repost the set. Thanks for testing and reviewing this.