On Sat, Feb 11, 2023 at 11:59:22AM +0300, Dmitry Bogdanov wrote: > On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote: > > > > This fixes a bug where an initiator thinks a LUN_RESET has cleaned > > up running commands when it hasn't. The bug was added in: > > > > commit 51ec502a3266 ("target: Delete tmr from list before processing") > > > > The problem occurs when: > > > > 1. We have N IO cmds running in the target layer spread over 2 sessions. > > 2. The initiator sends a LUN_RESET for each session. > > 3. session1's LUN_RESET loops over all the running commands from both > > sessions and moves them to its local drain_task_list. > > 4. session2's LUN_RESET does not see the LUN_RESET from session1 because > > the commit above has it remove itself. session2 also does not see any > > commands since the other reset moved them off the state lists. > > 5. sessions2's LUN_RESET will then complete with a successful response. > > 6. sessions2's inititor believes the running commands on its session are > > now cleaned up due to the successful response and cleans up the running > > commands from its side. It then restarts them. > > 7. The commands do eventually complete on the backend and the target > > starts to return aborted task statuses for them. The initiator will > > either throw a invalid ITT error or might accidentally lookup a new task > > if the ITT has been reallocated already. > > > > This fixes the bug by reverting the patch. > > > > Fixes: 51ec502a3266 ("target: Delete tmr from list before processing") > > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> > > Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> > > Actually, this patch even fixes a crash that we've just faced. > The second LUN_RESET moves the first LUN_RESET from tmr_list to its > drain_tmr_list, then the first LUN_RESET removes itself from second`s > drain_tmr_list, then the second LUN_RESET tries to remove the first from > the list and crashes because it was deleted already. > So, > > Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> Unfortunately, I am revoking my tags. This patch leads to deadlock of two LUN_RESETs waiting for each other in its drain_tmr_list. To keep LUN_RESETs ignoring each other something like that is needed: > --- > drivers/target/target_core_tmr.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c > index 2b95b4550a63..a60802b4c5a3 100644 > --- a/drivers/target/target_core_tmr.c > +++ b/drivers/target/target_core_tmr.c > @@ -188,9 +188,10 @@ static void core_tmr_drain_tmr_list( > * LUN_RESET tmr.. > */ > spin_lock_irqsave(&dev->se_tmr_lock, flags); > - if (tmr) > - list_del_init(&tmr->tmr_list); > list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) { - > + if (tmr_p == tmr) - > + continue; - > + + /* Ignore LUN_RESETs to avoid deadlocks */ + if (tmr_p->function == TMR_LUN_RESET) + continue; + > cmd = tmr_p->task_cmd; > if (!cmd) { > pr_err("Unable to locate struct se_cmd for TMR\n"); > -- > 2.25.1