On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote: > > This fixes a bug where an initiator thinks a LUN_RESET has cleaned > up running commands when it hasn't. The bug was added in: > > commit 51ec502a3266 ("target: Delete tmr from list before processing") > > The problem occurs when: > > 1. We have N IO cmds running in the target layer spread over 2 sessions. > 2. The initiator sends a LUN_RESET for each session. > 3. session1's LUN_RESET loops over all the running commands from both > sessions and moves them to its local drain_task_list. > 4. session2's LUN_RESET does not see the LUN_RESET from session1 because > the commit above has it remove itself. session2 also does not see any > commands since the other reset moved them off the state lists. > 5. sessions2's LUN_RESET will then complete with a successful response. > 6. sessions2's inititor believes the running commands on its session are > now cleaned up due to the successful response and cleans up the running > commands from its side. It then restarts them. > 7. The commands do eventually complete on the backend and the target > starts to return aborted task statuses for them. The initiator will > either throw a invalid ITT error or might accidentally lookup a new task > if the ITT has been reallocated already. > > This fixes the bug by reverting the patch. > > Fixes: 51ec502a3266 ("target: Delete tmr from list before processing") > Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> > Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx> Actually, this patch even fixes a crash that we've just faced. The second LUN_RESET moves the first LUN_RESET from tmr_list to its drain_tmr_list, then the first LUN_RESET removes itself from second`s drain_tmr_list, then the second LUN_RESET tries to remove the first from the list and crashes because it was deleted already. So, Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>