Re: [PATCH v3 07/14] scsi: target: Fix multiple LUN_RESET handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/3/23 3:26 AM, Dmitry Bogdanov wrote:
> On Thu, Mar 02, 2023 at 11:29:50AM -0600, Mike Christie wrote:
>>
>> On 3/2/23 3:43 AM, Dmitry Bogdanov wrote:
>>> On Sat, Feb 11, 2023 at 11:59:22AM +0300, Dmitry Bogdanov wrote:
>>>> On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote:
>>>>>
>>>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>>>>> up running commands when it hasn't. The bug was added in:
>>>>>
>>>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>>>>
>>>>> The problem occurs when:
>>>>>
>>>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>>>>> 2. The initiator sends a LUN_RESET for each session.
>>>>> 3. session1's LUN_RESET loops over all the running commands from both
>>>>> sessions and moves them to its local drain_task_list.
>>>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>>>>> the commit above has it remove itself. session2 also does not see any
>>>>> commands since the other reset moved them off the state lists.
>>>>> 5. sessions2's LUN_RESET will then complete with a successful response.
>>>>> 6. sessions2's inititor believes the running commands on its session are
>>>>> now cleaned up due to the successful response and cleans up the running
>>>>> commands from its side. It then restarts them.
>>>>> 7. The commands do eventually complete on the backend and the target
>>>>> starts to return aborted task statuses for them. The initiator will
>>>>> either throw a invalid ITT error or might accidentally lookup a new task
>>>>> if the ITT has been reallocated already.
>>>>>
>>>>> This fixes the bug by reverting the patch.
>>>>>
>>>>> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
>>>>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>>>>> Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>
>>>>
>>>> Actually, this patch even fixes a crash that we've just faced.
>>>> The second LUN_RESET moves the first LUN_RESET from tmr_list to its
>>>> drain_tmr_list, then the first LUN_RESET removes itself from second`s
>>>> drain_tmr_list, then the second LUN_RESET tries to remove the first from
>>>> the list and crashes because it was deleted already.
>>>> So,
>>>>
>>>> Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>
>>>
>>> Unfortunately, I am revoking my tags. This patch leads to deadlock of two
>>> LUN_RESETs waiting for each other in its drain_tmr_list.
>>>
>>> To keep LUN_RESETs ignoring each other something like that is needed:
>>>> ---
>>>>  drivers/target/target_core_tmr.c | 5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
>>>> index 2b95b4550a63..a60802b4c5a3 100644
>>>> --- a/drivers/target/target_core_tmr.c
>>>> +++ b/drivers/target/target_core_tmr.c
>>>> @@ -188,9 +188,10 @@ static void core_tmr_drain_tmr_list(
>>>>          * LUN_RESET tmr..
>>>>          */
>>>>         spin_lock_irqsave(&dev->se_tmr_lock, flags);
>>>> -       if (tmr)
>>>> -               list_del_init(&tmr->tmr_list);
>>>>         list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
>>> - > +               if (tmr_p == tmr)
>>> - > +                       continue;
>>> - > +
>>>
>>> +             /* Ignore LUN_RESETs to avoid deadlocks */
>>> +             if (tmr_p->function == TMR_LUN_RESET)
>>> +                     continue;
>>> +
>>
>> Shoot, that adds back the bug I was hitting.
> 
> Haha, exactly.
> 
>> I have an idea for how to fix both issues, but let me do some more testing
>> and will repost the set. Thanks for testing and reviewing this.
> 
> scst serializes LUN_RESET execution, may be we should do the same.
> 

That's what I'm working on.

I was thinking async aborts are nice for ESX where you have N VMs on a LUN
and you don't want VM1's abort to have to wait for VM2's abort so left that.
However, for LUN_RESET, it has to wait for all the commands, so it's not as
bad to serialize them.



[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux