Re: [PATCH v3 07/14] scsi: target: Fix multiple LUN_RESET handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/2/23 3:43 AM, Dmitry Bogdanov wrote:
> On Sat, Feb 11, 2023 at 11:59:22AM +0300, Dmitry Bogdanov wrote:
>> On Sun, Jan 29, 2023 at 05:44:34PM -0600, Mike Christie wrote:
>>>
>>> This fixes a bug where an initiator thinks a LUN_RESET has cleaned
>>> up running commands when it hasn't. The bug was added in:
>>>
>>> commit 51ec502a3266 ("target: Delete tmr from list before processing")
>>>
>>> The problem occurs when:
>>>
>>> 1. We have N IO cmds running in the target layer spread over 2 sessions.
>>> 2. The initiator sends a LUN_RESET for each session.
>>> 3. session1's LUN_RESET loops over all the running commands from both
>>> sessions and moves them to its local drain_task_list.
>>> 4. session2's LUN_RESET does not see the LUN_RESET from session1 because
>>> the commit above has it remove itself. session2 also does not see any
>>> commands since the other reset moved them off the state lists.
>>> 5. sessions2's LUN_RESET will then complete with a successful response.
>>> 6. sessions2's inititor believes the running commands on its session are
>>> now cleaned up due to the successful response and cleans up the running
>>> commands from its side. It then restarts them.
>>> 7. The commands do eventually complete on the backend and the target
>>> starts to return aborted task statuses for them. The initiator will
>>> either throw a invalid ITT error or might accidentally lookup a new task
>>> if the ITT has been reallocated already.
>>>
>>> This fixes the bug by reverting the patch.
>>>
>>> Fixes: 51ec502a3266 ("target: Delete tmr from list before processing")
>>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>>> Reviewed-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>
>>
>> Actually, this patch even fixes a crash that we've just faced.
>> The second LUN_RESET moves the first LUN_RESET from tmr_list to its
>> drain_tmr_list, then the first LUN_RESET removes itself from second`s
>> drain_tmr_list, then the second LUN_RESET tries to remove the first from
>> the list and crashes because it was deleted already.
>> So,
>>
>> Tested-by: Dmitry Bogdanov <d.bogdanov@xxxxxxxxx>
> 
> Unfortunately, I am revoking my tags. This patch leads to deadlock of two
> LUN_RESETs waiting for each other in its drain_tmr_list.
> 
> To keep LUN_RESETs ignoring each other something like that is needed:
>> ---
>>  drivers/target/target_core_tmr.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
>> index 2b95b4550a63..a60802b4c5a3 100644
>> --- a/drivers/target/target_core_tmr.c
>> +++ b/drivers/target/target_core_tmr.c
>> @@ -188,9 +188,10 @@ static void core_tmr_drain_tmr_list(
>>          * LUN_RESET tmr..
>>          */
>>         spin_lock_irqsave(&dev->se_tmr_lock, flags);
>> -       if (tmr)
>> -               list_del_init(&tmr->tmr_list);
>>         list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
> - > +               if (tmr_p == tmr)
> - > +                       continue;
> - > +
> 
> +		/* Ignore LUN_RESETs to avoid deadlocks */
> +		if (tmr_p->function == TMR_LUN_RESET)
> +			continue;
> + 

Shoot, that adds back the bug I was hitting.

I have an idea for how to fix both issues, but let me do some more testing
and will repost the set. Thanks for testing and reviewing this.




[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux