Re: [PATCH 0/3] Fix USB deadlock caused by SCSI error handling

On 04/01/2014 11:28 PM, Alan Stern wrote:
> On Tue, 1 Apr 2014, Hannes Reinecke wrote:
> 
>>>> So if the above reasoning is okay then this patch should be doing
>>>> the trick:
>>>>
>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>> index 771c16b..0e72374 100644
>>>> --- a/drivers/scsi/scsi_error.c
>>>> +++ b/drivers/scsi/scsi_error.c
>>>> @@ -189,6 +189,7 @@ scsi_abort_command(struct scsi_cmnd *scmd)
>>>>                 /*
>>>>                  * Retry after abort failed, escalate to next level.
>>>>                  */
>>>> +               scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
>>>>                 SCSI_LOG_ERROR_RECOVERY(3,
>>>>                         scmd_printk(KERN_INFO, scmd,
>>>>                                     "scmd %p previous abort
>>>> failed\n", scmd));
>>>>
>>>> (Beware of line
>>>> breaks)
>>>>
>>>> Can you test with it?
>>>
>>> I don't understand.  This doesn't solve the fundamental problem (namely 
>>> that you escalate before aborting a running command).  All it does is 
>>> clear the SCSI_EH_ABORT_SCHEDULED flag before escalating.
>>>
>> Which was precisely the point :-)
>>
>> Hmm. The comment might've been clearer.
>>
>> What this patch is _supposed_ to be doing is that it'll clear the
>> SCSI_EH_ABORT_SCHEDULED flag if it has been set.
>> Which means this will be the second time scsi_abort_command() has
>> been called for the same command.
>> I.e. the first abort went out, did its thing, but now the same command
>> has timed out again.
>>
>> So this flag gets cleared, and scsi_abort_command() returns FAILED,
>> and _no_ asynchronous abort is being scheduled.
>> scsi_times_out() will then proceed to call scsi_eh_scmd_add().
>> But as we've cleared the SCSI_EH_ABORT_SCHEDULED flag
>> the SCSI_EH_CANCEL_CMD flag will continue to be set,
>> and the command will be aborted with the main SCSI EH routine.
>>
>> It looks to me as if it should do what you desire, namely abort the
>> command asynchronously the first time, and invoke the SCSI EH the
>> second time.
>>
>> Am I wrong?
> 
> I don't know -- I'll have to try it out.  Currently I'm busy with a 
> bunch of other stuff, so it will take some time.
> 
> Looking through the code, I have to wonder why scsi_times_out()  
> modifies scmd->result.  Won't this value get overwritten by the LLDD
> when the command eventually terminates?
> 
Yes. However, the 'DID_TIME_OUT' is just a marker that the internal
timeout code triggered.
If the LLDD overwrites it with a 'real' error code everything's fine.
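
As a rough illustration of what I mean by "marker", here is a small
user-space sketch. It is not kernel code: the toy_* names are made up
for this mail, and the host-byte values are only chosen to mirror the
usual DID_* numbering. It assumes the timeout path ORs the marker into
the host byte and that the LLDD stores a complete result on completion.

```c
/* Toy model: names are hypothetical, values mirror the usual DID_* codes. */
#define TOY_DID_OK        0x00
#define TOY_DID_TIME_OUT  0x03
#define TOY_DID_ERROR     0x07

/* The host byte is assumed to live in bits 16..23 of the result word. */
static unsigned int toy_host_byte(int result)
{
	return ((unsigned int)result >> 16) & 0xff;
}

/* What the timeout path does in this model: OR in the marker. */
static void toy_mark_timed_out(int *result)
{
	*result |= TOY_DID_TIME_OUT << 16;
}

/* What a completing LLDD does in this model: store a full new result. */
static void toy_lldd_complete(int *result, unsigned int host, unsigned int status)
{
	*result = (int)((host << 16) | (status & 0xff));
}
```

So after toy_mark_timed_out() the host byte reads TOY_DID_TIME_OUT, and
a later toy_lldd_complete() simply replaces it with whatever the LLDD
reports.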

> Even worse, what happens in the event of a race where the command 
> terminates normally just before scsi_times_out() changes scmd->result?
> 
_That_ is the least of our worries.
_If_ the LLDD completes the command while scsi_times_out() is
running the whole thing is going down the drain anyway, as the
command will be terminated by the LLDD, and we can only hope that we
didn't mess up our reference counting. Otherwise we'd have EH
running on a stale command, which is going to be fun to watch.

But looking closer, it might be that the line modifying the result
in scsi_times_out() is indeed pointless, seeing that it's being set
in scsi_eh_abort_handler(), too.
I'll be checking if we can simply remove that line.
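
To make the flag handling from my explanation quoted above a bit more
concrete, here is a toy user-space model of the intended flow. This is
only a sketch: the toy_* names and flag values are invented for this
mail, the async abort is reduced to a boolean, and the rule in
toy_eh_scmd_add() (drop the cancel flag while an abort is still marked
as scheduled) is my reading of the behaviour described above, not a
copy of the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the eh_eflags bits discussed in this thread. */
#define TOY_EH_CANCEL_CMD      0x0001
#define TOY_EH_ABORT_SCHEDULED 0x0002

enum toy_result { TOY_SUCCESS, TOY_FAILED };

struct toy_cmnd {
	unsigned int eh_eflags;
	bool async_abort_queued;	/* an asynchronous abort was scheduled */
	bool handed_to_eh;		/* command was passed to the main EH */
};

static enum toy_result toy_abort_command(struct toy_cmnd *cmd)
{
	if (cmd->eh_eflags & TOY_EH_ABORT_SCHEDULED) {
		/* Second timeout: clear the flag and escalate. */
		cmd->eh_eflags &= ~TOY_EH_ABORT_SCHEDULED;
		return TOY_FAILED;
	}
	/* First timeout: remember that an abort is pending and queue it. */
	cmd->eh_eflags |= TOY_EH_ABORT_SCHEDULED;
	cmd->async_abort_queued = true;
	return TOY_SUCCESS;
}

static void toy_eh_scmd_add(struct toy_cmnd *cmd, unsigned int flag)
{
	/* Assumed rule: no cancel while an abort is still marked scheduled. */
	if (cmd->eh_eflags & TOY_EH_ABORT_SCHEDULED)
		flag &= ~TOY_EH_CANCEL_CMD;
	cmd->eh_eflags |= flag;
	cmd->handed_to_eh = true;
}

static void toy_times_out(struct toy_cmnd *cmd)
{
	assert(cmd != NULL);
	if (toy_abort_command(cmd) == TOY_SUCCESS)
		return;		/* async abort scheduled, nothing more to do */
	toy_eh_scmd_add(cmd, TOY_EH_CANCEL_CMD);
}
```

Calling toy_times_out() twice on the same zero-initialised toy_cmnd
shows the intended sequence: the first call only queues the async
abort, the second one clears TOY_EH_ABORT_SCHEDULED and hands the
command to the EH with TOY_EH_CANCEL_CMD still set.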

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)