Re: [PATCH 0/2] Fix SCSI & USB Storage CHECK CONDITION handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019/09/26 16:57, Alan Stern wrote:
> On Fri, 27 Sep 2019, Damien Le Moal wrote:
> 
>> If a non-passthrough command is terminated with a CHECK CONDITION, the
>> scsi error recovery code reuses the failed command scsi_cmnd structure
>> to process error recovery request sense. To preserve information
>> regarding the failed command, the functions scsi_eh_prep_cmnd() and
>> scsi_eh_restore_cmnd() respectively save and restore the original
>> command information. However, the resid field of the failed command
>> request structure is not preserved and reused for the request sense
>> handling, leading to the original command having an incorrect resid
>> when:
>> A) The command is not retried and terminated with an error
>> B) The command completes after retry and the underlying LLD does not set
>>    resid for a fully completed command (resid=0)
>>
>> The first patch of this series addresses case (A) above by adding resid
>> as part of the command information saved using struct scsi_eh_save.
> 
> Good catch.
> 
>> Case B can be observed with a WD My Book USB disks when a read or write
>> command is sent to the disk while the disk is in deep sleep mode
>> (spun down) due to a long period of inactivity (~30min).
>> In such case, the following command sequence happen:
>> 1) The read or write command is terminated after a few seconds with
>>    CHECK CONDITION and an asc/ascq of 04/01 (LOGICAL UNIT IS IN PROCESS
>>    OF BECOMING READY)
>> 2) In response to this failure, the USB mass storage driver triggers
>>    autosense processing, reusing the command descriptor to issue a
>>    request sense command with a 96B sense buffer size. The reply
>>    to this command gives a NOT READY / LOGICAL UNIT IS IN PROCESS
>>    OF BECOMING READY sense of 18B, resulting in a resid of 78B.
>> 3) The original command is retried and failed again, with step 2
>>    repeated, until the drive spins up and becomes ready.
>> 4) When the original command completes after the drive has become ready,
>>    the request sense command resid of 78B is seen by the scsi disk
>>    driver sd_done() function and wrongly generates a warning about the
>>    unaligned value reported.
> 
> But with the 1/1 patch in place, 4 won't happen any more, right?  
> sd_done() will see the resid from the successful read or write.

No it will not because the USB driver does not set the residue when the value
returned for the command is 0, that is, when the command succeeds after the disk
spins up. E.g., in my test, I used a 4096B read at LBA 0. With patch 1 only, I
see a residue of 4096 since the command is entirely failed with CHECK CONDITION
until the drive spins up and becomes ready, at which point the resid becomes 0
but is not set with scsi_set_resid(). Without patch 1 nor 2, I see a residue of
78B which is the residue from the request sense executed between retries of the
read command.

Patch 2 is needed to make sure that the residue is set to 0 when the command
fully completes. If there is no failure/request sense/retry dance, resid is
already 0 (initialized value) so there is no problem. The problem shows up only
for CHECK CONDITION + retry patterns. Tests confirm this. Only patch 1 does not
solve the problem.

>> This problem is fixed in patch 2 by always setting a command resid to 0
>> when there is no residual in usb_stor_Bulk_transport(). Note that
>> usb_stor_CB_transport() does not need changes since usb_stor_bulk_srb()
>> always sets the resid for a completed command, regardless of the
>> residual value.
> 
> Exactly the same reasoning shows that usb_stor_Bulk_transport() also
> does not need changes, doesn't it?  Which means that patch 2/2 is 
> unnecessary.

In usb_stor_Bulk_transport(), since scsi_set_resid() is only called under the

if (residue && !(us->fflags & US_FL_IGNORE_RESIDUE))

condition, resid is not set if it is 0.

Whereas in usb_stor_CB_transport(), through the call to usb_stor_bulk_srb(),
resid is always set, unconditionally, with:

scsi_set_resid(srb, scsi_bufflen(srb) - partial);

where partial is the command length for a fully completed command (the variable
name is misleading), leading to resid being set to 0 as needed for successful
commands.

Please let me know if I missed something. I can reproduce the problem 100% of
the time, even though it is painful due to the 30min wait between tests to wait
for the drive going to sleep (if I force a sleep power state, the problem does
not happen due to power control the drive USB bridge FW, I am guessing, which
will wake up the disk before the command is sent).

> 
> Alan Stern
> 
> PS: The correct term is "residue", not "residual".  I know that the 
> code sometimes uses the wrong word.

Thanks... I keep using resid but trying to write proper sentences, I mess up the
word extension :)


-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [Linux Media]     [Linux Input]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Old Linux USB Devel Archive]

  Powered by Linux