Re: [PATCH] scsi: Fix failed request error code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hannes,

On 4/4/18 16:35, Hannes Reinecke wrote:
> On Wed, 4 Apr 2018 07:06:58 +0000
> Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:
> 
>> Hannes,
>>
>> On 4/4/18 15:57, Hannes Reinecke wrote:
>>> On Wed,  4 Apr 2018 15:51:38 +0900
>>> Damien Le Moal <damien.lemoal@xxxxxxx> wrote:
>>>   
>>>> With the introduction of commit e39a97353e53 ("scsi: core: return
>>>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
>>>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
>>>> lacking additional sense information will have a return code set to
>>>> BLK_STS_OK. This results in the request issuer to see successful
>>>> request execution despite the failure. An example of such case is
>>>> an unaligned write on a host managed ZAC disk connected to a SAS
>>>> HBA with a malfunctioning SAT. The unaligned write command gets
>>>> aborted but has no additional sense information.
>>>>
>>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>>>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
>>>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
>>>> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
>>>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
>>>> print_req_error: I/O error, dev sde, sector 274726920
>>>>
>>>> In scsi_io_completion(), sense key handling to not change the
>>>> request error code and success being reported to the issuer.
>>>>
>>>> Fix this by making sure that the error code always indicates an
>>>> error if scsi_io_completion() decide that the action to be taken
>>>> for a failed command is to not retry it and terminate it
>>>> immediately (ACTION_FAIL) .
>>>>
>>>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
>>>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
>>>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@xxxxxxxx>
>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>> ---
>>>>  drivers/scsi/scsi_lib.c | 9 +++++++++
>>>>  1 file changed, 9 insertions(+)
>>>>
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index c84f931388f2..87579bfcc186 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
>>>> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
>>>>  			}
>>>>  		}
>>>> +		/*
>>>> +		 * The command failed and should not be retried.
>>>> If the host
>>>> +		 * byte is DID_OK, then
>>>> __scsi_error_from_host_byte() returned
>>>> +		 * BLK_STS_OK and error indicates a success. Make
>>>> sure to not
>>>> +		 * use that as the completion code and always
>>>> return an
>>>> +		 * I/O error.
>>>> +		 */
>>>> +		if (error == BLK_STS_OK)
>>>> +			error = BLK_STS_IOERR;
>>>>  		if (!scsi_end_request(req, error,
>>>> blk_rq_err_bytes(req), 0)) return;
>>>>  		/*FALLTHRU*/  
>>>
>>> That looks wrong.
>>> Shouldn't __scsi_error_from_host_byte() return the correct status
>>> here?  
>>
>> My drive said:
>>
>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE
>> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
>> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
>> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
>> 00 02 0c 00 01 00 00 00 01 00 00
>>
>> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
>> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
>> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
>> and error stays equal to success. scsi_end_request() gets called with
>> that and dd sees a success...
>>
>> There are also plenty of other sense keys cases where error is not
>> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
>> think this is likely the most common case since an command failure
>> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
>> common one.
>>
>> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
>> request is completed as a failure... Am I getting all this wrong ?
>>
> 
> Just checked with the code, and send a new patch.
> Which I find a bit clearer.
> 
> Fact is that __scsi_error_from_host_byte() is not sufficient to
> evaluate the return code, so taking its return value with any further
> checks is indeed wrong.
> But then it never said so on the boilerplate of
> __scsi_error_from_host_byte() ...
> 
> So please do check the 'v2' version of the patch.
> 

It fixes the particular problem I am seeing.
But looking more at this code, isn't there a problem with
sshdr.sense_key == RECOVERED_ERROR ?

if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) {
	...
	result = 0;
        /* for passthrough error may be set */
        error = BLK_STS_OK;
}

Then the error will be changed wrongly to BLK_STS_IOERR.

My original change had the same problem.


-- 
Damien Le Moal,
Western Digital




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]