Re: [PATCH] scsi: Fix failed request error code

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 4 Apr 2018 07:06:58 +0000
Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote:

> Hannes,
> 
> On 4/4/18 15:57, Hannes Reinecke wrote:
> > On Wed,  4 Apr 2018 15:51:38 +0900
> > Damien Le Moal <damien.lemoal@xxxxxxx> wrote:
> >   
> >> With the introduction of commit e39a97353e53 ("scsi: core: return
> >> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command
> >> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but
> >> lacking additional sense information will have a return code set to
> >> BLK_STS_OK. This results in the request issuer to see successful
> >> request execution despite the failure. An example of such case is
> >> an unaligned write on a host managed ZAC disk connected to a SAS
> >> HBA with a malfunctioning SAT. The unaligned write command gets
> >> aborted but has no additional sense information.
> >>
> >> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
> >> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key :
> >> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense:
> >> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB:
> >> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00
> >> print_req_error: I/O error, dev sde, sector 274726920
> >>
> >> In scsi_io_completion(), sense key handling to not change the
> >> request error code and success being reported to the issuer.
> >>
> >> Fix this by making sure that the error code always indicates an
> >> error if scsi_io_completion() decide that the action to be taken
> >> for a failed command is to not retry it and terminate it
> >> immediately (ACTION_FAIL) .
> >>
> >> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx>
> >> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in
> >> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@xxxxxxxx>
> >> Cc: <stable@xxxxxxxxxxxxxxx>
> >> ---
> >>  drivers/scsi/scsi_lib.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index c84f931388f2..87579bfcc186 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd
> >> *cmd, unsigned int good_bytes) scsi_print_command(cmd);
> >>  			}
> >>  		}
> >> +		/*
> >> +		 * The command failed and should not be retried.
> >> If the host
> >> +		 * byte is DID_OK, then
> >> __scsi_error_from_host_byte() returned
> >> +		 * BLK_STS_OK and error indicates a success. Make
> >> sure to not
> >> +		 * use that as the completion code and always
> >> return an
> >> +		 * I/O error.
> >> +		 */
> >> +		if (error == BLK_STS_OK)
> >> +			error = BLK_STS_IOERR;
> >>  		if (!scsi_end_request(req, error,
> >> blk_rq_err_bytes(req), 0)) return;
> >>  		/*FALLTHRU*/  
> > 
> > That looks wrong.
> > Shouldn't __scsi_error_from_host_byte() return the correct status
> > here?  
> 
> My drive said:
> 
> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current]
> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense
> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00
> 00 02 0c 00 01 00 00 00 01 00 00
> 
> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns
> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND
> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing
> and error stays equal to success. scsi_end_request() gets called with
> that and dd sees a success...
> 
> There are also plenty of other sense keys cases where error is not
> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I
> think this is likely the most common case since an command failure
> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most
> common one.
> 
> My patch is a bit of a hammer and makes sure that an ACTION_FAIL
> request is completed as a failure... Am I getting all this wrong ?
> 

Just checked with the code, and send a new patch.
Which I find a bit clearer.

Fact is that __scsi_error_from_host_byte() is not sufficient to
evaluate the return code, so taking its return value with any further
checks is indeed wrong.
But then it never said so on the boilerplate of
__scsi_error_from_host_byte() ...

So please do check the 'v2' version of the patch.

Cheers,

Hannes




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]