On Wed, 4 Apr 2018 07:06:58 +0000 Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote: > Hannes, > > On 4/4/18 15:57, Hannes Reinecke wrote: > > On Wed, 4 Apr 2018 15:51:38 +0900 > > Damien Le Moal <damien.lemoal@xxxxxxx> wrote: > > > >> With the introduction of commit e39a97353e53 ("scsi: core: return > >> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command > >> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but > >> lacking additional sense information will have a return code set to > >> BLK_STS_OK. This results in the request issuer to see successful > >> request execution despite the failure. An example of such case is > >> an unaligned write on a host managed ZAC disk connected to a SAS > >> HBA with a malfunctioning SAT. The unaligned write command gets > >> aborted but has no additional sense information. > >> > >> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK > >> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : > >> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: > >> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: > >> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00 > >> print_req_error: I/O error, dev sde, sector 274726920 > >> > >> In scsi_io_completion(), sense key handling to not change the > >> request error code and success being reported to the issuer. > >> > >> Fix this by making sure that the error code always indicates an > >> error if scsi_io_completion() decide that the action to be taken > >> for a failed command is to not retry it and terminate it > >> immediately (ACTION_FAIL) . > >> > >> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> > >> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in > >> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@xxxxxxxx> > >> Cc: <stable@xxxxxxxxxxxxxxx> > >> --- > >> drivers/scsi/scsi_lib.c | 9 +++++++++ > >> 1 file changed, 9 insertions(+) > >> > >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > >> index c84f931388f2..87579bfcc186 100644 > >> --- a/drivers/scsi/scsi_lib.c > >> +++ b/drivers/scsi/scsi_lib.c > >> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd > >> *cmd, unsigned int good_bytes) scsi_print_command(cmd); > >> } > >> } > >> + /* > >> + * The command failed and should not be retried. > >> If the host > >> + * byte is DID_OK, then > >> __scsi_error_from_host_byte() returned > >> + * BLK_STS_OK and error indicates a success. Make > >> sure to not > >> + * use that as the completion code and always > >> return an > >> + * I/O error. > >> + */ > >> + if (error == BLK_STS_OK) > >> + error = BLK_STS_IOERR; > >> if (!scsi_end_request(req, error, > >> blk_rq_err_bytes(req), 0)) return; > >> /*FALLTHRU*/ > > > > That looks wrong. > > Shouldn't __scsi_error_from_host_byte() return the correct status > > here? > > My drive said: > > sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK > driverbyte=DRIVER_SENSE > sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current] > sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense > information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 > 00 02 0c 00 01 00 00 00 01 00 00 > > Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns > BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND > case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing > and error stays equal to success. scsi_end_request() gets called with > that and dd sees a success... > > There are also plenty of other sense keys cases where error is not > changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I > think this is likely the most common case since an command failure > with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most > common one. > > My patch is a bit of a hammer and makes sure that an ACTION_FAIL > request is completed as a failure... Am I getting all this wrong ? > Just checked with the code, and send a new patch. Which I find a bit clearer. Fact is that __scsi_error_from_host_byte() is not sufficient to evaluate the return code, so taking its return value with any further checks is indeed wrong. But then it never said so on the boilerplate of __scsi_error_from_host_byte() ... So please do check the 'v2' version of the patch. Cheers, Hannes