Hannes, On 4/4/18 15:57, Hannes Reinecke wrote: > On Wed, 4 Apr 2018 15:51:38 +0900 > Damien Le Moal <damien.lemoal@xxxxxxx> wrote: > >> With the introduction of commit e39a97353e53 ("scsi: core: return >> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command >> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but >> lacking additional sense information will have a return code set to >> BLK_STS_OK. This results in the request issuer to see successful >> request execution despite the failure. An example of such case is an >> unaligned write on a host managed ZAC disk connected to a SAS HBA >> with a malfunctioning SAT. The unaligned write command gets aborted >> but has no additional sense information. >> >> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK >> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : >> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No >> additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: >> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00 >> print_req_error: I/O error, dev sde, sector 274726920 >> >> In scsi_io_completion(), sense key handling to not change the request >> error code and success being reported to the issuer. >> >> Fix this by making sure that the error code always indicates an error >> if scsi_io_completion() decide that the action to be taken for a >> failed command is to not retry it and terminate it immediately >> (ACTION_FAIL) . >> >> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> >> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in >> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@xxxxxxxx> >> Cc: <stable@xxxxxxxxxxxxxxx> >> --- >> drivers/scsi/scsi_lib.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >> index c84f931388f2..87579bfcc186 100644 >> --- a/drivers/scsi/scsi_lib.c >> +++ b/drivers/scsi/scsi_lib.c >> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd *cmd, >> unsigned int good_bytes) scsi_print_command(cmd); >> } >> } >> + /* >> + * The command failed and should not be retried. If >> the host >> + * byte is DID_OK, then >> __scsi_error_from_host_byte() returned >> + * BLK_STS_OK and error indicates a success. Make >> sure to not >> + * use that as the completion code and always return >> an >> + * I/O error. >> + */ >> + if (error == BLK_STS_OK) >> + error = BLK_STS_IOERR; >> if (!scsi_end_request(req, error, >> blk_rq_err_bytes(req), 0)) return; >> /*FALLTHRU*/ > > That looks wrong. > Shouldn't __scsi_error_from_host_byte() return the correct status here? My drive said: sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00 Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing and error stays equal to success. scsi_end_request() gets called with that and dd sees a success... There are also plenty of other sense keys cases where error is not changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I think this is likely the most common case since an command failure with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most common one. My patch is a bit of a hammer and makes sure that an ACTION_FAIL request is completed as a failure... Am I getting all this wrong ? Best. -- Damien Le Moal, Western Digital