On 4/4/18 16:39, Damien Le Moal wrote: > Hannes, > > On 4/4/18 16:35, Hannes Reinecke wrote: >> On Wed, 4 Apr 2018 07:06:58 +0000 >> Damien Le Moal <Damien.LeMoal@xxxxxxx> wrote: >> >>> Hannes, >>> >>> On 4/4/18 15:57, Hannes Reinecke wrote: >>>> On Wed, 4 Apr 2018 15:51:38 +0900 >>>> Damien Le Moal <damien.lemoal@xxxxxxx> wrote: >>>> >>>>> With the introduction of commit e39a97353e53 ("scsi: core: return >>>>> BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"), a command >>>>> that failed with hostbyte=DID_OK and driverbyte=DRIVER_SENSE but >>>>> lacking additional sense information will have a return code set to >>>>> BLK_STS_OK. This results in the request issuer to see successful >>>>> request execution despite the failure. An example of such case is >>>>> an unaligned write on a host managed ZAC disk connected to a SAS >>>>> HBA with a malfunctioning SAT. The unaligned write command gets >>>>> aborted but has no additional sense information. >>>>> >>>>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK >>>>> driverbyte=DRIVER_SENSE sd 10:0:0:0: [sde] tag#3905 Sense Key : >>>>> Aborted Command [current] sd 10:0:0:0: [sde] tag#3905 Add. Sense: >>>>> No additional sense information sd 10:0:0:0: [sde] tag#3905 CDB: >>>>> Write(16) 8a 00 00 00 00 00 02 0c 00 01 00 00 00 01 00 00 >>>>> print_req_error: I/O error, dev sde, sector 274726920 >>>>> >>>>> In scsi_io_completion(), sense key handling to not change the >>>>> request error code and success being reported to the issuer. >>>>> >>>>> Fix this by making sure that the error code always indicates an >>>>> error if scsi_io_completion() decide that the action to be taken >>>>> for a failed command is to not retry it and terminate it >>>>> immediately (ACTION_FAIL) . >>>>> >>>>> Signed-off-by: Damien Le Moal <damien.lemoal@xxxxxxx> >>>>> Fixes: e39a97353e53 ("scsi: core: return BLK_STS_OK for DID_OK in >>>>> __scsi_error_from_host_byte()") Cc: Hannes Reinecke <hare@xxxxxxxx> >>>>> Cc: <stable@xxxxxxxxxxxxxxx> >>>>> --- >>>>> drivers/scsi/scsi_lib.c | 9 +++++++++ >>>>> 1 file changed, 9 insertions(+) >>>>> >>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c >>>>> index c84f931388f2..87579bfcc186 100644 >>>>> --- a/drivers/scsi/scsi_lib.c >>>>> +++ b/drivers/scsi/scsi_lib.c >>>>> @@ -1002,6 +1002,15 @@ void scsi_io_completion(struct scsi_cmnd >>>>> *cmd, unsigned int good_bytes) scsi_print_command(cmd); >>>>> } >>>>> } >>>>> + /* >>>>> + * The command failed and should not be retried. >>>>> If the host >>>>> + * byte is DID_OK, then >>>>> __scsi_error_from_host_byte() returned >>>>> + * BLK_STS_OK and error indicates a success. Make >>>>> sure to not >>>>> + * use that as the completion code and always >>>>> return an >>>>> + * I/O error. >>>>> + */ >>>>> + if (error == BLK_STS_OK) >>>>> + error = BLK_STS_IOERR; >>>>> if (!scsi_end_request(req, error, >>>>> blk_rq_err_bytes(req), 0)) return; >>>>> /*FALLTHRU*/ >>>> >>>> That looks wrong. >>>> Shouldn't __scsi_error_from_host_byte() return the correct status >>>> here? >>> >>> My drive said: >>> >>> sd 10:0:0:0: [sde] tag#3905 FAILED Result: hostbyte=DID_OK >>> driverbyte=DRIVER_SENSE >>> sd 10:0:0:0: [sde] tag#3905 Sense Key : Aborted Command [current] >>> sd 10:0:0:0: [sde] tag#3905 Add. Sense: No additional sense >>> information sd 10:0:0:0: [sde] tag#3905 CDB: Write(16) 8a 00 00 00 00 >>> 00 02 0c 00 01 00 00 00 01 00 00 >>> >>> Since hostbyte is DID_OK, __scsi_error_from_host_byte() returns >>> BLK_STS_OK. The HBA fails to give sense data, so the ABORTED_COMMAND >>> case in scsi_io_completion() "switch (sshdr.sense_key)" does nothing >>> and error stays equal to success. scsi_end_request() gets called with >>> that and dd sees a success... >>> >>> There are also plenty of other sense keys cases where error is not >>> changed despite the fact that error can be BLK_STS_SUCCESS (in fact, I >>> think this is likely the most common case since an command failure >>> with hostbyte=DID_OK and driverbyte=DRIVER_SENSE is probably the most >>> common one. >>> >>> My patch is a bit of a hammer and makes sure that an ACTION_FAIL >>> request is completed as a failure... Am I getting all this wrong ? >>> >> >> Just checked with the code, and send a new patch. >> Which I find a bit clearer. >> >> Fact is that __scsi_error_from_host_byte() is not sufficient to >> evaluate the return code, so taking its return value with any further >> checks is indeed wrong. >> But then it never said so on the boilerplate of >> __scsi_error_from_host_byte() ... >> >> So please do check the 'v2' version of the patch. >> > > It fixes the particular problem I am seeing. > But looking more at this code, isn't there a problem with > sshdr.sense_key == RECOVERED_ERROR ? > > if (sense_valid && (sshdr.sense_key == RECOVERED_ERROR)) { > ... > result = 0; > /* for passthrough error may be set */ > error = BLK_STS_OK; > } > > Then the error will be changed wrongly to BLK_STS_IOERR. > > My original change had the same problem. Ignore. It does not go through the change in that case thanks to: if (!(blk_rq_bytes(req) == 0 && error) && !scsi_end_request(req, error, good_bytes, 0)) return; Assuming the RECOVERED_ERROR has no left over bytes. Otherwise, it goes through requeue. Both cases being before the error code overwrite, no problem. Thanks ! -- Damien Le Moal, Western Digital