On Fri, Sep 06, 2024 at 04:27:21PM +0200, Niklas Cassel wrote: > When ata_qc_complete() schedules a command for EH using > ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to > req->q->mq_ops->timeout() / scsi_timeout() being called. > > scsi_timeout(), if the LLDD has no abort handler (libata has no abort > handler), will set host byte to DID_TIME_OUT, and then call > scsi_eh_scmd_add() to add the command to EH. > > Thus, when commands first enter libata's EH strategy_handler, all the > commands that have been added to EH will have DID_TIME_OUT set. > > libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that > have not received a completion at the time of entering EH. > > Thus, we don't really care about DID_TIME_OUT at all, and currently clear > the host byte at the end of EH, in ata_scsi_qc_complete(), before > scsi_eh_finish_cmd() is called. > > ata_scsi_qc_complete() will be called both for commands that are completed > normally (without going via EH), and for commands that went via EH. > > It seems more appropriate to clear DID_TIME_OUT at the start of EH instead > of at the end of EH. That way, someone dumping the host byte at the middle > of EH will not see DID_TIME_OUT as being set. No functional change > intended. > > This has the additional advantage that we will not needlessly clear the > host byte for commands that did not go via EH. > > Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx> Thank you for coming up with this generic solution, Niklas! I tested the patch with sg_sat_read_gplog. Without the fix we can see that DID_TIME_OUT is being reported to user space. The issue is gone with the fix applied. Without the fix: ---------------- $ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda open /dev/sda with flags=0x802 Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1 ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT] Driver_status=0x08 [DRIVER_SENSE] ^^^^^^^^^^^^ ATA pass-through(16): Fixed format, current; Sense key: Illegal Request ASC=24, ASCQ=00 (hex) Info fld=0x4530001 [72548353] Raw sense data (in hex), sb_len=64 f0 00 05 04 53 00 01 0a 80 1e 09 00 24 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ATA PASS-THROUGH (16) failed $ echo $? 99 With the fix: ------------- $ sg_sat_read_gplog -a 30 -p 9 --dma -vv /dev/sda open /dev/sda with flags=0x802 Building ATA READ LOG DMA EXT command; la=0x1e, pn=0x9, this_count=1 ATA pass-through(16) cdb: [85 0d 0e 00 00 00 01 00 1e 00 09 00 00 00 47 00] ATA pass through: Fixed format, current; Sense key: Illegal Request ASC=24, ASCQ=00 (hex) Info fld=0x4530001 [72548353] $ echo $? 5 Tested-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx> Thanks, Igor > --- > drivers/ata/libata-eh.c | 9 +++++++++ > drivers/ata/libata-scsi.c | 3 --- > 2 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index 7de97ee8e78b..450e9bd96c97 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, > list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { > struct ata_queued_cmd *qc; > > + /* > + * If the scmd was added to EH, via ata_qc_schedule_eh() -> > + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will > + * have set DID_TIME_OUT (since libata does not have an abort > + * handler). Thus to clear the DID_TIME_OUT, we clear the host > + * byte (but keep the SCSI ML and status byte). > + */ > + scmd->result &= 0x0000ffff; > + > ata_qc_for_each_raw(ap, qc, i) { > if (qc->flags & ATA_QCFLAG_ACTIVE && > qc->scsicmd == scmd) > diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c > index 3a442f564b0d..6a90062c8b55 100644 > --- a/drivers/ata/libata-scsi.c > +++ b/drivers/ata/libata-scsi.c > @@ -1680,9 +1680,6 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc) > set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION); > } else if (is_error && !have_sense) { > ata_gen_ata_sense(qc); > - } else { > - /* Keep the SCSI ML and status byte, clear host byte. */ > - cmd->result &= 0x0000ffff; > } > > ata_qc_done(qc); > -- > 2.46.0 >