On 9/9/24 22:08, Niklas Cassel wrote: > On Mon, Sep 09, 2024 at 09:52:53PM +0900, Damien Le Moal wrote: >> On 9/9/24 17:47, Niklas Cassel wrote: >>> When ata_qc_complete() schedules a command for EH using >>> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to >>> req->q->mq_ops->timeout() / scsi_timeout() being called. >>> >>> scsi_timeout(), if the LLDD has no abort handler (libata has no abort >>> handler), will set host byte to DID_TIME_OUT, and then call >>> scsi_eh_scmd_add() to add the command to EH. >>> >>> Thus, when commands first enter libata's EH strategy_handler, all the >>> commands that have been added to EH will have DID_TIME_OUT set. >>> >>> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that >>> have not received a completion at the time of entering EH. >>> >>> Thus, libata doesn't really care about DID_TIME_OUT at all, and currently >>> clears the host byte at the end of EH, in ata_scsi_qc_complete(), before >>> scsi_eh_finish_cmd() is called. >>> >>> However, this clearing in ata_scsi_qc_complete() is currently only done >>> for commands that are not ATA passthrough commands. >>> >>> Since the host byte is visible in the completion that we return to user >>> space for ATA passthrough commands, for ATA passthrough commands that got >>> completed via EH (commands with sense data), the user will incorrectly see: >>> ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT] >>> >>> Fix this by moving the clearing of the host byte (which is currently only >>> done for commands that are not ATA passthrough commands) from >>> ata_scsi_qc_complete() to the start of EH (regardless if the command is >>> ATA passthrough or not). >>> >>> This will make sure that we: >>> -Correctly clear DID_TIME_OUT for both ATA passthrough commands and >>> commands that are not ATA passthrough commands. >>> -Do not needlessly clear the host byte for commands that did not go via EH. >>> ata_scsi_qc_complete() is called both for commands that are completed >>> normally (without going via EH), and for commands that went via EH, >>> however, only commands that went via EH will have DID_TIME_OUT set. >>> >>> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") >>> Reported-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx> >>> Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@xxxxxxxxxx/ >>> Tested-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx> >>> Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx> >>> --- >>> Changes since v1: >>> -Picked up tags from Igor. >>> -Added Fixes tag. >>> -Improved the commit message to clearly state that this is currently a >>> real bug for ATA PT commands with sense data. >>> >>> drivers/ata/libata-eh.c | 9 +++++++++ >>> drivers/ata/libata-scsi.c | 3 --- >>> 2 files changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c >>> index 7de97ee8e78b..450e9bd96c97 100644 >>> --- a/drivers/ata/libata-eh.c >>> +++ b/drivers/ata/libata-eh.c >>> @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap, >>> list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) { >>> struct ata_queued_cmd *qc; >>> >>> + /* >>> + * If the scmd was added to EH, via ata_qc_schedule_eh() -> >>> + * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will >>> + * have set DID_TIME_OUT (since libata does not have an abort >>> + * handler). Thus to clear the DID_TIME_OUT, we clear the host >>> + * byte (but keep the SCSI ML and status byte). >>> + */ >>> + scmd->result &= 0x0000ffff; >> >> I know it was like that, but why not: >> >> set_host_byte(scmd, 0); >> or >> set_host_byte(scmd, DID_OK); >> >> ? > > No particular reason. Since we basically just moving the code, > it made sense to keep it similar to the original code, but I > can submit a v3 that instead does: > set_host_byte(scmd, DID_OK); > > if you prefer that. I do prefer it. The magic 0x0000ffff mask is not exactly clear (the comment helps though)... Side note: patching scsi to define macros for status, ML and host byte masks and shifts would be very nice :) > > Strictly speaking, that would probably require us to drop Igor's > Tested-by though (even if the generated code for an optimizing > compiler ought to generate the same code). > > >> >> set_host_byte() uses the mask 0xff00ffff, since the upper 8 bits seem to be >> ignored: bits [0..7] are the status byte, [16..23] are the host byte and bits >> [8..15] are the message byte but that is unused. > > Nit: 8..16 is the SCSI midlayer byte, not message byte, see > 36ebf1e2aa14 ("scsi: core: Add error codes for internal SCSI midlayer use") > > > Kind regards, > Niklas -- Damien Le Moal Western Digital Research