On 11/8/22 00:57, Niklas Cassel wrote:
On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
On 07/11/2022 16:10, Niklas Cassel wrote:
scsi_queue_rq() will check if scsi_host_in_recovery() (state is
SHOST_RECOVERY), and if so, it will _not_ issue a command via:
scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
-> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
when receiving a TFES error IRQ, this was the call chain:
ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
Which meant that as soon as the error IRQ was serviced, no additional
commands sent from the block layer (scsi_queue_rq()) would be sent down
to the device. (ATA internal commands originating for ATA EH could of
course still be sent.)
However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
scsi_times_out() would instead result in a call to scsi_abort_command() ->
queue_delayed_work(). work function: scmd_eh_abort_handler() would call
scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
(It was possible to get the old behavior if host->hostt->no_async_abort
was set, but libata never used it, and this option was completely removed
in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
Additionally, later in commit 358f70da49d7 ("blk-mq: make
blk_abort_request() trigger timeout path"), blk_abort_request() was changed
to also call the abort callback asynchronously.
So now, after the TFES error irq has been serviced, we need to wait for
two different workqueues to run their work, before the SHOST_RECOVERY
state gets set.
While the ATA specification states that a device should return command
aborted for all commands queued after the device has entered error state,
since ATA only keeps the sense data for the latest command (in non-NCQ
case), we really don't want to send block layer commands to the device
after it has entered error state. (Only ATA EH commands should be sent,
to read the sense data etc.)
While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
ata_qc_schedule_eh() directly, that might be a layering violation.
So instead of doing that, add an additional check against the libata's own
EH flag(s) before calling the qc_defer callback.
Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
---
Changes since v1:
-Implemented review comments from Damien.
drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4cb914103382..383a208f5f99 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
if (xlat_func(qc))
goto early_finish;
+ /*
+ * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
+ *
+ * When getting an error interrupt, ata_port_abort() will be called,
+ * which ends up calling ata_qc_schedule_eh() on all QCs.
+ *
+ * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
+ * blk_abort_request() on the given QC. blk_abort_request() will
+ * asynchronously end up calling scsi_eh_scmd_add(), which will set
+ * the state to SHOST_RECOVERY and wake up SCSI EH.
+ *
+ * In order to avoid requests from being issued to the device from the
+ * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
+ * sets the state to SHOST_RECOVERY, we defer requests here as well.
+ */
+ if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
+ rc = ATA_DEFER_LINK;
+ goto defer;
Could we move this check earlier? I mean, we didn't need to have the qc
alloc'ed and xlat'ed for this check to be done, right?
Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
Or, perhaps it is just time to accept that ATA EH is so interconnected with
SCSI EH, so the simplest thing is just to do:
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
qc->flags |= ATA_QCFLAG_FAILED;
ata_eh_set_pending(ap, 1);
+ scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
/* The following will fail if timeout has already expired.
* ata_scsi_error() takes care of such scmds on EH entry.
Which appears to work just as well as the patch in $subject.
In commit ee7863bc68fa ("[PATCH] SCSI: implement shost->host_eh_scheduled")
Tejun mentioned that "... libata is planning to depart from SCSI, so, for the
time being, libata will be using SCSI EH to handle such exceptions."
Now, 16 years later, ATA is still using SCSI EH (see ata_port_schedule_eh() and
ata_std_sched_eh()) to schedule EH in the case when there are no QCs to abort:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
Damien, thoughts?
Yeah, that was the original idea. By the time libata was designed we
still had quite a lot of drivers in drivers/ide, and it wasn't quite
clear if managing everything via SATL was the way to go.
So one idea was to allow libata to become stand-alone, and SATL working
as a modular interface on top of libata.
However, this never materialized, as everyone was quite happy having to
deal with SCSI devices only, even if that meant to go via an additional
layer of interpolation.
With the introduction of libsas this idea became even more dubious, as
libsas really called for a _tighter_ integration with SCSI.
Plus ATA in general seems to be heading into the sunset, so I sincerely
doubt anyone really wants to head that way.
In fact, with the recent patches we should finally drop the pretension
that libata will ever be standalone, and concentrate on making it more
aligned with SCSI.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman