Re: [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending

John Garry <john.g.garry@xxxxxxxxxx> · Mon, 7 Nov 2022 17:01:53 +0000

On 07/11/2022 16:10, Niklas Cassel wrote:
scsi_queue_rq() will check if scsi_host_in_recovery() (state is
SHOST_RECOVERY), and if so, it will _not_ issue a command via:
scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
-> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()

Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
when receiving a TFES error IRQ, this was the call chain:
ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)

Which meant that as soon as the error IRQ was serviced, no additional
commands sent from the block layer (scsi_queue_rq()) would be sent down
to the device. (ATA internal commands originating for ATA EH could of
course still be sent.)

However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
scsi_times_out() would instead result in a call to scsi_abort_command() ->
queue_delayed_work(). work function: scmd_eh_abort_handler() would call
scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).

(It was possible to get the old behavior if host->hostt->no_async_abort
was set, but libata never used it, and this option was completely removed
in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))

Additionally, later in commit 358f70da49d7 ("blk-mq: make
blk_abort_request() trigger timeout path"), blk_abort_request() was changed
to also call the abort callback asynchronously.

So now, after the TFES error irq has been serviced, we need to wait for
two different workqueues to run their work, before the SHOST_RECOVERY
state gets set.

While the ATA specification states that a device should return command
aborted for all commands queued after the device has entered error state,
since ATA only keeps the sense data for the latest command (in non-NCQ
case), we really don't want to send block layer commands to the device
after it has entered error state. (Only ATA EH commands should be sent,
to read the sense data etc.)

While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
ata_qc_schedule_eh() directly, that might be a layering violation.
So instead of doing that, add an additional check against the libata's own
EH flag(s) before calling the qc_defer callback.

Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
---
Changes since v1:
-Implemented review comments from Damien.

  drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
  1 file changed, 20 insertions(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 4cb914103382..383a208f5f99 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
  	if (xlat_func(qc))
  		goto early_finish;
  
+	/*
+	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
+	 *
+	 * When getting an error interrupt, ata_port_abort() will be called,
+	 * which ends up calling ata_qc_schedule_eh() on all QCs.
+	 *
+	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
+	 * blk_abort_request() on the given QC. blk_abort_request() will
+	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
+	 * the state to SHOST_RECOVERY and wake up SCSI EH.
+	 *
+	 * In order to avoid requests from being issued to the device from the
+	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
+	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
+	 */
+	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
+		rc = ATA_DEFER_LINK;
+		goto defer;

Could we move this check earlier? I mean, we didn't need to have the qc 
alloc'ed and xlat'ed for this check to be done, right?

+	}
+
  	if (ap->ops->qc_defer) {
  		if ((rc = ap->ops->qc_defer(qc)))
  			goto defer;

This solves the issue I saw in a QEMU AHCI NCQ error model where 
ahci_error_intr() -> ata_port_abort() is called multiple times, so, FWIW:

Tested-by: John Garry <john.g.garry@xxxxxxxxxx>

Thanks