Re: [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Tue, 8 Nov 2022 12:50:44 +0900

+ linux-scsi, Martin and James

On 11/8/22 08:57, Niklas Cassel wrote:
> On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
>> On 07/11/2022 16:10, Niklas Cassel wrote:
>>> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
>>> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>>> scsi_dispatch_cmd() -> host->hostt->queuecommand() (ata_scsi_queuecmd())
>>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>>
>>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> when receiving a TFES error IRQ, this was the call chain:
>>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>>
>>> Which meant that as soon as the error IRQ was serviced, no additional
>>> commands sent from the block layer (scsi_queue_rq()) would be sent down
>>> to the device. (ATA internal commands originating for ATA EH could of
>>> course still be sent.)
>>>
>>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>> scsi_times_out() would instead result in a call to scsi_abort_command() ->
>>> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
>>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost, SHOST_RECOVERY).
>>>
>>> (It was possible to get the old behavior if host->hostt->no_async_abort
>>> was set, but libata never used it, and this option was completely removed
>>> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
>>>
>>> Additionally, later in commit 358f70da49d7 ("blk-mq: make
>>> blk_abort_request() trigger timeout path"), blk_abort_request() was changed
>>> to also call the abort callback asynchronously.
>>>
>>> So now, after the TFES error irq has been serviced, we need to wait for
>>> two different workqueues to run their work, before the SHOST_RECOVERY
>>> state gets set.
>>>
>>> While the ATA specification states that a device should return command
>>> aborted for all commands queued after the device has entered error state,
>>> since ATA only keeps the sense data for the latest command (in non-NCQ
>>> case), we really don't want to send block layer commands to the device
>>> after it has entered error state. (Only ATA EH commands should be sent,
>>> to read the sense data etc.)
>>>
>>> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY) from
>>> ata_qc_schedule_eh() directly, that might be a layering violation.
>>> So instead of doing that, add an additional check against the libata's own
>>> EH flag(s) before calling the qc_defer callback.
>>>
>>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>>> Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>> ---
>>> Changes since v1:
>>> -Implemented review comments from Damien.
>>>
>>>   drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>>>   1 file changed, 20 insertions(+)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index 4cb914103382..383a208f5f99 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
>>>   	if (xlat_func(qc))
>>>   		goto early_finish;
>>> +	/*
>>> +	 * scsi_queue_rq() will defer commands when in state SHOST_RECOVERY.
>>> +	 *
>>> +	 * When getting an error interrupt, ata_port_abort() will be called,
>>> +	 * which ends up calling ata_qc_schedule_eh() on all QCs.
>>> +	 *
>>> +	 * ata_qc_schedule_eh() will call ata_eh_set_pending() and then call
>>> +	 * blk_abort_request() on the given QC. blk_abort_request() will
>>> +	 * asynchronously end up calling scsi_eh_scmd_add(), which will set
>>> +	 * the state to SHOST_RECOVERY and wake up SCSI EH.
>>> +	 *
>>> +	 * In order to avoid requests from being issued to the device from the
>>> +	 * time ata_eh_set_pending() is called, to the time scsi_eh_scmd_add()
>>> +	 * sets the state to SHOST_RECOVERY, we defer requests here as well.
>>> +	 */
>>> +	if (ap->pflags & (ATA_PFLAG_EH_PENDING | ATA_PFLAG_EH_IN_PROGRESS)) {
>>> +		rc = ATA_DEFER_LINK;
>>> +		goto defer;
>>
>> Could we move this check earlier? I mean, we didn't need to have the qc
>> alloc'ed and xlat'ed for this check to be done, right?
> 
> Sure, we could put it in e.g. ata_scsi_queuecmd() or __ata_scsi_queuecmd().
> 
> 
> Or, perhaps it is just time to accept that ATA EH is so interconnected with
> SCSI EH, so the simplest thing is just to do:
> 
> --- a/drivers/ata/libata-eh.c
> +++ b/drivers/ata/libata-eh.c
> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>  
>         qc->flags |= ATA_QCFLAG_FAILED;
>         ata_eh_set_pending(ap, 1);
> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);

Why put this in this function ? Nothing in ata_qc_schedule_eh() calls
scsi_schedule_eh() or scsi_eh_scmd_add(), which set that state.
It looks like ata_port->ops->sched_eh(), that is, ata_std_sched_eh(),
would be a better place for this.

>  
>         /* The following will fail if timeout has already expired.
>          * ata_scsi_error() takes care of such scmds on EH entry.
> 
> 
> Which appears to work just as well as the patch in $subject.
> 
> In commit ee7863bc68fa ("[PATCH] SCSI: implement shost->host_eh_scheduled")
> Tejun mentioned that "... libata is planning to depart from SCSI, so, for the
> time being, libata will be using SCSI EH to handle such exceptions."
> 
> Now, 16 years later, ATA is still using SCSI EH (see ata_port_schedule_eh() and
> ata_std_sched_eh()) to schedule EH in the case when there are no QCs to abort:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
> 
> Damien, thoughts?

I like this simpler fix better, but it does introduce a risk of (again)
having problems with ata EH if scsi EH trigger/timing is changed.
Unlikely now, but as this fix proves, not unheard of.

The v2 change is fine too, modulo John's suggestion, which I agree with.
At least it is consistent with the ata internal eh state accounting, so
self contained within libata and somewhat less dependent on scsi state
machine.

Martin ? James ? Thoughts ?

-- 
Damien Le Moal
Western Digital Research