Re: [PATCH v2] ata: libata-core: do not issue non-internal commands once EH is pending

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Tue, 8 Nov 2022 16:13:36 +0900

On 11/8/22 16:10, Hannes Reinecke wrote:
> On 11/8/22 00:57, Niklas Cassel wrote:
>> On Mon, Nov 07, 2022 at 05:01:53PM +0000, John Garry wrote:
>>> On 07/11/2022 16:10, Niklas Cassel wrote:
>>>> scsi_queue_rq() will check if scsi_host_in_recovery() (state is
>>>> SHOST_RECOVERY), and if so, it will _not_ issue a command via:
>>>> scsi_dispatch_cmd() -> host->hostt->queuecommand()
>>>> (ata_scsi_queuecmd())
>>>> -> __ata_scsi_queuecmd() -> ata_scsi_translate() -> ata_qc_issue()
>>>>
>>>> Before commit e494f6a72839 ("[SCSI] improved eh timeout handler"),
>>>> when receiving a TFES error IRQ, this was the call chain:
>>>> ahci_error_intr() -> ata_port_abort() -> ata_do_link_abort() ->
>>>> ata_qc_complete() -> ata_qc_schedule_eh() -> blk_abort_request() ->
>>>> blk_rq_timed_out() -> q->rq_timed_out_fn() (scsi_times_out()) ->
>>>> scsi_eh_scmd_add() -> scsi_host_set_state(shost, SHOST_RECOVERY)
>>>>
>>>> Which meant that as soon as the error IRQ was serviced, no additional
>>>> commands sent from the block layer (scsi_queue_rq()) would be sent down
>>>> to the device. (ATA internal commands originating for ATA EH could of
>>>> course still be sent.)
>>>>
>>>> However, after commit e494f6a72839 ("[SCSI] improved eh timeout
>>>> handler"),
>>>> scsi_times_out() would instead result in a call to
>>>> scsi_abort_command() ->
>>>> queue_delayed_work(). work function: scmd_eh_abort_handler() would call
>>>> scsi_eh_scmd_add(), which calls scsi_host_set_state(shost,
>>>> SHOST_RECOVERY).
>>>>
>>>> (It was possible to get the old behavior if host->hostt->no_async_abort
>>>> was set, but libata never used it, and this option was completely
>>>> removed
>>>> in commit a06586325f37 ("scsi: make asynchronous aborts mandatory"))
>>>>
>>>> Additionally, later in commit 358f70da49d7 ("blk-mq: make
>>>> blk_abort_request() trigger timeout path"), blk_abort_request() was
>>>> changed
>>>> to also call the abort callback asynchronously.
>>>>
>>>> So now, after the TFES error irq has been serviced, we need to wait for
>>>> two different workqueues to run their work, before the SHOST_RECOVERY
>>>> state gets set.
>>>>
>>>> While the ATA specification states that a device should return command
>>>> aborted for all commands queued after the device has entered error
>>>> state,
>>>> since ATA only keeps the sense data for the latest command (in non-NCQ
>>>> case), we really don't want to send block layer commands to the device
>>>> after it has entered error state. (Only ATA EH commands should be sent,
>>>> to read the sense data etc.)
>>>>
>>>> While we could just call scsi_host_set_state(shost, SHOST_RECOVERY)
>>>> from
>>>> ata_qc_schedule_eh() directly, that might be a layering violation.
>>>> So instead of doing that, add an additional check against the
>>>> libata's own
>>>> EH flag(s) before calling the qc_defer callback.
>>>>
>>>> Fixes: e494f6a72839 ("[SCSI] improved eh timeout handler")
>>>> Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx>
>>>> ---
>>>> Changes since v1:
>>>> -Implemented review comments from Damien.
>>>>
>>>>    drivers/ata/libata-scsi.c | 20 ++++++++++++++++++++
>>>>    1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>> index 4cb914103382..383a208f5f99 100644
>>>> --- a/drivers/ata/libata-scsi.c
>>>> +++ b/drivers/ata/libata-scsi.c
>>>> @@ -1736,6 +1736,26 @@ static int ata_scsi_translate(struct
>>>> ata_device *dev, struct scsi_cmnd *cmd,
>>>>        if (xlat_func(qc))
>>>>            goto early_finish;
>>>> +    /*
>>>> +     * scsi_queue_rq() will defer commands when in state
>>>> SHOST_RECOVERY.
>>>> +     *
>>>> +     * When getting an error interrupt, ata_port_abort() will be
>>>> called,
>>>> +     * which ends up calling ata_qc_schedule_eh() on all QCs.
>>>> +     *
>>>> +     * ata_qc_schedule_eh() will call ata_eh_set_pending() and then
>>>> call
>>>> +     * blk_abort_request() on the given QC. blk_abort_request() will
>>>> +     * asynchronously end up calling scsi_eh_scmd_add(), which will
>>>> set
>>>> +     * the state to SHOST_RECOVERY and wake up SCSI EH.
>>>> +     *
>>>> +     * In order to avoid requests from being issued to the device
>>>> from the
>>>> +     * time ata_eh_set_pending() is called, to the time
>>>> scsi_eh_scmd_add()
>>>> +     * sets the state to SHOST_RECOVERY, we defer requests here as
>>>> well.
>>>> +     */
>>>> +    if (ap->pflags & (ATA_PFLAG_EH_PENDING |
>>>> ATA_PFLAG_EH_IN_PROGRESS)) {
>>>> +        rc = ATA_DEFER_LINK;
>>>> +        goto defer;
>>>
>>> Could we move this check earlier? I mean, we didn't need to have the qc
>>> alloc'ed and xlat'ed for this check to be done, right?
>>
>> Sure, we could put it in e.g. ata_scsi_queuecmd() or
>> __ata_scsi_queuecmd().
>>
>>
>> Or, perhaps it is just time to accept that ATA EH is so interconnected
>> with
>> SCSI EH, so the simplest thing is just to do:
>>
>> --- a/drivers/ata/libata-eh.c
>> +++ b/drivers/ata/libata-eh.c
>> @@ -913,6 +913,7 @@ void ata_qc_schedule_eh(struct ata_queued_cmd *qc)
>>            qc->flags |= ATA_QCFLAG_FAILED;
>>          ata_eh_set_pending(ap, 1);
>> +       scsi_host_set_state(ap->scsi_host, SHOST_RECOVERY);
>>            /* The following will fail if timeout has already expired.
>>           * ata_scsi_error() takes care of such scmds on EH entry.
>>
>>
>> Which appears to work just as well as the patch in $subject.
>>
>> In commit ee7863bc68fa ("[PATCH] SCSI: implement
>> shost->host_eh_scheduled")
>> Tejun mentioned that "... libata is planning to depart from SCSI, so,
>> for the
>> time being, libata will be using SCSI EH to handle such exceptions."
>>
>> Now, 16 years later, ATA is still using SCSI EH (see
>> ata_port_schedule_eh() and
>> ata_std_sched_eh()) to schedule EH in the case when there are no QCs
>> to abort:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.1-rc4#n1004
>>
>> Damien, thoughts?
> 
> Yeah, that was the original idea. By the time libata was designed we
> still had quite a lot of drivers in drivers/ide, and it wasn't quite
> clear if managing everything via SATL was the way to go.
> So one idea was to allow libata to become stand-alone, and SATL working
> as a modular interface on top of libata.
> However, this never materialized, as everyone was quite happy having to
> deal with SCSI devices only, even if that meant to go via an additional
> layer of interpolation.
> 
> With the introduction of libsas this idea became even more dubious, as
> libsas really called for a _tighter_ integration with SCSI.
> Plus ATA in general seems to be heading into the sunset, so I sincerely
> doubt anyone really wants to head that way.
> 
> In fact, with the recent patches we should finally drop the pretension
> that libata will ever be standalone, and concentrate on making it more
> aligned with SCSI.

So you are saying you prefer the fix proposed by Niklas above, calling
scsi_host_set_state() from libata ? I feel OK-ish about it... But if it
is called in the right place, why not.

> 
> Cheers,
> 
> Hannes

-- 
Damien Le Moal
Western Digital Research