Re: [Regression] Hang deleting ATA HDD device for undocking

Niklas Cassel <Niklas.Cassel@xxxxxxx> · Wed, 10 Jan 2024 20:49:18 +0000

Hello Bart,

On Tue, Jan 09, 2024 at 02:46:58PM -0800, Bart Van Assche wrote:
> On 1/9/24 13:45, Kevin Locke wrote:
> > On Tue, 2024-01-09 at 13:39 -0800, Bart Van Assche wrote:
> > > (cd /sys/class/scsi_host && grep -aH . */state)
> > > (cd /sys/class/scsi_device && grep -aH . */device/{device_{blocked,busy},state})
> > 
> > Sure thing.  Running the above commands produced the following output:
> > 
> > host0/state:running
> > host1/state:running
> > host2/state:running
> > host3/state:running
> > host4/state:running
> > host5/state:running
> > 0:0:0:0/device/device_blocked:0
> > 0:0:0:0/device/device_busy:1
> > 0:0:0:0/device/state:running
> 
> So the SCSI host state (host1) is fine but the information about the SCSI
> devices associated with host1 is missing, most likely because sysfs
> information of SCSI devices is removed before a SYNCHRONIZE CACHE command
> is submitted. The device_del(dev) call in __scsi_remove_device() happens
> after scsi_device_set_state(sdev, SDEV_CANCEL) so the SCSI device should
> be in the SDEV_CANCEL state. scsi_device_state_check() should translate
> SDEV_CANCEL into BLK_STS_OFFLINE.
> 
> There are several tests in the blktests suite that trigger SCSI device
> deletion, e.g. block/001. All blktests tests pass on my test setup.
> Additionally, I haven't seen any blktests failure reports recently that
> are related to device deletion. If I try to delete an ATA device in a VM,
> that works fine (kernel v6.7):
> 
> # dmesg -c >/dev/null
> # echo 1 > /sys/class/scsi_device/3:0:0:0/device/delete
> # dmesg -c
> [  215.533228] sd 3:0:0:0: [sdb] Synchronizing SCSI cache
> [  215.543932] ata3.00: Entering standby power mode
> 
> Running rescan-scsi-bus.sh -a brings this device back.
> 
> I'm not sure what I'm missing but I think that it's something ATA-specific.
> Since I'm not an ATA expert, I hope that an ATA expert can help. There are
> at least two ATA experts on the Cc-list of this email.

Thank you for your help debugging this.

It is easy to reproduce the hang in QEMU against a ATA device using:

# hdparm -Y /dev/sda
# echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete 


I can get it working again by simply reverting this part of your patch:

--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -163,8 +163,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
         */
        cmd->result = 0;
 
-       blk_mq_requeue_request(scsi_cmd_to_rq(cmd),
-                              !scsi_host_in_recovery(cmd->device->host));
+       blk_mq_requeue_request(scsi_cmd_to_rq(cmd), true);
 }


What goes on:
When libata receives a completion for a ATA_CMD_SLEEP command, it sets
flag ATA_DFLAG_SLEEPING, in order to know that the device is sleeping.

In order to take a device out of sleep a link reset is performed.

This link reset is currently performed very late in the call chain,
after a new SCSI command has been translated to an ATA command,
but before the ATA command is actually issued to the device.
(See ata_qc_issue(), which calls ata_link_abort() if flag
ATA_DFLAG_SLEEPING is set.)

While the ata_link_abort() will abort all QCs, using blk_abort_request(),
(regardless if the command was issued to the device or no were issued or not),
which will result in scsi_timeout() being called, which calls
scsi_eh_scmd_add() for all the commands.

scsi_eh_scmd_add() sets the SCSI host state to recovery, and then adds the
command to shost->eh_cmd_q.

However, after 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), it seems that after when libata EH strategy handler calls
scsi_eh_flush_done_q(), the failed commands get requeued
(scsi_eh_flush_done_q() calls scsi_queue_insert()), but the queue is never
kicked after EH is done, so the command is never sent down to the device.

Most likely because scsi_eh_scmd_add(), which sets the SCSI host state
to recovery, together with your modification to __scsi_queue_insert(),
causes the queue to never get kicked.

I have a patch that works around this issue by instead deferring the
SYNCHRONIZE CACHE command, and letting the block layer requeue the
command (see patch attached at the end of this email), instead of
relying on SCSI EH to requeue the command.

However, I'm worried that applying that libata patch will simply hide
an actual problem in SCSI, which might lead to someone else stumbling
on this SCSI bug in the future.

Thoughts?


Kind regards,
Niklas

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 09ed67772fae..fd4da14eefc6 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5038,14 +5038,6 @@ void ata_qc_issue(struct ata_queued_cmd *qc)
                if (ata_sg_setup(qc))
                        goto sys_err;
 
-       /* if device is sleeping, schedule reset and abort the link */
-       if (unlikely(qc->dev->flags & ATA_DFLAG_SLEEPING)) {
-               link->eh_info.action |= ATA_EH_RESET;
-               ata_ehi_push_desc(&link->eh_info, "waking up from sleep");
-               ata_link_abort(link);
-               return;
-       }
-
        trace_ata_qc_prep(qc);
        qc->err_mask |= ap->ops->qc_prep(qc);
        if (unlikely(qc->err_mask))
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0a0f483124c3..f7cbe2d9141d 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1682,10 +1682,20 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
                              ata_xlat_func_t xlat_func)
 {
-       struct ata_port *ap = dev->link->ap;
+       struct ata_link *link = dev->link;
+       struct ata_port *ap = link->ap;
        struct ata_queued_cmd *qc;
        int rc;
 
+       /* if device is sleeping, defer, schedule reset and abort the link */
+       if (unlikely(dev->flags & ATA_DFLAG_SLEEPING)) {
+               link->eh_info.action |= ATA_EH_RESET;
+               ata_ehi_push_desc(&link->eh_info, "waking up from sleep");
+               ata_link_abort(link);
+               rc = ATA_DEFER_LINK;
+               goto defer_nofree;
+       }
+
        qc = ata_scsi_qc_new(dev, cmd);
        if (!qc)
                goto err_mem;
@@ -1732,6 +1742,7 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
 
 defer:
        ata_qc_free(qc);
+defer_nofree:
        if (rc == ATA_DEFER_LINK)
                return SCSI_MLQUEUE_DEVICE_BUSY;
        else