On 28/03/2023 12:15, Jason Yan wrote:
When a disk is removed with inflight IO, the userspace application need
to wait for 30 senconds(depends on the timeout configuration) to getback
from the kernel. Xingui tried to fix this issue by aborting the ata link
for SATA devices[1]. However this approach left the SAS devices unresolved.
This patch try to fix this issue by aborting all inflight requests while
the device is gone. This is implemented by itering the tagset.
[1] https://lore.kernel.org/lkml/234e04db-7539-07e4-a6b8-c6b05f78193d@xxxxxxxxxxxxxxxxxx/T/
Cc: Xingui Yang <yangxingui@xxxxxxxxxx>
Cc: John Garry <john.g.garry@xxxxxxxxxx>
Cc: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>
Cc: Hannes Reinecke <hare@xxxxxxxx>
Signed-off-by: Jason Yan <yanaijie@xxxxxxxxxx>
---
drivers/scsi/libsas/sas_discover.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 72fdb2e5d047..d2be67f348e0 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -360,8 +360,28 @@ static void sas_destruct_ports(struct asd_sas_port *port)
}
}
+static bool sas_abort_cmd(struct request *req, void *data)
+{
+ struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+ struct domain_device *dev = data;
+
+ if (dev == cmd_to_domain_dev(cmd))
+ blk_abort_request(req);
I suppose that this is ok, but we're not dealing with libsas "internal"
commands or libata internal commands, though. What about them?
I suppose my series here would help:
https://lore.kernel.org/linux-scsi/1666693096-180008-1-git-send-email-john.garry@xxxxxxxxxx/
Along with Part II
+ return true;
+}
+
+static void sas_abort_domain_cmds(struct domain_device *dev)
+{
+ struct sas_ha_struct *sas_ha = dev->port->ha;
+ struct Scsi_Host *shost = sas_ha->core.shost;
+ blk_mq_tagset_busy_iter(&shost->tag_set, sas_abort_cmd, dev);
blk_mq_queue_tag_busy_iter() would be nicer to use here, but it's not
exported - I am not advocating exporting it either. And we don't have
direct access to the scsi device pointer (from which we can look up the
request queue pointer), either.
+}
+
void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
{
+ if (test_bit(SAS_DEV_GONE, &dev->state))
+ sas_abort_domain_cmds(dev);
This code is common to expanders. Should we really be calling this for
an expander, even if it is harmless (as it does nothing currently)?
And we also seem to be calling this for rphy "which never saw
sas_rphy_add" (see code comment, not shown below), which is
questionable. Should we really do that?
+
if (!test_bit(SAS_DEV_DESTROY, &dev->state) &&
!list_empty(&dev->disco_list_node)) {
/* this rphy never saw sas_rphy_add */