On 10/1/20 4:44 AM, Bart Van Assche wrote:
On 2020-09-30 01:02, Hannes Reinecke wrote:
When the ALUA state indicates transitioning we should not retry
the command immediately, but rather complete the command with
BLK_STS_AGAIN to signal the completion handler that it might
be retried.
This allows multipathing to redirect the command to another path
if possible, and avoid stalls during lengthy transitioning times.
Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 2 +-
drivers/scsi/scsi_lib.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 308bda2e9c00..a68222e324e9 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -1092,7 +1092,7 @@ static blk_status_t alua_prep_fn(struct scsi_device *sdev, struct request *req)
case SCSI_ACCESS_STATE_LBA:
return BLK_STS_OK;
case SCSI_ACCESS_STATE_TRANSITIONING:
- return BLK_STS_RESOURCE;
+ return BLK_STS_AGAIN;
default:
req->rq_flags |= RQF_QUIET;
return BLK_STS_IOERR;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f0ee11dc07e4..b628aa0d824c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1726,6 +1726,11 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
scsi_device_blocked(sdev))
ret = BLK_STS_DEV_RESOURCE;
break;
+ case BLK_STS_AGAIN:
+ scsi_req(req)->result = DID_BUS_BUSY << 16;
+ if (req->rq_flags & RQF_DONTPREP)
+ scsi_mq_uninit_cmd(cmd);
+ break;
default:
if (unlikely(!scsi_device_online(sdev)))
scsi_req(req)->result = DID_NO_CONNECT << 16;
Hi Hannes,
What will happen if all remote ports have the state "transitioning"?
Does the above code resubmit a request immediately in that case? Can
this cause spinning with 100% CPU usage if the ALUA device handler
notices the transitioning state before multipathd does?
Curiously this patch only improves the current situation :-)
With the current behaviour we will return
BLK_STS_DEV_RESOURCE/BLK_STS_RESOURCE, causing an _immediate_ retry on
the same queue.
This patch will cause the I/O to be _completed_, to be requeued via the
end_request() path.
So yes, we will requeue (that's kinda the point of 'transitioning' ...),
but with a lower frequency as originally. _And_ with a chance of
multipath intercepting the I/O, which didn't happen previously.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer