On 2020/12/3 9:03, Ming Lei wrote:
On Wed, Dec 02, 2020 at 10:10:30AM -0800, Bart Van Assche wrote:
On 12/2/20 2:04 AM, Ming Lei wrote:
When queuing IO request to LLD, STS_RESOURCE may be returned because:
- host in recovery or blocked
- target queue throttling or blocked
- LLD rejection
None of the above happens frequently.
BLK_STS_DEV_RESOURCE is returned to the block layer to avoid an unnecessary
queue re-run, which is only a small optimization. However, all
in-flight requests originating from this scsi device may complete
just after 'sdev->device_busy' is read, so BLK_STS_DEV_RESOURCE is
still returned to the block layer. The failed IO then never gets a chance
to be queued again, since at that time it is invisible to both
scsi_run_queue_async() and blk-mq's RESTART.
Fix the issue by not returning BLK_STS_DEV_RESOURCE in this situation.
Cc: Hannes Reinecke <hare@xxxxxxxx>
Cc: Sumit Saxena <sumit.saxena@xxxxxxxxxxxx>
Cc: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
Cc: Bart Van Assche <bvanassche@xxxxxxx>
Cc: Ewan Milne <emilne@xxxxxxxxxx>
Cc: Long Li <longli@xxxxxxxxxxxxx>
Tested-by: "chenxiang (M)" <chenxiang66@xxxxxxxxxxxxx>
Reported-by: John Garry <john.garry@xxxxxxxxxx>
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
drivers/scsi/scsi_lib.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 60c7a7d74852..03c6d0620bfd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1703,8 +1703,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
break;
case BLK_STS_RESOURCE:
case BLK_STS_ZONE_RESOURCE:
- if (atomic_read(&sdev->device_busy) ||
- scsi_device_blocked(sdev))
+ if (scsi_device_blocked(sdev))
ret = BLK_STS_DEV_RESOURCE;
break;
default:
Since this patch modifies code introduced in commit 86ff7c2a80cd ("blk-mq:
introduce BLK_STS_DEV_RESOURCE"), does this patch perhaps need a Fixes:
tag?
This same race exists before commit 86ff7c2a80cd, so I think the 'Fixes:' tag
is misleading.
Reverting the patch "scsi: core: Only re-run queue in
scsi_end_request() if device queue is busy" also solves the issue.
Was the issue introduced by that patch? If so, maybe adding
'Fixes: ed5dd6a67d5e ("scsi: core: Only re-run queue in
scsi_end_request() if device queue is busy")' would be more accurate.
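
To make the race from the commit message concrete, here is a rough
userspace sketch in C. It is only a model under my own assumptions, not
the kernel code: every model_* name is hypothetical, the struct fields
merely stand in for atomic_read(&sdev->device_busy) and
scsi_device_blocked(sdev), and "stranded" assumes that nothing else
re-runs the queue once BLK_STS_DEV_RESOURCE has been returned and no
request is in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two block-layer status codes. */
enum model_sts { MODEL_STS_RESOURCE, MODEL_STS_DEV_RESOURCE };

/* Hypothetical snapshot of the per-device state the check looks at. */
struct model_sdev {
	int  device_busy;	/* models atomic_read(&sdev->device_busy) */
	bool blocked;		/* models scsi_device_blocked(sdev) */
};

/* Status mapping before the patch: trusts the device_busy snapshot. */
static enum model_sts model_status_before(const struct model_sdev *snap)
{
	if (snap->device_busy || snap->blocked)
		return MODEL_STS_DEV_RESOURCE;
	return MODEL_STS_RESOURCE;
}

/* Status mapping after the patch: only the blocked state is consulted. */
static enum model_sts model_status_after(const struct model_sdev *snap)
{
	if (snap->blocked)
		return MODEL_STS_DEV_RESOURCE;
	return MODEL_STS_RESOURCE;
}

/*
 * Assumption of this model: a request is stranded when DEV_RESOURCE was
 * returned (so the block layer waits for a completion to re-run the
 * queue) but, by now, nothing is in flight and the device is not blocked.
 */
static bool model_stranded(enum model_sts sts, const struct model_sdev *now)
{
	return sts == MODEL_STS_DEV_RESOURCE &&
	       now->device_busy == 0 && !now->blocked;
}

/* One pass through the race: the last request completes after the read. */
static void model_race_demo(void)
{
	struct model_sdev snap = { .device_busy = 1, .blocked = false };
	struct model_sdev now  = { .device_busy = 0, .blocked = false };

	assert(model_stranded(model_status_before(&snap), &now));
	assert(!model_stranded(model_status_after(&snap), &now));
}
```

In this model the old check strands the request whenever the snapshot
goes stale, while the patched check falls back to the plain resource
status and leaves the re-run to the block layer.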
Thanks,
Ming