On Wed, May 17, 2023 at 07:34:38PM -0700, Bart Van Assche wrote:
> On 5/17/23 18:16, Ming Lei wrote:
> > On Wed, May 17, 2023 at 04:09:27PM -0700, Bart Van Assche wrote:
> > > @@ -1767,7 +1767,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > >  		break;
> > >  	case BLK_STS_RESOURCE:
> > >  	case BLK_STS_ZONE_RESOURCE:
> > > -		if (scsi_device_blocked(sdev))
> > > +		if (scsi_device_blocked(sdev) || shost->host_self_blocked)
> > >  			ret = BLK_STS_DEV_RESOURCE;
> >
> > What if scsi_unblock_requests() is just called after the above check and
> > before returning to block layer core? Then this request is invisible to
> > scsi_run_host_queues()<-scsi_unblock_requests(), and io hang happens.
>
> If returning BLK_STS_DEV_RESOURCE could cause an I/O hang, wouldn't that be
> a bug in the block layer core? Isn't the block layer core expected to rerun
> the queue after a delay if a block driver returns BLK_STS_DEV_RESOURCE? See
> also blk_mq_dispatch_rq_list().

Please see the comment for BLK_STS_DEV_RESOURCE:

/*
 * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
 * device related resources are unavailable, but the driver can guarantee
 * that the queue will be rerun in the future once resources become
 * available again. This is typically the case for device specific
 * resources that are consumed for IO. If the driver fails allocating these
 * resources, we know that inflight (or pending) IO will free these
 * resource upon completion.

Basically, it requires the driver to re-run the queue.

In reality, this can be full of races; maybe we can just remove
BLK_STS_DEV_RESOURCE.

Thanks,
Ming
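
The window described in the quoted review comment can be illustrated with a
toy, single-threaded model. This is only a sketch and not kernel code: every
toy_* name is hypothetical, and the model assumes that the core schedules its
own delayed rerun for BLK_STS_RESOURCE only and relies on the driver to rerun
the queue for BLK_STS_DEV_RESOURCE, as the comment above suggests. The real
logic in blk_mq_dispatch_rq_list() has more conditions than this.

```c
#include <stdbool.h>

/* Toy status codes; the names mirror the kernel's, the behavior is a model. */
enum toy_status { TOY_STS_RESOURCE, TOY_STS_DEV_RESOURCE };

struct toy_host {
	bool self_blocked;  /* stands in for shost->host_self_blocked */
	int requeued;       /* requests parked for a later queue run */
	bool delayed_rerun; /* the core scheduled its own delayed rerun */
};

/* Driver side: the check added by the patch, reduced to one flag. */
static enum toy_status toy_pick_status(const struct toy_host *h)
{
	return h->self_blocked ? TOY_STS_DEV_RESOURCE : TOY_STS_RESOURCE;
}

/* A queue run dispatches everything that is currently parked. */
static void toy_run_queues(struct toy_host *h)
{
	h->requeued = 0;
}

/* Stands in for scsi_unblock_requests(): clear the flag, run the queues. */
static void toy_unblock(struct toy_host *h)
{
	h->self_blocked = false;
	toy_run_queues(h);
}

/*
 * Core side (assumed behavior): park the request, and arm a delayed rerun
 * only when the driver did not promise to rerun the queue itself.
 */
static void toy_core_handle(struct toy_host *h, enum toy_status st)
{
	h->requeued++;
	if (st == TOY_STS_RESOURCE)
		h->delayed_rerun = true;
}

/*
 * Replays one dispatch attempt and returns how many requests are left
 * parked with nobody scheduled to rerun the queue. When
 * unblock_in_window is true, the unblock lands between the driver's
 * check and the core parking the request.
 */
int toy_stuck_requests(bool unblock_in_window)
{
	struct toy_host h = { .self_blocked = true };
	enum toy_status st = toy_pick_status(&h);

	if (unblock_in_window)
		toy_unblock(&h);	/* runs the queues, sees nothing parked */

	toy_core_handle(&h, st);

	if (!unblock_in_window)
		toy_unblock(&h);	/* normal order: finds the parked request */

	if (h.delayed_rerun)
		toy_run_queues(&h);	/* core-driven rerun, if one was armed */

	return h.requeued;
}
```

Under these assumptions, toy_stuck_requests(false) returns 0, while
toy_stuck_requests(true) returns 1: the request is parked after the only
queue run that the driver triggered, which corresponds to the hang being
discussed.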