Re: [PATCH v3 4/4] scsi: core: Delay running the queue if the host is blocked

Ming Lei <ming.lei@xxxxxxxxxx> · Thu, 18 May 2023 10:39:59 +0800

On Wed, May 17, 2023 at 07:34:38PM -0700, Bart Van Assche wrote:
> On 5/17/23 18:16, Ming Lei wrote:
> > On Wed, May 17, 2023 at 04:09:27PM -0700, Bart Van Assche wrote:
> > > @@ -1767,7 +1767,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
> > >   		break;
> > >   	case BLK_STS_RESOURCE:
> > >   	case BLK_STS_ZONE_RESOURCE:
> > > -		if (scsi_device_blocked(sdev))
> > > +		if (scsi_device_blocked(sdev) || shost->host_self_blocked)
> > >   			ret = BLK_STS_DEV_RESOURCE;
> > 
> > What if scsi_unblock_requests() is just called after the above check and
> > before returning to block layer core? Then this request is invisible to
> > scsi_run_host_queues()<-scsi_unblock_requests(), and io hang happens.
> 
> If returning BLK_STS_DEV_RESOURCE could cause an I/O hang, wouldn't that be
> a bug in the block layer core? Isn't the block layer core expected to rerun
> the queue after a delay if a block driver returns BLK_STS_DEV_RESOURCE? See
> also blk_mq_dispatch_rq_list().

Please see the comment for BLK_STS_DEV_RESOURCE:

/*
 * BLK_STS_DEV_RESOURCE is returned from the driver to the block layer if
 * device related resources are unavailable, but the driver can guarantee
 * that the queue will be rerun in the future once resources become
 * available again. This is typically the case for device specific
 * resources that are consumed for IO. If the driver fails allocating these
 * resources, we know that inflight (or pending) IO will free these
 * resource upon completion.

Basically, it requires the driver to rerun the queue itself.
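
To make the difference concrete, here is a rough user-space sketch of the
decision the dispatch path makes when a driver reports "busy". It is a
simplified model under stated assumptions, not the kernel code: the
model_* names are made up for illustration, and the tag and budget
special cases in blk_mq_dispatch_rq_list() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two blk_status_t values discussed here. */
enum model_sts {
	MODEL_STS_RESOURCE,
	MODEL_STS_DEV_RESOURCE,
};

/* What the modelled dispatch path does once the driver reports "busy". */
enum model_action {
	MODEL_RUN_NOW,         /* rerun the hardware queue right away */
	MODEL_RUN_DELAYED,     /* rerun the hardware queue after a short delay */
	MODEL_WAIT_FOR_DRIVER, /* do nothing; the driver must rerun the queue */
};

/*
 * needs_restart models the "restart" mark on the hardware queue, i.e.
 * something is expected to rerun the queue later (typically a completion).
 */
static enum model_action model_dispatch_busy(enum model_sts sts,
					     bool needs_restart)
{
	assert(sts == MODEL_STS_RESOURCE || sts == MODEL_STS_DEV_RESOURCE);

	/* Nobody has promised a rerun, so the block layer reruns itself. */
	if (!needs_restart)
		return MODEL_RUN_NOW;

	/* Generic resource shortage: the block layer retries after a delay. */
	if (sts == MODEL_STS_RESOURCE)
		return MODEL_RUN_DELAYED;

	/* Device resource shortage: no timer is armed by the block layer. */
	return MODEL_WAIT_FOR_DRIVER;
}
```

In this model, model_dispatch_busy(MODEL_STS_DEV_RESOURCE, true) returns
MODEL_WAIT_FOR_DRIVER, so if the driver's own rerun has already happened
by the time the request is put back, nothing else picks it up.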

In reality this is prone to races, so maybe we can just remove
BLK_STS_DEV_RESOURCE.

Thanks,
Ming