Re: [PATCH] blk-mq: Properly init bios from blk_mq_alloc_request_hctx()

John Garry <john.garry@xxxxxxxxxx> · Tue, 25 Oct 2022 10:08:10 +0100

On 25/10/2022 10:00, Ming Lei wrote:
>> My use case is in SCSI EH domain. For my HBA controller of interest, to
>> abort an erroneous IO we must send a controller proprietary abort
>> command on same HW queue as original command. So we would need to
>> allocate this abort request for a specific HW queue.
>
> IMO, it is one bad hw/sw interface.

> First such request has to be reserved, since all inflight IOs can be in error.

Right

> Second error handling needs to provide forward-progress, and it is supposed
> to not require external dependency, otherwise easy to cause deadlock, but
> here request from specific HW queue just depends on this queue's cpumask.
>
> Also if it has to be reserved, it can be done as one device/driver private
> command, so why bother blk-mq for this special use case?

I have a series for reserved request support, which I will send later.
Please have a look. And as I mentioned, I would probably not end up using
blk_mq_alloc_request_hctx() anyway.

>> I mentioned before that if no hctx->cpumask is online then we don't need
>> to allocate a request. That is because if no hctx->cpumask is online,
>> this means that original erroneous IO must be completed due to nature of
>> how blk-mq cpu hotplug handler works, i.e. drained, and then we don't
>> actually need to abort it any longer, so ok to not get a request.
>
> No, it is really not OK, if all cpus in hctx->cpumask are offline, you
> can't allocate request on the specified hw queue, then the erroneous IO
> can't be handled, then cpu hotplug handler may hang for ever.

If the erroneous IO is still in-flight from the blk-mq perspective, then how
can all CPUs in hctx->cpumask be offline? I thought that we guarantee that
the last CPU in hctx->cpumask cannot go offline until the hctx is drained.
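
To make the question concrete, here is a toy userspace model of the rule I
am assuming. It is only a sketch, not kernel code: struct toy_hctx,
toy_cpu_offline() and toy_abort_needed() are hypothetical names, and a
64-bit mask stands in for the real cpumask. It assumes that the last online
CPU in a queue's mask cannot finish going offline while that queue still has
requests in flight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy userspace model, NOT kernel code. All names here are hypothetical. */
struct toy_hctx {
	uint64_t cpumask;      /* CPUs mapped to this hardware queue (bit per CPU) */
	unsigned int inflight; /* requests still in flight on this queue */
};

/* True if at least one CPU mapped to the queue is currently online. */
static bool toy_hctx_has_online_cpu(const struct toy_hctx *h, uint64_t online)
{
	return (h->cpumask & online) != 0;
}

/*
 * Try to take @cpu (0..63) offline.
 *
 * Assumed rule: if @cpu is the last online CPU mapped to the queue and the
 * queue still has requests in flight, offlining cannot complete yet. The
 * model reports that by returning false and leaving @online unchanged; a
 * real hotplug handler would wait for the drain instead of failing.
 */
static bool toy_cpu_offline(const struct toy_hctx *h, uint64_t *online,
			    unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;
	uint64_t remaining = *online & ~bit;

	if ((h->cpumask & bit) && !(h->cpumask & remaining) && h->inflight)
		return false; /* queue not drained yet */

	*online = remaining;
	return true;
}

/*
 * An abort request on this queue is only needed while the original IO is
 * still in flight, which under the assumed rule implies an online CPU.
 */
static bool toy_abort_needed(const struct toy_hctx *h, uint64_t online)
{
	return toy_hctx_has_online_cpu(h, online) && h->inflight > 0;
}
```

Under that assumption, a queue with no online CPU in its mask has nothing
left in flight, so toy_abort_needed() is false in exactly the case where a
request could not be allocated on that queue.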

Thanks,
John