Re: [PATCH 0/2] blk-mq: fix blk_mq_alloc_request_hctx

On 6/30/21 10:42 AM, Ming Lei wrote:
On Wed, Jun 30, 2021 at 10:18:37AM +0200, Hannes Reinecke wrote:
On 6/29/21 9:49 AM, Ming Lei wrote:
Hi,

blk_mq_alloc_request_hctx() is used by NVMe fc/rdma/tcp/loop to connect
io queue. Also the sw ctx is chosen as the 1st online cpu in hctx->cpumask.
However, all cpus in hctx->cpumask may be offline.

This usage model isn't well supported by blk-mq, which assumes allocation is
always done on an online CPU in hctx->cpumask. This assumption is
related to managed irq, which also requires blk-mq to drain inflight
requests in this hctx when the last cpu in hctx->cpumask is going
offline.

However, NVMe fc/rdma/tcp/loop don't use managed irq, so we should allow
them to ask for request allocation when the specified hctx is inactive
(all cpus in hctx->cpumask are offline).
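
For illustration only, the "inactive hctx" condition above can be modelled in user space. This is a minimal sketch, not kernel code: cpumask_model_t, first_online_cpu() and hctx_is_inactive() are hypothetical names, and a 64-bit word stands in for a real cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Return the lowest CPU set in both masks, or -1 if none is set in both. */
static int first_online_cpu(cpumask_model_t hctx_mask,
                            cpumask_model_t online_mask)
{
    cpumask_model_t both = hctx_mask & online_mask;
    int cpu;

    for (cpu = 0; cpu < 64; cpu++)
        if (both & ((cpumask_model_t)1 << cpu))
            return cpu;
    return -1;
}

/* "Inactive" as used in this thread: no CPU mapped to the hctx is online. */
static int hctx_is_inactive(cpumask_model_t hctx_mask,
                            cpumask_model_t online_mask)
{
    return first_online_cpu(hctx_mask, online_mask) < 0;
}

/* Usage: an hctx mapped to CPUs 2 and 3 under three online masks. */
static void example_inactive_hctx(void)
{
    cpumask_model_t hctx_mask = 0x0c;

    assert(first_online_cpu(hctx_mask, 0x0f) == 2); /* CPUs 0-3 online */
    assert(first_online_cpu(hctx_mask, 0x09) == 3); /* CPUs 0 and 3 online */
    assert(hctx_is_inactive(hctx_mask, 0x03));      /* only CPUs 0-1 online */
}
```

In the last case there is no online CPU to pick as the sw ctx, which is the situation the cover letter is about.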

Fix blk_mq_alloc_request_hctx() by adding/passing flag of
BLK_MQ_F_NOT_USE_MANAGED_IRQ.


Ming Lei (2):
    blk-mq: not deactivate hctx if the device doesn't use managed irq
    nvme: pass BLK_MQ_F_NOT_USE_MANAGED_IRQ for fc/rdma/tcp/loop

   block/blk-mq.c             | 6 +++++-
   drivers/nvme/host/fc.c     | 3 ++-
   drivers/nvme/host/rdma.c   | 3 ++-
   drivers/nvme/host/tcp.c    | 3 ++-
   drivers/nvme/target/loop.c | 3 ++-
   include/linux/blk-mq.h     | 1 +
   6 files changed, 14 insertions(+), 5 deletions(-)

Cc: Sagi Grimberg <sagi@xxxxxxxxxxx>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Wen Xiong <wenxiong@xxxxxxxxxx>
Cc: John Garry <john.garry@xxxxxxxxxx>


I have my misgivings about this patchset.
To my understanding, only CPUs present in the hctx cpumask are eligible to
submit I/O to that hctx.

That is only true for managed irq, and the CPUs should be online.

However, there is no such constraint for non-managed irq, since the irq may
migrate to other online CPUs if all CPUs in its current affinity go offline.


But there shouldn't be any I/O pending during CPU offline (cf blk_mq_hctx_notify_offline()), so no interrupts should be triggered, either, no?

Consequently if all cpus in that mask are offline, where is the point of
even transmitting a 'connect' request?

nvmef requires the connect request to be submitted via one specified hctx
whose index has to be the same as the io queue's index.

Almost all nvmef drivers fail to set up the controller in case of a
connect io queue error.


And I would prefer to fix that, namely by allowing blk-mq to run on a sparse set of io queues. The remaining io queues can be connected once the first cpu in the hctx cpumask is onlined; we already have blk_mq_hctx_notify_online(), which could easily be expanded to connect the relevant I/O queue...

Also, CPUs can go offline & online; in particular this is done in
lots of sanity tests.


True, but then again all I/O on the hctx should be quiesced during cpu offline.

So we should allow the connect request to be allocated successfully and
submitted to drivers, given that this is allowed for non-managed
irq.


I'd rather not do this, as the 'connect' command runs on the 'normal' I/O tagset, and hence runs the risk of being issued against non-existing CPUs.

Shouldn't we rather modify the tagset to refer to the current online
CPUs _only_, thereby never submitting a connect request for hctx with only
offline CPUs?

Then you may set up very few io queues, and performance may suffer even
though lots of CPUs come online later.

Only if we stay with the reduced number of I/O queues, which is not what I'm proposing; I'd prefer to connect and disconnect queues from the cpu hotplug handler. For starters we could even trigger a reset once the first cpu within a hctx is onlined.
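
To make the idea concrete, here is a rough user-space sketch of the bookkeeping I have in mind. It is not kernel code and not a patch: struct queue_model, struct ctrl_model, model_cpu_online() and model_cpu_offline() are made-up names, a 64-bit word stands in for a cpumask, and setting 'connected' merely stands in for sending (or tearing down) the fabrics connection.

```c
#include <assert.h>
#include <stdint.h>

#define NR_QUEUES 2                 /* hypothetical number of I/O queues */

struct queue_model {
    uint64_t cpumask;               /* CPUs mapped to this hctx */
    int connected;                  /* 1 once 'connect' would have been sent */
};

struct ctrl_model {
    uint64_t online;                /* currently online CPUs */
    struct queue_model q[NR_QUEUES];
};

/* CPU comes online: connect every not-yet-connected queue mapped to it. */
static void model_cpu_online(struct ctrl_model *c, int cpu)
{
    uint64_t bit;
    int i;

    assert(cpu >= 0 && cpu < 64);
    bit = (uint64_t)1 << cpu;
    c->online |= bit;
    for (i = 0; i < NR_QUEUES; i++)
        if ((c->q[i].cpumask & bit) && !c->q[i].connected)
            c->q[i].connected = 1;
}

/* CPU goes offline: disconnect queues whose mapped CPUs are now all offline. */
static void model_cpu_offline(struct ctrl_model *c, int cpu)
{
    uint64_t bit;
    int i;

    assert(cpu >= 0 && cpu < 64);
    bit = (uint64_t)1 << cpu;
    c->online &= ~bit;
    for (i = 0; i < NR_QUEUES; i++)
        if ((c->q[i].cpumask & bit) && !(c->q[i].cpumask & c->online))
            c->q[i].connected = 0;
}
```

With queue 0 mapped to CPUs 0-1 and queue 1 mapped to CPUs 2-3, onlining CPU 0 connects only queue 0, and queue 0 stays connected until both CPU 0 and CPU 1 are offline. A connect request is never issued for a queue without an online CPU in this model.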

Cheers,

Hannes
--
Dr. Hannes Reinecke                Kernel Storage Architect
hare@xxxxxxx                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


