Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ming,


Thinking of this issue further, so far, one doable solution is to
expose reply queues
as blk-mq hw queues, as done by the following patchset:

https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@xxxxxxxxxx/

I thought that this patchset had fundamental issues, in terms of working for
all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to v4.15

Could you explain it a bit about the fundamental issues for all types of
host?


*As I understand*, splitting the tagset has issues with dual-mode HBAs - as in supporting NVMe and SCSI, as some HBAs do.

It is just for hosts with multiple reply queues, such as hisi_sas v3,
megaraid_sas, mpt3sas and hpsa.

with this patchset (as you may have noticed in my git send mistake), but we
have not got to test it yet.

On a related topic, we did test exposing reply queues as blk-mq hw queues
and generating the host-wide tag internally in the LLDD with sbitmap, and
unfortunately we were experiencing a significant performance hit, like 2300K
-> 1800K IOPs for 4K read.

We need to test this further. I don't understand why we get such a big hit.

The performance regression shouldn't have been introduced in theory, and it is
because blk_mq_queue_tag_busy_iter() iterates over the same duplicated tags multiple
times, which can be fixed easily.


We are testing further, and I will tentatively say that we're getting better results (than previously) after fixing something in the LLDD. TBC.



In which global host-wide tags are shared for all blk-mq hw queues.

Also we can remove all the reply_map stuff in drivers, then solve the problem of
draining in-flight requests during unplugging CPU in a generic approach.

So you're saying that removing this reply queue stuff can make the solution
to the problem more generic, but do you have an idea of the overall
solution?

1) convert reply queue into blk-mq hw queue first

2) then all drivers are in same position wrt. handling requests vs.
unplugging CPU (shutdown managed IRQ)

The current handling in blk_mq_hctx_notify_dead() is actually wrong,

Yeah, the comment reads that it's going away, but it's actually gone.

at that time, all CPUs on the hctx are dead, blk_mq_run_hw_queue()
still dispatches requests on driver's hw queue, and driver is invisible
to DEAD CPUs mapped to this hctx, and finally interrupt for these
requests on the hctx are lost.

Frankly speaking, the above 2nd problem is still hard to solve.

1) take_cpu_down() shutdown managed IRQ first, then run teardown callback
for states in [CPUHP_AP_ONLINE, CPUHP_AP_OFFLINE) on the to-be-offline
CPU

2) However, all runnable tasks are removed from the CPU in the teardown
callback for CPUHP_AP_SCHED_STARTING, which is run after managed IRQs
are shutdown. That said it is hard to avoid new request queued to
the hctx with all DEAD CPUs.

3) we don't support to freeze queue for specific hctx yet, or that way
may not be accepted because of extra cost in fast path

4) once request is allocated, it should be submitted to driver no matter
if CPU hotplug happens or not. Or free it and re-allocate new request
on proper sw/hw queue?



Last time, it was reported that the patchset causes performance regression,
which is actually caused by duplicated io accounting in
blk_mq_queue_tag_busy_iter(),
which should be fixed easily.

What do you think of this approach?

It would still be good to have a forward port of this patchset for testing,
if we're serious about it. Or at least this bug you mention fixed.

I plan to make this patchset workable on 5.2-rc for your test first.


ok, thanks. I assume that we're still open to not adding support for global tags in blk-mq, but rather the LLDD generating the unique tag with sbitmap.

Cheers,
John


Thanks,
Ming

.






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux