Re: dm-multipath low performance with blk-mq

On 01/30/2016 12:35 AM, Mike Snitzer wrote:
On Wed, Jan 27 2016 at 12:56pm -0500,
Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:



On 27/01/2016 19:48, Mike Snitzer wrote:
On Wed, Jan 27 2016 at  6:14am -0500,
Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> wrote:


I don't think this is going to help __multipath_map() without some
configuration changes.  Now that we're running on already merged
requests instead of bios, the m->repeat_count is almost always set to 1,
so we call the path_selector every time, which means that we'll always
need the write lock. Bumping up the number of IOs we send before calling
the path selector again will give this patch a chance to do some good
here.

To do that you need to set:

	rr_min_io_rq <something_bigger_than_one>

in the defaults section of /etc/multipath.conf and then reload the
multipathd service.

The patch should hopefully help in multipath_busy() regardless of the
rr_min_io_rq setting.
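
To make the repeat_count argument concrete, here is a toy Python model of it. This is only a sketch, not the kernel code: the function name count_selector_calls, the path names "sda"/"sdb", and the use of a plain round-robin cycle as the selector are all assumptions made for illustration. It simply counts how often the path selector (the slow path that would need the write lock) runs for a given rr_min_io_rq.

```python
from itertools import cycle


def count_selector_calls(num_requests, rr_min_io_rq, paths=("sda", "sdb")):
    """Toy model: count path-selector invocations for num_requests requests.

    Hypothetical illustration only; the real logic lives in dm-mpath's
    __multipath_map() and is more involved.
    """
    selector = cycle(paths)      # stand-in for a round-robin path selector
    current_path = None
    repeat_count = 0             # requests left before re-selecting a path
    selector_calls = 0
    per_path = {p: 0 for p in paths}

    for _ in range(num_requests):
        if current_path is None or repeat_count == 0:
            # Slow path: this is where the write lock would be needed.
            current_path = next(selector)
            repeat_count = rr_min_io_rq
            selector_calls += 1
        repeat_count -= 1
        per_path[current_path] += 1

    return selector_calls, per_path


if __name__ == "__main__":
    for rr in (1, 16):
        calls, per_path = count_selector_calls(1000, rr)
        print(f"rr_min_io_rq={rr}: selector calls={calls}, per path={per_path}")
```

In this model rr_min_io_rq=1 hits the selector on every request, while a larger value hits it only once per batch, which is the case where the patch has room to help.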

This patch, while generic, is meant to help the blk-mq case.  A blk-mq
request_queue doesn't have an elevator so the requests will not have
seen merging.

But yes, implied in the patch is the requirement to increase
m->repeat_count via multipathd's rr_min_io_rq (I'll backfill a proper
header once it is tested).

I'll test it once I get some spare time (hopefully soon...)

OK thanks.

BTW, I _cannot_ get null_blk to come even close to your reported 1500K+
IOPs on 2 "fast" systems I have access to.  Which arguments are you
loading the null_blk module with?

I've been using:
modprobe null_blk gb=4 bs=4096 nr_devices=1 queue_mode=2 submit_queues=12

$ for f in /sys/module/null_blk/parameters/*; do echo $f; cat $f; done
/sys/module/null_blk/parameters/bs
512
/sys/module/null_blk/parameters/completion_nsec
10000
/sys/module/null_blk/parameters/gb
250
/sys/module/null_blk/parameters/home_node
-1
/sys/module/null_blk/parameters/hw_queue_depth
64
/sys/module/null_blk/parameters/irqmode
1
/sys/module/null_blk/parameters/nr_devices
2
/sys/module/null_blk/parameters/queue_mode
2
/sys/module/null_blk/parameters/submit_queues
24
/sys/module/null_blk/parameters/use_lightnvm
N
/sys/module/null_blk/parameters/use_per_node_hctx
N

$ fio --group_reporting --rw=randread --bs=4k --numjobs=24
--iodepth=32 --runtime=99999999 --time_based --loops=1
--ioengine=libaio --direct=1 --invalidate=1 --randrepeat=1
--norandommap --exitall --name task_nullb0 --filename=/dev/nullb0
task_nullb0: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
ioengine=libaio, iodepth=32
...
fio-2.1.10
Starting 24 processes
Jobs: 24 (f=24): [rrrrrrrrrrrrrrrrrrrrrrrr] [0.0% done]
[7234MB/0KB/0KB /s] [1852K/0/0 iops] [eta 1157d:09h:46m:22s]

Your test above is prone to exhaust the dm-mpath blk-mq tags (128)
because 24 threads * 32 easily exceeds 128 (by a factor of 6).

I found that we were context switching (via bt_get's io_schedule)
waiting for tags to become available.
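
The arithmetic behind that can be sketched in a few lines of Python. This is only a back-of-the-envelope helper (the name tag_pressure is made up for illustration), and it assumes every fio job keeps its full iodepth in flight, which is the worst case:

```python
def tag_pressure(numjobs, iodepth, tags_per_hw_queue, nr_hw_queues=1):
    """Compare worst-case in-flight requests against available blk-mq tags.

    Hypothetical helper: assumes each job keeps iodepth requests outstanding.
    """
    outstanding = numjobs * iodepth
    available = tags_per_hw_queue * nr_hw_queues
    return outstanding, available, outstanding / available


if __name__ == "__main__":
    # fio run quoted above (numjobs=24, iodepth=32) against 128 tags
    outstanding, available, ratio = tag_pressure(24, 32, 128)
    print(f"{outstanding} outstanding vs {available} tags: {ratio:.1f}x oversubscribed")
```

Any ratio above 1 means some submitters have to sleep until a tag is freed.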

This is embarrassing but, until Jens told me today, I was oblivious to
the fact that the number of blk-mq's tags per hw_queue was defined by
tag_set.queue_depth.

Previously request-based DM's blk-mq support had:
md->tag_set.queue_depth = BLKDEV_MAX_RQ; (again: 128)

Now I have a patch that allows tuning queue_depth via dm_mod module
parameter.  And I'll likely bump the default to 4096 or something (doing
so eliminated blocking in bt_get).

But eliminating the tags bottleneck only raised my read IOPs from ~600K
to ~800K (using 1 hw_queue for both null_blk and dm-mpath).

When I raise nr_hw_queues to 4 for null_blk (keeping dm-mq at 1) I see a
whole lot more context switching due to request-based DM's use of
ksoftirqd (and kworkers) for request completion.

So I'm moving on to optimizing the completion path.  But at least some
progress was made, more to come...

Would you mind sharing your patches?
We're currently doing tests with a high-performance FC setup
(16G FC with all-flash storage), and are still 20% short of the announced backend performance.

Just as a side note: we're currently getting 550k IOPs with unpatched
dm-mpath. So we're nearly on par with your null_blk setup, but with real
hardware.
(Which in itself is pretty cool. You should get faster RAM :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)