Re: [Lsf] dm-mpath request merging concerns [was: Re: It's time to put together the schedule]

On 2/23/15, 7:50 AM, Mike Snitzer wrote:
> On Mon, Feb 23 2015 at  2:18am -0500,
> Hannes Reinecke <hare@xxxxxxx> wrote:
>
>> On 02/20/2015 02:29 AM, James Bottomley wrote:
>>> In the absence of any strong requests, the Programme Committee has taken
>>> a first stab at an agenda here:
>>>
>>> https://docs.google.com/spreadsheet/pub?key=0ArurRVMVCSnkdEl4a0NrNTgtU2JrWDNtWGRDOWRhZnc
>>>
>>> If there's anything you think should be discussed (or shouldn't be
>>> discussed) speak now ...
>>
>> Recently we've found a rather worrisome queueing degradation with
>> multipathing, which pointed to a deficiency in the scheduler itself:
>> SAP found that a device with 4 paths had less I/O throughput than a
>> system with 2 paths. When they reduced the queue depth on the 4
>> path system they managed to increase the throughput somewhat, but
>> it was still less than they had with two paths.
>
> The block layer doesn't have any understanding of how many paths are
> behind the top-level dm-mpath request_queue that is supposed to be doing
> the merging.
>
> So from a pure design level it is surprising that 2 vs 4 is impacting
> the merging at all.  I think Jeff Moyer (cc'd) has dealt with SAP
> performance recently too.
>
>> As it turns out, with 4 paths the system rarely did any I/O merging,
>> but rather fired off the 4k requests as fast as possible.
>> With two paths it was able to do some merging, leading to improved
>> performance.
>>
>> I was under the impression that the merging algorithm in the block
>> layer would only unplug the queue once the request had been fully
>> formed, i.e. after merging has happened. But apparently that is not
>> the case here.
>
> Just because you aren't seeing merging, are you sure it has anything to
> do with unplugging?  Would be nice to know more about the workload.


I think I remember this problem. In the original request-based design we hit this issue, and Kiyoshi or Jun'ichi made some changes for it.

I think it was related to the busy/dm_lld_busy code in dm.c and dm-mpath.c. The problem was that we do the merging in the dm-level queue. The underlying paths do not merge bios; they just take the requests sent to them. The change that was made to promote merging (I do not think we ever completely fixed the issue) relied on the fact that normally the workload was heavy enough, or the paths slow enough, that the busy check would return true often enough to keep the dm layer's queue from dispatching requests so quickly. The requests then had time to sit in the dm queue and merge with other bios.
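
To make the mechanism concrete, here is a toy Python model of merging in the top-level queue. It is not kernel code: TopQueue, submit, dispatch_all, run and the lower_busy callback are hypothetical names invented for this sketch. The only things it takes from the description above are that bios can merge only while they sit in the dm-level queue, and that the queue holds requests back while the busy check reports true.

```python
from dataclasses import dataclass

SECTORS_PER_4K = 8  # one 4 KiB bio = 8 sectors of 512 bytes


@dataclass
class Request:
    start: int  # first sector
    nr: int     # number of sectors


class TopQueue:
    """Toy stand-in for the dm-level request queue (hypothetical, not kernel code)."""

    def __init__(self):
        self.pending = []

    def submit(self, start, nr=SECTORS_PER_4K):
        # Back-merge: extend a queued request that ends where this bio starts.
        for req in self.pending:
            if req.start + req.nr == start:
                req.nr += nr
                return True  # merged into an existing request
        self.pending.append(Request(start, nr))
        return False

    def dispatch_all(self):
        # Hand everything that is queued to the underlying paths.
        out, self.pending = self.pending, []
        return out


def run(n_bios, lower_busy):
    """Submit n_bios contiguous 4k bios; return the requests sent to the paths.

    lower_busy(i) is an assumed stand-in for the busy check: while it returns
    True the queue is not drained, so later bios can merge.
    """
    q = TopQueue()
    sent = []
    for i in range(n_bios):
        q.submit(i * SECTORS_PER_4K)
        if not lower_busy(i):
            sent.extend(q.dispatch_all())
    sent.extend(q.dispatch_all())  # flush what is left at the end
    return sent


never_busy = run(16, lambda i: False)        # paths always have room
often_busy = run(16, lambda i: i % 4 != 3)   # busy 3 submissions out of 4
print(len(never_busy), "requests when never busy")
print(len(often_busy), "requests when often busy")
```

In this sketch the same 16 sequential bios reach the paths either as one request per bio (the queue is drained after every submission) or as a handful of larger requests (the bios had time to merge while the busy check held them back). The lower layer never merges anything, matching the point that the paths just take the requests sent to them.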

If the device/transport is fast or the workload is light, multipath_busy never returns busy, and we can hit Hannes's issue. With 4 paths, we just might not be able to fill up the paths and hit the busy check. With only 2 paths, we might be slow enough, or the workload heavy enough, to keep the paths busy, so we hit the busy check and do more merging.
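
The 2-path versus 4-path effect can be illustrated with a small tick-based simulation in Python. This is a rough sketch under made-up assumptions, not a model of the real dm-mpath code: one sequential 4k bio arrives per tick, every request occupies a path for a fixed number of ticks regardless of its size, and each path accepts one request at a time. The names simulate and summary, and the parameters service_ticks and depth, are hypothetical.

```python
def simulate(n_paths, n_bios=120, service_ticks=3, depth=1):
    """Return the sizes (in 4k bios) of the requests sent down to the paths.

    Toy model, not kernel code: one sequential 4k bio arrives per tick, every
    request takes service_ticks ticks on a path regardless of its size, and
    each path accepts at most depth requests at a time.
    """
    inflight = [[] for _ in range(n_paths)]  # remaining ticks, per path
    waiting = 0       # bios merged into the single request waiting at dm level
    dispatched = []   # size of each request handed to a path

    for _ in range(n_bios):
        # Age in-flight requests and drop the ones that complete this tick.
        inflight = [[t - 1 for t in path if t > 1] for path in inflight]

        # The new bio is contiguous with anything still waiting, so it merges.
        waiting += 1

        # Stand-in for the busy check: dispatch only if some path has room.
        for path in inflight:
            if waiting and len(path) < depth:
                path.append(service_ticks)
                dispatched.append(waiting)
                waiting = 0

    if waiting:
        dispatched.append(waiting)  # flush whatever is left at the end
    return dispatched


def summary(n_paths):
    sizes = simulate(n_paths)
    return len(sizes), sum(sizes) / len(sizes)


for n in (2, 4):
    count, mean = summary(n)
    print(f"{n} paths: {count} requests, mean size {mean:.2f} bios")
```

With these invented parameters, four paths always have a free slot, so every bio is dispatched on its own and nothing merges. With two paths, both are regularly occupied, bios wait at the dm level, and some of them go down as merged requests. The sketch says nothing about actual throughput; it only shows how fewer, busier paths can give the top-level queue more opportunity to merge.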

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel



