Re: How to check work of mClock QoS?

On 04/12/2018 12:23 PM, aboutbus wrote:
> Hi, all
> 
> I've experimented for a while with mClock QoS and didn't see any
> visible impact of QoS on the speed/bandwidth of operations of
> different types (client, recovery).
> I suppose it's enough to have a single PC with a single drive to
> check how it works. I ran my tests on a PC with a 500GB SSD, 32GB
> RAM, and an i7 CPU.
> 
> 1) Changes in ceph.conf:
> 
> # double replication in any case
> osd pool default size = 2
> osd pool default min size = 2
> 
> # mclock queue, put client and recovery operations in one queue
> osd op queue = mclock_client
> osd op queue cut off = high
> 
> # reserve iops and higher weight for recovery
> osd_op_queue_mclock_client_op_wgt = 1.0
> osd_op_queue_mclock_osd_subop_wgt = 1.0
> osd_op_queue_mclock_recov_res = 500.0
> osd_op_queue_mclock_recov_wgt = 9.0
> 
> # lower number of shards will increase the impact of the mClock queues
> osd_op_num_shards = 1
> osd_op_num_threads_per_shard = 1

...

> 7) Check and compare speed of client and recovery IO operations:
> watch -n1 ceph -s
> 
> During the next 5-10 minutes, the speed of recovery and client ops
> changes dramatically. Sometimes recovery ops go faster than client
> ops, sometimes slower. Sometimes recovery stops entirely. But in
> general, client ops go much faster than recovery ops despite the
> mclock settings.
> 
> Why doesn't the speed/bandwidth of operations divide according to
> the mclock settings? Or is my test incorrect?
> 
> How to check work of mClock QoS?
> 
> Any help is appreciated.

A couple of things could be responsible for the behavior you're
seeing. First, realize that mclock is currently experimental and
we're actively working on it.

One potential issue is the throttle between the op queue and the
bluestore transaction queue. The longer operations stay in the op
queue, the more impact the mclock algorithm can have, but we also
need to make sure bluestore has enough work to do. We did some
experiments on improving the throttling between the two, but SK
Telecom built their own throttle, which they call the "outstanding
i/o throttle", and theirs seemed to perform better. It was part of a
much larger PR, and we're waiting for them to isolate just the
throttle so we can merge it.
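
In the meantime, one thing you can experiment with is tightening the
existing bluestore throttle so that more operations wait in the op
queue, where mclock can act on them. A minimal ceph.conf sketch; the
option names are bluestore's throttle knobs, but the values here are
purely illustrative, not tuned recommendations:

    # shrink bluestore's in-flight transaction budget so more ops
    # queue up in front of mclock (values illustrative only)
    bluestore_throttle_bytes = 33554432            # 32 MiB
    bluestore_throttle_deferred_bytes = 67108864   # 64 MiB
    bluestore_throttle_cost_per_io_ssd = 8000      # raise per-io cost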

Another potential issue is that operations entering the op queue
carry a notion of cost. In the current mclock integration that cost
is passed to the mclock queue as each op is enqueued. The problem is
that the existing mclock library does not handle cost particularly
well, so the cost should not have been passed along. I have a PR for
the upcoming mimic release that zeroes out the cost as ops enter
mclock.
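
To make that concrete, here is a minimal C++ sketch of the idea. The
wrapper and its names (OpRequest, enqueue_op, add_request) are
hypothetical stand-ins for the real dmclock integration, not the
actual Ceph code:

    #include <cstdint>
    #include <utility>

    struct OpRequest {
      uint64_t cost;  // e.g. bytes touched, as computed by the OSD
      // ... other fields elided
    };

    // Hypothetical wrapper around the mclock priority queue.
    template <typename Queue, typename ClientId>
    void enqueue_op(Queue& q, const ClientId& client, OpRequest&& op) {
      // The library mishandles per-request cost, so zero it out
      // before the op enters the mclock queue.
      q.add_request(std::move(op), client, /*cost=*/0.0);
    }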

But at the same time, I'm working on an improved cost model for dmclock.
I hope that will be ready soon.
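
For background, the scheduling behavior comes from the tag equations
in the original mClock paper (Gulati et al., OSDI 2010). For request
k from client i, with reservation r_i, limit l_i, weight w_i, and
arrival time t:

    R_i^k = max(R_i^(k-1) + 1/r_i, t)    (reservation tag)
    L_i^k = max(L_i^(k-1) + 1/l_i, t)    (limit tag)
    P_i^k = max(P_i^(k-1) + 1/w_i, t)    (proportional-share tag)

One natural way a per-request cost c enters these formulas (my
illustration, not necessarily what the PR will do) is to replace the
fixed increment of 1 with c, e.g. R_i^k = max(R_i^(k-1) + c/r_i, t).
A large, mishandled cost can therefore distort the tags and the
resulting schedule.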

I know that doesn't give you a definitive answer, but it does hint at
some places to look.

Eric