Hi Eric,

We are trying to evaluate dmclock's effectiveness at controlling recovery traffic in order to reduce its impact on client I/O. However, we are running into a problem and are not getting the results we expected.

We set up a small cluster with several OSD machines. In our configuration, we set the recovery class to limit = 0.001 (or even smaller), res = 0.0, wgt = 1.0, and the client class to res = 20k (or even higher), limit = 0.0, wgt = 500. We then killed an OSD while running fio on the client side and brought it back to trigger recovery. We saw that fio IOPS still dropped a lot, comparable to the drop when not using the dmclock queue at all.

We did some debugging and found that while recovery is active, fio requests are enqueued much less frequently than before; overall, the dmclock settings for the recovery class do not seem to make any difference. Since the enqueue rate of fio requests is reduced, when dmclock tries to dequeue a request there is less chance of pulling a fio request.

Can you give some comments on this?

Thanks,
Sheng

On Wed, Jun 28, 2017 at 11:33 AM, J. Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
> On 06/27/2017 05:21 PM, sheng qiu wrote:
>> I appreciate your kind reply.
>>
>> In our test, we set the following in ceph.conf:
>>
>> osd_op_queue = mclock_client
>> osd_op_queue_cut_off = high
>> osd_op_queue_mclock_client_op_lim = 100.0
>> osd_op_queue_mclock_client_op_res = 50.0
>> osd_op_num_shards = 1
>> osd_op_num_threads_per_shard = 1
>>
>> With this setup, all I/O requests should go to one mclock_client queue
>> and use mclock scheduling (osd_op_queue_cut_off = high).
>> We use fio for the test, with job=1, bs=4k, and qd=1 or 16.
>>
>> We expected the IOPS visible to fio to be < 100, yet we see a
>> much higher value.
>> Did we understand your work correctly, or did we miss anything?
>
> Hi Sheng,
>
> I think you understand things well, but there is one additional detail
> you may not have noticed yet, and that is what should be done when all
> clients have momentarily reached their limit and the ObjectStore would
> like another op to keep itself busy. We can either a) refuse to provide
> it with an op, or b) give it the op that is most appropriate by weight.
> The Ceph code is currently not designed to handle a), and it is not even
> clear that we should starve the ObjectStore in that manner. So we do b),
> and that means we can exceed the limit.
>
> dmclock's PullPriorityQueue constructors have a parameter,
> _allow_limit_break, which Ceph sets to true. That is how we do b) above.
> If you ever wanted to set it to false, you would need to make other
> changes to the ObjectStore Ceph code to handle cases where the op queue
> is not empty but is not ready/willing to return an op when one is
> requested.
>
> Eric
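
For illustration, below is a simplified, standalone C++ sketch of the two behaviors described above; it is not the actual dmclock API, and the type, field, and function names (QueuedOp, under_limit, pull, etc.) are hypothetical. With limit break disallowed, the queue returns nothing once every client has reached its limit (option a); with it allowed, it hands out the op from the highest-weight client anyway (option b), which is why the configured limit can be exceeded.

// Simplified sketch (not the real dmclock API) of the pull decision
// when some or all clients have momentarily reached their limit.
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct QueuedOp {
    std::string client;
    bool under_limit;   // client is still below its configured limit
    double weight;      // client's weight (higher = preferred on limit break)
};

// Option (a): refuse to hand out an op when every client is at its limit.
// Option (b): hand out the op from the highest-weight client anyway.
std::optional<QueuedOp> pull(const std::vector<QueuedOp>& queue,
                             bool allow_limit_break) {
    const QueuedOp* best_by_weight = nullptr;
    for (const auto& op : queue) {
        if (op.under_limit) {
            return op;                       // normal case: limit not reached
        }
        if (!best_by_weight || op.weight > best_by_weight->weight) {
            best_by_weight = &op;
        }
    }
    if (allow_limit_break && best_by_weight) {
        return *best_by_weight;              // option (b): exceed the limit
    }
    return std::nullopt;                     // option (a): starve the caller
}

int main() {
    std::vector<QueuedOp> queue = {
        {"recovery", false, 1.0},            // recovery already at its limit
        {"fio",      false, 500.0},          // client also at its limit
    };
    auto op = pull(queue, /*allow_limit_break=*/true);
    std::cout << (op ? "dequeued op from " + op->client : "no op returned")
              << "\n";                       // prints: dequeued op from fio
    return 0;
}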