On 06/27/2017 05:21 PM, sheng qiu wrote:
> I appreciate your kind reply.
>
> In our test, we set the following in ceph.conf:
>
> osd_op_queue = mclock_client
> osd_op_queue_cut_off = high
> osd_op_queue_mclock_client_op_lim = 100.0
> osd_op_queue_mclock_client_op_res = 50.0
> osd_op_num_shards = 1
> osd_op_num_threads_per_shard = 1
>
> With this setup, all I/O requests should go to one mclock_client queue
> and be scheduled by mclock (osd_op_queue_cut_off = high).
> We use fio for the test, with job=1, bs=4k, qd=1 or 16.
>
> We expect the IOPS visible to fio to be < 100, but we see a much
> higher value.
> Did we understand your work correctly, or did we miss anything?

Hi Sheng,

I think you understand things well, but there is one additional detail
you may not have noticed yet. And that is: what should be done when all
clients have momentarily reached their limit and the ObjectStore would
like another op to keep itself busy? We can either a) refuse to provide
it with an op, or b) give it the op that's most appropriate by weight.
The ceph code is currently not designed to handle a), and it's not even
clear that we should starve the ObjectStore in that manner. So we do b),
and that means we can exceed the limit.

dmclock's PullPriorityQueue constructors have a parameter
_allow_limit_break, which ceph sets to true. That is how we do b) above.
If you ever wanted to set it to false, you'd need to make other changes
to the ObjectStore ceph code to handle cases where the op queue is not
empty but is not ready/willing to return an op when one is requested.

Eric
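
P.S. In case it helps, here is a rough, self-contained C++ sketch of the
choice the queue faces on each pull, just to make a) and b) concrete. It
is not the actual dmclock API; ClientState, its fields, and
pick_next_client are made up for illustration.

#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-client bookkeeping. The real dmclock queue tags each
// request with reservation/weight/limit timestamps rather than keeping
// simple counters; this only shows the shape of the decision.
struct ClientState {
    uint64_t client_id;
    double   weight;           // proportional-share weight
    double   ops_this_period;  // ops already dispatched this period
    double   limit;            // e.g. osd_op_queue_mclock_client_op_lim
    bool     has_queued_op;
};

// Decide which client (if any) to serve when the ObjectStore asks for
// another op.
std::optional<uint64_t>
pick_next_client(const std::vector<ClientState>& clients,
                 bool allow_limit_break)
{
    const ClientState* best_under_limit = nullptr;  // not yet at its limit
    const ClientState* best_over_limit  = nullptr;  // already at its limit

    for (const auto& c : clients) {
        if (!c.has_queued_op)
            continue;
        if (c.ops_this_period < c.limit) {
            if (!best_under_limit || c.weight > best_under_limit->weight)
                best_under_limit = &c;
        } else {
            if (!best_over_limit || c.weight > best_over_limit->weight)
                best_over_limit = &c;
        }
    }

    if (best_under_limit)
        return best_under_limit->client_id;   // normal case: below the limit
    if (allow_limit_break && best_over_limit)
        return best_over_limit->client_id;    // b) exceed the limit, by weight
    return std::nullopt;                      // a) starve the ObjectStore
}

Since ceph constructs the queue with _allow_limit_break set to true, the
middle branch fires whenever every queued client has reached its limit,
which is why fio can report well over 100 IOPS in a test like yours.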