Hi Eric,

Thank you for your kind reply. In our test, we set the following in ceph.conf:

  osd_op_queue = mclock_client
  osd_op_queue_cut_off = high
  osd_op_queue_mclock_client_op_lim = 100.0
  osd_op_queue_mclock_client_op_res = 50.0
  osd_op_num_shards = 1
  osd_op_num_threads_per_shard = 1

With this setup, all I/O requests should go through a single mclock_client
queue and be scheduled by mclock (osd_op_queue_cut_off = high). We use fio
for the test, with job=1, bs=4k, and qd=1 or 16. We expect the IOPS
reported by fio to be below 100, but we see a much higher value.

Did we understand your work correctly, or did we miss anything?

Thanks,
Sheng

On Wed, Jun 21, 2017 at 2:04 PM, J. Eric Ivancich <ivancich@xxxxxxxxxx> wrote:
> Hi Sheng,
>
> I'll interleave responses below.
>
> On 06/21/2017 01:38 PM, sheng qiu wrote:
>> Hi Eric,
>>
>> We are very interested in your dmclock integration work with Ceph.
>> After reading your pull request, I am a little confused.
>> May I ask whether a config setting such as
>> osd_op_queue_mclock_client_op_res actually takes effect in the dmclock
>> queues you added and in their enqueue and dequeue methods?
>
> Yes, that (and related) configuration option is used. You'll see it
> referenced in both src/osd/mClockOpClassQueue.cc and
> src/osd/mClockClientQueue.cc.
>
> Let me answer for mClockOpClassQueue; the process is similar in
> mClockClientQueue.
>
> The configuration value is brought into an instance of
> mClockOpClassQueue::mclock_op_tags_t. The variable
> mClockOpClassQueue::mclock_op_tags holds a unique_ptr to a singleton of
> that type. Then, when a new operation is enqueued, the function
> mClockOpClassQueue::op_class_client_info_f is called to determine its
> mclock parameters, at which point the value is used.
>
>> The enqueue function below inserts a request into a map<priority,
>> subqueue>; I guess that for the mclock_opclass queue you set a high
>> priority for client ops and a lower one for scrub, recovery, etc.
>> Within each subqueue of the same priority, do you use FIFO?
>>
>> void enqueue_strict(K cl, unsigned priority, T item) override final {
>>   high_queue[priority].enqueue(cl, 0, item);
>> }
>
> Yes, higher-priority operations use a strict queue and lower-priority
> operations use mclock. That basic behavior was based on the two earlier
> op queue implementations (src/common/WeightedPriorityQueue.h and
> src/common/PrioritizedQueue.h). The priority value used as the cut-off
> is determined by the configuration option osd_op_queue_cut_off (which
> can be "low" or "high", mapping to CEPH_MSG_PRIO_LOW and
> CEPH_MSG_PRIO_HIGH as defined in src/include/msgr.h; see the function
> OSD::get_io_prio_cut).
>
> And those operations that end up in the high queue are handled strictly
> -- higher priorities before lower priorities.
>
>> I would appreciate any comments you can provide, especially if I have
>> misunderstood something.
>
> I hope that's helpful. Please let me know if you have further questions.
>
> Eric
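
To make the arithmetic behind the res/lim settings above concrete, here is a
minimal, self-contained sketch of the reservation/limit tag bookkeeping that
mclock-style schedulers perform. This is not the actual Ceph or dmclock code;
all names and the simplified tag formula are made up for illustration only.

  // Simplified model of mclock reservation/limit tags for one client.
  // Settings from the test above: reservation = 50 ops/s, limit = 100 ops/s.
  #include <algorithm>
  #include <cstdio>

  struct TagState {
    double reservation;  // guaranteed ops/sec
    double limit;        // capped ops/sec
    double r_tag;        // earliest time the next op is owed by reservation
    double l_tag;        // earliest time the next op may run under the limit
  };

  // Tag a newly arriving request at time 'now' (seconds). Each request
  // pushes the tags forward by 1/rate, so a limit of 100 ops/sec spaces
  // limit tags 10 ms apart.
  void assign_tags(TagState& c, double now) {
    c.r_tag = std::max(c.r_tag + 1.0 / c.reservation, now);
    c.l_tag = std::max(c.l_tag + 1.0 / c.limit, now);
  }

  int main() {
    TagState client{50.0, 100.0, 0.0, 0.0};
    for (int i = 0; i < 5; ++i) {
      assign_tags(client, 0.0);
      std::printf("op %d: reservation tag %.3f s, limit tag %.3f s\n",
                  i, client.r_tag, client.l_tag);
    }
    // A scheduler that never dispatches a request before its limit tag
    // would cap this client at roughly 100 ops/sec.
    return 0;
  }

With these settings the limit tags advance 10 ms apart, which is where the
expected ceiling of about 100 IOPS in the fio test comes from.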
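
Along the same lines, here is a simplified sketch of the cut-off routing Eric
describes. The priority constants below are placeholders rather than the real
values from src/include/msgr.h, and route() is only a stand-in for the logic
around OSD::get_io_prio_cut and the op queue's enqueue path.

  #include <cstdio>

  // Illustrative stand-ins for the messenger priority levels.
  constexpr unsigned PRIO_LOW  = 64;    // placeholder value
  constexpr unsigned PRIO_HIGH = 196;   // placeholder value

  enum class Queue { Strict, MClock };

  // Operations at or above the cut-off priority go to the strict queue;
  // everything else is handled by the mclock-managed queue.
  Queue route(unsigned op_priority, unsigned cut_off) {
    return op_priority >= cut_off ? Queue::Strict : Queue::MClock;
  }

  int main() {
    unsigned cut_off = PRIO_HIGH;   // osd_op_queue_cut_off = high
    std::printf("client op (prio %u): %s\n", PRIO_LOW,
                route(PRIO_LOW, cut_off) == Queue::Strict ? "strict" : "mclock");
    std::printf("urgent op (prio 255): %s\n",
                route(255, cut_off) == Queue::Strict ? "strict" : "mclock");
    return 0;
  }

With the cut-off set to "high", ordinary client I/O falls below the cut-off
and is therefore scheduled by mclock rather than the strict queue, which
matches the intent of the test setup above.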