Re: Quincy: mClock config propagation does not work properly

Hi Luis,

I was able to reproduce this issue locally, and it looks like a bug. I have
raised a tracker to track the fix:
https://tracker.ceph.com/issues/55153

The issue is that with the 'custom' profile enabled, a change to the config
parameters is written to the configuration db, as you noted, but the values
do not take effect on the OSD(s).
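
In the meantime, a quick way to confirm the symptom is to compare the value
in the config db with the value the daemon is actually running with (a
sketch; substitute your own OSD id and parameter):

```
# Value stored in the cluster configuration db
ceph config get osd osd_mclock_scheduler_background_recovery_res

# Value the running daemon actually uses (run on the OSD's host)
ceph daemon osd.0 config get osd_mclock_scheduler_background_recovery_res
```

With the bug present, the two values diverge after switching back to the
'custom' profile.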

I will look into this further and come back with a fix.

Thank you for trying out mClock and for your feedback.

-Sridhar


On Wed, Mar 30, 2022 at 8:40 PM Sridhar Seshasayee <sseshasa@xxxxxxxxxx>
wrote:

> Hi Luis,
>
> As Neha mentioned, I am trying out your steps and investigating this
> further.
> I will get back to you in the next day or two. Thanks for your patience.
>
> -Sridhar
>
> On Thu, Mar 17, 2022 at 11:51 PM Neha Ojha <nojha@xxxxxxxxxx> wrote:
>
>> Hi Luis,
>>
>> Thanks for testing the Quincy rc and trying out the mClock settings!
>> Sridhar is looking into this issue and will provide his feedback as
>> soon as possible.
>>
>> Thanks,
>> Neha
>>
>> On Thu, Mar 3, 2022 at 5:05 AM Luis Domingues <luis.domingues@xxxxxxxxx>
>> wrote:
>> >
>> > Hi all,
>> >
>> > As we were doing some tests on our lab cluster, running Quincy 17.1.0,
>> > we observed some strange behavior in the propagation of the mClock
>> > parameters to the OSDs. In short: when the profile is set to a
>> > pre-configured one and we then switch to custom, changes to the
>> > individual mClock parameters are not propagated.
>> >
>> > For more details, here is how we reproduce the issue in our lab:
>> >
>> > ********************** Step 1
>> >
>> > We start the OSDs with this configuration set (as shown by `ceph config dump`):
>> >
>> > ```
>> >
>> > osd advanced osd_mclock_profile custom
>> > osd advanced osd_mclock_scheduler_background_recovery_lim 512
>> > osd advanced osd_mclock_scheduler_background_recovery_res 128
>> > osd advanced osd_mclock_scheduler_background_recovery_wgt 3
>> > osd advanced osd_mclock_scheduler_client_lim 80
>> > osd advanced osd_mclock_scheduler_client_res 30
>> > osd advanced osd_mclock_scheduler_client_wgt 1
>> > osd advanced osd_op_queue mclock_scheduler *
>> > ```
>> >
>> > And we can observe that this is what the OSD is running, using
>> > `ceph daemon osd.X config show`:
>> >
>> > ```
>> > "osd_mclock_profile": "custom",
>> > "osd_mclock_scheduler_anticipation_timeout": "0.000000",
>> > "osd_mclock_scheduler_background_best_effort_lim": "999999",
>> > "osd_mclock_scheduler_background_best_effort_res": "1",
>> > "osd_mclock_scheduler_background_best_effort_wgt": "1",
>> > "osd_mclock_scheduler_background_recovery_lim": "512",
>> > "osd_mclock_scheduler_background_recovery_res": "128",
>> > "osd_mclock_scheduler_background_recovery_wgt": "3",
>> > "osd_mclock_scheduler_client_lim": "80",
>> > "osd_mclock_scheduler_client_res": "30",
>> > "osd_mclock_scheduler_client_wgt": "1",
>> > "osd_mclock_skip_benchmark": "false",
>> > "osd_op_queue": "mclock_scheduler",
>> > ```
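>> >
>> > For reference, the output above is trimmed to the mClock options; one
>> > way to produce that view, assuming jq is installed (just a sketch, with
>> > osd.0 as an example id), is:
>> >
>> > ```
>> > ceph daemon osd.0 config show | jq 'with_entries(select(.key | startswith("osd_mclock")))'
>> > ```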
>> >
>> > At this point, if we change something, the change can be seen on the
>> > OSD. Let's say we change the background recovery reservation to 100:
>> >
>> > `ceph config set osd osd_mclock_scheduler_background_recovery_res 100`
>> >
>> > The change has been set properly on the OSDs:
>> >
>> > ```
>> > "osd_mclock_profile": "custom",
>> > "osd_mclock_scheduler_anticipation_timeout": "0.000000",
>> > "osd_mclock_scheduler_background_best_effort_lim": "999999",
>> > "osd_mclock_scheduler_background_best_effort_res": "1",
>> > "osd_mclock_scheduler_background_best_effort_wgt": "1",
>> > "osd_mclock_scheduler_background_recovery_lim": "512",
>> > "osd_mclock_scheduler_background_recovery_res": "100",
>> > "osd_mclock_scheduler_background_recovery_wgt": "3",
>> > "osd_mclock_scheduler_client_lim": "80",
>> > "osd_mclock_scheduler_client_res": "30",
>> > "osd_mclock_scheduler_client_wgt": "1",
>> > "osd_mclock_skip_benchmark": "false",
>> > "osd_op_queue": "mclock_scheduler",
>> > ```
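>> >
>> > To verify that every OSD picked up the change, and not just the one we
>> > inspected, a rough sketch like this works (assuming your Ceph version
>> > supports `ceph tell osd.N config get`):
>> >
>> > ```
>> > for id in $(ceph osd ls); do
>> >     echo -n "osd.$id: "
>> >     ceph tell osd.$id config get osd_mclock_scheduler_background_recovery_res
>> > done
>> > ```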
>> >
>> > ********************** Step 2
>> >
>> > We change the profile to high_recovery_ops and remove the old
>> > configuration:
>> >
>> > ```
>> > ceph config set osd osd_mclock_profile high_recovery_ops
>> > ceph config rm osd osd_mclock_scheduler_background_recovery_lim
>> > ceph config rm osd osd_mclock_scheduler_background_recovery_res
>> > ceph config rm osd osd_mclock_scheduler_background_recovery_wgt
>> > ceph config rm osd osd_mclock_scheduler_client_lim
>> > ceph config rm osd osd_mclock_scheduler_client_res
>> > ceph config rm osd osd_mclock_scheduler_client_wgt
>> > ```
>> >
>> > The config now contains:
>> >
>> > ```
>> > osd advanced osd_mclock_profile high_recovery_ops
>> > osd advanced osd_op_queue mclock_scheduler *
>> > ```
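>> >
>> > To double-check that the removals took effect in the db, listing just
>> > the mClock entries is enough:
>> >
>> > ```
>> > ceph config dump | grep mclock
>> > ```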
>> >
>> > And we can see that the configuration was propagated to the OSDs:
>> >
>> > ```
>> > "osd_mclock_profile": "high_recovery_ops",
>> > "osd_mclock_scheduler_anticipation_timeout": "0.000000",
>> > "osd_mclock_scheduler_background_best_effort_lim": "999999",
>> > "osd_mclock_scheduler_background_best_effort_res": "1",
>> > "osd_mclock_scheduler_background_best_effort_wgt": "2",
>> > "osd_mclock_scheduler_background_recovery_lim": "343",
>> > "osd_mclock_scheduler_background_recovery_res": "103",
>> > "osd_mclock_scheduler_background_recovery_wgt": "2",
>> > "osd_mclock_scheduler_client_lim": "137",
>> > "osd_mclock_scheduler_client_res": "51",
>> > "osd_mclock_scheduler_client_wgt": "1",
>> > "osd_mclock_skip_benchmark": "false",
>> > "osd_op_queue": "mclock_scheduler",
>> >
>> > ```
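>> >
>> > As far as we understand, these numbers are not fixed constants: the
>> > built-in profiles appear to derive the per-OSD allocations from the
>> > measured IOPS capacity, which can be inspected with something like the
>> > following (HDD option shown for an example OSD; use the _ssd variant
>> > for flash):
>> >
>> > ```
>> > ceph config show osd.0 osd_mclock_max_capacity_iops_hdd
>> > ```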
>> >
>> > ********************** Step 3
>> >
>> > The issue comes when we try to go back to the custom profile:
>> >
>> > ```
>> > ceph config set osd osd_mclock_profile custom
>> > ceph config set osd osd_mclock_scheduler_background_recovery_lim 512
>> > ceph config set osd osd_mclock_scheduler_background_recovery_res 128
>> > ceph config set osd osd_mclock_scheduler_background_recovery_wgt 3
>> > ceph config set osd osd_mclock_scheduler_client_lim 80
>> > ceph config set osd osd_mclock_scheduler_client_res 30
>> > ceph config set osd osd_mclock_scheduler_client_wgt 1
>> >
>> > ```
>> >
>> > The ceph configuration looks good:
>> >
>> > ```
>> > osd advanced osd_mclock_profile custom
>> > osd advanced osd_mclock_scheduler_background_recovery_lim 512
>> > osd advanced osd_mclock_scheduler_background_recovery_res 128
>> > osd advanced osd_mclock_scheduler_background_recovery_wgt 3
>> > osd advanced osd_mclock_scheduler_client_lim 80
>> > osd advanced osd_mclock_scheduler_client_res 30
>> > osd advanced osd_mclock_scheduler_client_wgt 1
>> > osd advanced osd_op_queue mclock_scheduler *
>> > ```
>> >
>> > But the lim, res and wgt values on the OSDs still reflect the old
>> > high_recovery_ops config, even though the profile is custom:
>> >
>> > ```
>> > "osd_mclock_profile": "custom",
>> > "osd_mclock_scheduler_anticipation_timeout": "0.000000",
>> > "osd_mclock_scheduler_background_best_effort_lim": "999999",
>> > "osd_mclock_scheduler_background_best_effort_res": "1",
>> > "osd_mclock_scheduler_background_best_effort_wgt": "2",
>> > "osd_mclock_scheduler_background_recovery_lim": "343",
>> > "osd_mclock_scheduler_background_recovery_res": "103",
>> > "osd_mclock_scheduler_background_recovery_wgt": "2",
>> > "osd_mclock_scheduler_client_lim": "137",
>> > "osd_mclock_scheduler_client_res": "51",
>> > "osd_mclock_scheduler_client_wgt": "1",
>> > "osd_mclock_skip_benchmark": "false",
>> > "osd_op_queue": "mclock_scheduler",
>> > ```
>> >
>> > At this point we can change whatever we want; the only change that has
>> > an effect on the mClock parameters is switching to a pre-configured
>> > profile, such as balanced, high_recovery_ops or high_client_ops. Setting
>> > custom again leaves the OSDs on their last pre-configured profile.
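>> >
>> > To summarize what we observe (an annotated recap of the commands above,
>> > not new ones):
>> >
>> > ```
>> > ceph config set osd osd_mclock_profile high_client_ops  # takes effect on the OSDs
>> > ceph config set osd osd_mclock_profile custom           # OSDs keep the high_client_ops values
>> > ceph config set osd osd_mclock_scheduler_client_res 30  # written to the db, ignored by the OSDs
>> > ```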
>> >
>> > The only way I found to change these parameters is to restart the OSDs.
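>> >
>> > For completeness, this is the kind of restart we mean; adjust for your
>> > deployment (a cephadm-managed cluster vs. a package-based install):
>> >
>> > ```
>> > # cephadm-managed cluster
>> > ceph orch daemon restart osd.0
>> >
>> > # package-based install, on the OSD's host
>> > sudo systemctl restart ceph-osd@0
>> > ```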
>> >
>> > This looks like a bug to me, but let me know if this is expected.
>> >
>> > Regards,
>> > Luis Domingues
>> > Proton AG
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >
>>
>>
>
> --
>
> Sridhar Seshasayee
>
> Principal Software Engineer
>
> Red Hat <https://www.redhat.com>
>


-- 

Sridhar Seshasayee

Principal Software Engineer

Red Hat <https://www.redhat.com>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


