On 11.01.2018 at 21:37, Sage Weil wrote:
> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> OK, it wasn't the balancer.
>>
>> It happens after executing all the reweight and crush compat commands:
>>
>> And even on a much bigger cluster it's 6% again. Some rounding issue? I
>> migrated with a script, so it's not a typo.
>
> Maybe.. can you narrow down which command it is? I'm guessing that one of
> the 'ceph osd crush weight-set reweight-compat ...' commands does it, but
> it would be nice to confirm whether it is a rounding issue or if something
> is broken!

Yes, I'll try. At least the crushmap looks very weird after executing all
the commands. You'll find the decoded one here:
https://pastebin.com/raw/LjjAcww4

Stefan

>
> sage
>
>
>> Stefan
>>
>> On 11.01.2018 at 21:21, Stefan Priebe - Profihost AG wrote:
>>> Hi,
>>>
>>> On 11.01.2018 at 21:10, Sage Weil wrote:
>>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>> On 11.01.2018 at 20:58, Sage Weil wrote:
>>>>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hi Sage,
>>>>>>>
>>>>>>> this did not work as expected. I tested it in another, smaller
>>>>>>> cluster and it resulted in about 6% misplaced objects.
>>>>>>
>>>>>> Can you narrow down at what stage the misplaced objects happened?
>>>>>
>>>>> Ouch, I saw this:
>>>>>
>>>>>   # ceph balancer status
>>>>>   {
>>>>>       "active": true,
>>>>>       "plans": [
>>>>>           "auto_2018-01-11_19:52:28"
>>>>>       ],
>>>>>       "mode": "crush-compat"
>>>>>   }
>>>>>
>>>>> So might it be the balancer being executed while I was modifying the
>>>>> tree? Can I stop it and re-execute it manually?
>>>>
>>>> You can always 'ceph balancer off'. And I probably wouldn't turn it on
>>>> until after you've cleaned this up, because it will balance with the
>>>> current weights being the 'target' weights (when in your case they're
>>>> not (yet)).
>>>>
>>>> To manually see what the balancer would do, you can
>>>>
>>>>   ceph balancer optimize foo
>>>>   ceph balancer show foo
>>>>   ceph balancer eval foo    # (see numerical analysis)
>>>>
>>>> and if it looks good
>>>>
>>>>   ceph balancer execute foo
>>>>
>>>> to actually apply the changes.
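For reference, the whole manual workflow Sage describes would look roughly
like this (the plan name "foo" is just the example name from above; nothing
moves until the final execute step, and a lower eval score means a more
balanced distribution):

  ceph balancer off            # stop the automatic balancer first
  ceph balancer eval           # score of the current distribution
  ceph balancer optimize foo   # compute a new crush-compat plan
  ceph balancer show foo       # inspect the proposed changes
  ceph balancer eval foo       # score the plan; lower is better
  ceph balancer execute foo    # only this step applies the changes
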
>>>
>>> Ok, thanks, but it seems there are still leftovers somewhere:
>>>
>>>   [expo-office-node1 ~]# ceph balancer optimize stefan
>>>   Error EINVAL: Traceback (most recent call last):
>>>     File "/usr/lib/ceph/mgr/balancer/module.py", line 303, in handle_command
>>>       self.optimize(plan)
>>>     File "/usr/lib/ceph/mgr/balancer/module.py", line 596, in optimize
>>>       return self.do_crush_compat(plan)
>>>     File "/usr/lib/ceph/mgr/balancer/module.py", line 658, in do_crush_compat
>>>       orig_ws = self.get_compat_weight_set_weights()
>>>     File "/usr/lib/ceph/mgr/balancer/module.py", line 837, in get_compat_weight_set_weights
>>>       raise RuntimeError('could not find bucket %s' % b['bucket_id'])
>>>   RuntimeError: could not find bucket -6
>>>
>>>   [expo-office-node1 ~]# ceph osd tree
>>>   ID CLASS WEIGHT   TYPE NAME                   STATUS REWEIGHT PRI-AFF
>>>   -1       14.55957 root default
>>>   -4        3.63989     host expo-office-node1
>>>    8   ssd  0.90997         osd.8                   up  1.00000 1.00000
>>>    9   ssd  0.90997         osd.9                   up  1.00000 1.00000
>>>   10   ssd  0.90997         osd.10                  up  1.00000 1.00000
>>>   11   ssd  0.90997         osd.11                  up  1.00000 1.00000
>>>   -2        3.63989     host expo-office-node2
>>>    0   ssd  0.90997         osd.0                   up  1.00000 1.00000
>>>    1   ssd  0.90997         osd.1                   up  1.00000 1.00000
>>>    2   ssd  0.90997         osd.2                   up  1.00000 1.00000
>>>    3   ssd  0.90997         osd.3                   up  1.00000 1.00000
>>>   -3        3.63989     host expo-office-node3
>>>    4   ssd  0.90997         osd.4                   up  1.00000 1.00000
>>>    5   ssd  0.90997         osd.5                   up  1.00000 1.00000
>>>    6   ssd  0.90997         osd.6                   up  1.00000 1.00000
>>>    7   ssd  0.90997         osd.7                   up  1.00000 1.00000
>>>   -5        3.63989     host expo-office-node4
>>>   12   ssd  0.90997         osd.12                  up  1.00000 1.00000
>>>   13   ssd  0.90997         osd.13                  up  1.00000 1.00000
>>>   14   ssd  0.90997         osd.14                  up  1.00000 1.00000
>>>   15   ssd  0.90997         osd.15                  up  1.00000 1.00000
>>>
>>> Stefan
>>>
>>>> sage
>>>>
>>>>>
>>>>>> sage
>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 11.01.2018 at 08:09, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hi,
>>>>>>>> Thanks! Can this be done while still having jewel clients?
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>> Excuse my typos, sent from my mobile phone.
>>>>>>>>
>>>>>>>> On 10.01.2018 at 22:56, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> On 10.01.2018 at 22:23, Sage Weil wrote:
>>>>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> Ok,
>>>>>>>>>>>>
>>>>>>>>>>>> in the past we used the python crush optimize tool to reweight
>>>>>>>>>>>> the OSD usage - it inserted a second tree with
>>>>>>>>>>>> $hostname-target-weight as hostnames.
>>>>>>>>>>>
>>>>>>>>>>> Can you attach a 'ceph osd crush tree' (or partial output) so I
>>>>>>>>>>> can see what you mean?
>>>>>>>>>>
>>>>>>>>>> Sure - attached.
>>>>>>>>>
>>>>>>>>> Got it.
>>>>>>>>>
>>>>>>>>>>>> Now the questions are:
>>>>>>>>>>>> 1.) Can we remove the tree? How?
>>>>>>>>>>>> 2.) Can we do this now, or only after all clients are running
>>>>>>>>>>>>     Luminous?
>>>>>>>>>>>> 3.) Is it enough to enable the mgr balancer module?
>>>>>>>>>
>>>>>>>>> First,
>>>>>>>>>
>>>>>>>>>   ceph osd crush weight-set create-compat
>>>>>>>>>
>>>>>>>>> then, for each OSD,
>>>>>>>>>
>>>>>>>>>   ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
>>>>>>>>>   ceph osd crush reweight <osd> <target-weight>
>>>>>>>>>
>>>>>>>>> That won't move any data but will keep your current optimized
>>>>>>>>> weights in the compat weight-set where they belong.
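A rough sketch of how that per-OSD migration could be scripted, and of how
to narrow down which command triggers the misplaced objects. Assumptions:
every OSD's target weight is 0.90997 (the value shown in the tree above;
adjust per OSD for mixed disk sizes), jq is available, and the current
optimized weight is taken from the crush_weight field of 'ceph osd df'.
This is a sketch, not a drop-in script:

  #!/bin/bash
  # Assumed uniform target weight; replace with the real per-OSD values.
  target=0.90997

  ceph balancer off                         # keep the balancer out of the way
  ceph osd crush weight-set create-compat   # create the compat weight-set once

  for osd in $(ceph osd ls); do
      # current (optimized) CRUSH weight of this OSD, as reported by 'ceph osd df'
      optimized=$(ceph osd df -f json | jq -r ".nodes[] | select(.id == $osd) | .crush_weight")

      # keep the optimized value in the compat weight-set ...
      ceph osd crush weight-set reweight-compat osd.$osd "$optimized"
      # ... and restore the real target weight in the CRUSH map itself
      ceph osd crush reweight osd.$osd "$target"

      # if misplaced objects show up, this tells you after which command
      ceph pg stat
  done
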
>>>>>>>>>
>>>>>>>>> Then you can remove the *-target-weight buckets. For each OSD,
>>>>>>>>>
>>>>>>>>>   ceph osd crush rm <osd> <ancestor>-target-weight
>>>>>>>>>
>>>>>>>>> and then, for each remaining bucket,
>>>>>>>>>
>>>>>>>>>   ceph osd crush rm <foo>-target-weight
>>>>>>>>>
>>>>>>>>> Finally, turn on the balancer (or test it to see what it wants to
>>>>>>>>> do with the optimize command).
>>>>>>>>>
>>>>>>>>> HTH!
>>>>>>>>> sage
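A sketch of that cleanup for this example cluster, assuming the shadow
buckets follow the $hostname-target-weight naming described earlier (verify
the actual names with 'ceph osd crush tree' before removing anything):

  # detach each OSD from its shadow host bucket, e.g. for osd.8 on node1:
  ceph osd crush rm osd.8 expo-office-node1-target-weight
  # ... repeat for every OSD under every *-target-weight host ...

  # then remove the now-empty shadow buckets, children before parents:
  ceph osd crush rm expo-office-node1-target-weight
  ceph osd crush rm expo-office-node2-target-weight
  ceph osd crush rm expo-office-node3-target-weight
  ceph osd crush rm expo-office-node4-target-weight
  # if the shadow tree has its own root bucket, remove it last.

If the compat weight-set still references one of the removed buckets, which
is what the "could not find bucket -6" traceback above suggests, one
possible way out (an assumption, not something confirmed in this thread) is
to drop and recreate it with 'ceph osd crush weight-set rm-compat' followed
by 'ceph osd crush weight-set create-compat', at the cost of having to
re-enter the optimized weights.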