On 12.01.2018 at 21:21, Sage Weil wrote:
> On Fri, 12 Jan 2018, Stefan Priebe - Profihost AG wrote:
>> On 11.01.2018 at 21:37, Sage Weil wrote:
>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>> OK, it wasn't the balancer.
>>>>
>>>> It happens after executing all the reweight and crush compat commands.
>>>>
>>>> And even on a much bigger cluster it's 6% again. Some rounding issue? I
>>>> migrated with a script, so it's not a typo.
>>>
>>> Maybe.. can you narrow down which command it is? I'm guessing that one of
>>> the 'ceph osd crush weight-set reweight-compat ...' commands does it, but
>>> it would be nice to confirm whether it is a rounding issue or if something
>>> is broken!
>>
>> Hi Sage,
>>
>> it happens while executing:
>> ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
>> ceph osd crush reweight <osd> <target-weight>
>> right after the first command (reweight-compat optimized-weight) for the
>
> It does this when the optimized-weight is *exactly* the same as the
> current (normal) weight? If it matches, it should be a no-op. Can you do
> a 'ceph osd crush tree' before and after the command so we can compare?
> (In fact, I think that first step is pointless, because when you create the
> compat weight-set it is populated with the regular CRUSH weights, which in
> your situation *are* the optimized weights.)
>
> Actually, looking at this more closely, it looks like the normal
> 'reweight' command sets the value in the compat weight-set too, so in
> reality we want to reorder those commands (and do them quickly in
> succession), e.g.:
>
>   ceph osd crush reweight osd.1 <target-weight>
>   ceph osd crush weight-set reweight-compat osd.1 <optimized-weight>
>
> but, again, the before and after PG layout should match. A 'ceph osd
> crush tree' dump before, after, and between will help sort out what is
> going on.

Here we go (attached) - the dumps are in JSON instead of plain text,
because my script had already written them out that way. I hope this helps.
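
For reference, a minimal sketch of the reordered per-OSD sequence, with
'ceph osd crush tree' dumps taken before, between, and after each step. The
weights file and its layout are purely illustrative (they are not from this
thread); only the ceph commands themselves are the ones discussed above.

  #!/bin/bash
  # Sketch only: weights.txt is a hypothetical file with one line per OSD,
  # formatted as "osd.N <target-weight> <optimized-weight>".
  WEIGHTS_FILE=weights.txt

  while read -r osd target optimized; do
      # JSON dump of the tree before touching this OSD.
      ceph osd crush tree -f json > "tree-${osd}-before.json"

      # Reordered as suggested: set the real CRUSH weight first ...
      ceph osd crush reweight "${osd}" "${target}"
      ceph osd crush tree -f json > "tree-${osd}-between.json"

      # ... then put the optimized weight back into the compat weight-set.
      ceph osd crush weight-set reweight-compat "${osd}" "${optimized}"
      ceph osd crush tree -f json > "tree-${osd}-after.json"
  done < "${WEIGHTS_FILE}"

The three dumps per OSD make it easy to see which of the two commands
actually changes the layout.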
Thanks,
Stefan

> Thanks!
> sage
>
>> 1st OSD it gets:
>> 0.4%
>> after the second (reweight target-weight):
>> 0.7%
>>
>> for each OSD it gets worse...
>>
>> Stefan
>>
>>> sage
>>>
>>>> Stefan
>>>>
>>>> On 11.01.2018 at 21:21, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>>>>>
>>>>> On 11.01.2018 at 21:10, Sage Weil wrote:
>>>>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>> On 11.01.2018 at 20:58, Sage Weil wrote:
>>>>>>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hi Sage,
>>>>>>>>>
>>>>>>>>> this did not work as expected. I tested it in another, smaller cluster
>>>>>>>>> and it resulted in about 6% misplaced objects.
>>>>>>>>
>>>>>>>> Can you narrow down at what stage the misplaced objects happened?
>>>>>>>
>>>>>>> ouch, I saw this:
>>>>>>> # ceph balancer status
>>>>>>> {
>>>>>>>     "active": true,
>>>>>>>     "plans": [
>>>>>>>         "auto_2018-01-11_19:52:28"
>>>>>>>     ],
>>>>>>>     "mode": "crush-compat"
>>>>>>> }
>>>>>>>
>>>>>>> so might it be the balancer being executed while I was modifying the tree?
>>>>>>> Can I stop it and re-execute it manually?
>>>>>>
>>>>>> You can always 'ceph balancer off'. And I probably wouldn't turn it on
>>>>>> until after you've cleaned this up, because it will balance with the
>>>>>> current weights being the 'target' weights (when in your case they're not
>>>>>> (yet)).
>>>>>>
>>>>>> To manually see what the balancer would do, you can run:
>>>>>>
>>>>>>   ceph balancer optimize foo
>>>>>>   ceph balancer show foo
>>>>>>   ceph balancer eval foo    # (see numerical analysis)
>>>>>>
>>>>>> and, if it looks good,
>>>>>>
>>>>>>   ceph balancer execute foo
>>>>>>
>>>>>> to actually apply the changes.
>>>>>
>>>>> ok, thanks, but it seems there are still leftovers somewhere:
>>>>> [expo-office-node1 ~]# ceph balancer optimize stefan
>>>>> Error EINVAL: Traceback (most recent call last):
>>>>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 303, in handle_command
>>>>>     self.optimize(plan)
>>>>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 596, in optimize
>>>>>     return self.do_crush_compat(plan)
>>>>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 658, in do_crush_compat
>>>>>     orig_ws = self.get_compat_weight_set_weights()
>>>>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 837, in get_compat_weight_set_weights
>>>>>     raise RuntimeError('could not find bucket %s' % b['bucket_id'])
>>>>> RuntimeError: could not find bucket -6
>>>>>
>>>>> [expo-office-node1 ~]# ceph osd tree
>>>>> ID CLASS WEIGHT   TYPE NAME                   STATUS REWEIGHT PRI-AFF
>>>>> -1       14.55957 root default
>>>>> -4        3.63989     host expo-office-node1
>>>>>  8   ssd  0.90997         osd.8                   up  1.00000 1.00000
>>>>>  9   ssd  0.90997         osd.9                   up  1.00000 1.00000
>>>>> 10   ssd  0.90997         osd.10                  up  1.00000 1.00000
>>>>> 11   ssd  0.90997         osd.11                  up  1.00000 1.00000
>>>>> -2        3.63989     host expo-office-node2
>>>>>  0   ssd  0.90997         osd.0                   up  1.00000 1.00000
>>>>>  1   ssd  0.90997         osd.1                   up  1.00000 1.00000
>>>>>  2   ssd  0.90997         osd.2                   up  1.00000 1.00000
>>>>>  3   ssd  0.90997         osd.3                   up  1.00000 1.00000
>>>>> -3        3.63989     host expo-office-node3
>>>>>  4   ssd  0.90997         osd.4                   up  1.00000 1.00000
>>>>>  5   ssd  0.90997         osd.5                   up  1.00000 1.00000
>>>>>  6   ssd  0.90997         osd.6                   up  1.00000 1.00000
>>>>>  7   ssd  0.90997         osd.7                   up  1.00000 1.00000
>>>>> -5        3.63989     host expo-office-node4
>>>>> 12   ssd  0.90997         osd.12                  up  1.00000 1.00000
>>>>> 13   ssd  0.90997         osd.13                  up  1.00000 1.00000
>>>>> 14   ssd  0.90997         osd.14                  up  1.00000 1.00000
>>>>> 15   ssd  0.90997         osd.15                  up  1.00000 1.00000
>>>>>
>>>>> Stefan
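
A side note on the 'could not find bucket -6' error above: a quick way to
check which bucket ids the compat weight-set still references, compared with
the buckets that actually exist, is to look at the crush dump. The jq filters
below are only a sketch; the exact choose_args layout may differ between
releases.

  # Bucket ids referenced by the weight-set entries (choose_args) ...
  ceph osd crush dump | jq '[.choose_args[][].bucket_id]'

  # ... versus the bucket ids that actually exist in the map.
  ceph osd crush dump | jq '[.buckets[].id]'

Any id that shows up in the first list but not in the second (like -6 here)
is a leftover. If that turns out to be the case, dropping and recreating the
compat weight-set ('ceph osd crush weight-set rm-compat' followed by
'ceph osd crush weight-set create-compat') might be one way to clean it up,
but that is only a guess based on the traceback, not something confirmed in
this thread.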
>>>>>
>>>>>> sage
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> sage
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> Stefan
>>>>>>>>>
>>>>>>>>> On 11.01.2018 at 08:09, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> Thanks! Can this be done while still having Jewel clients?
>>>>>>>>>>
>>>>>>>>>> Stefan
>>>>>>>>>>
>>>>>>>>>> Excuse my typos - sent from my mobile phone.
>>>>>>>>>>
>>>>>>>>>> On 10.01.2018 at 22:56, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>> On 10.01.2018 at 22:23, Sage Weil wrote:
>>>>>>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>>>>>> OK,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> in the past we used the python-crush optimize tool to reweight the OSD
>>>>>>>>>>>>>> usage - it inserted a second tree with $hostname-target-weight as the
>>>>>>>>>>>>>> host names.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you attach a 'ceph osd crush tree' (or partial output) so I can see
>>>>>>>>>>>>> what you mean?
>>>>>>>>>>>>
>>>>>>>>>>>> Sure - attached.
>>>>>>>>>>>
>>>>>>>>>>> Got it.
>>>>>>>>>>>
>>>>>>>>>>>>>> Now the questions are:
>>>>>>>>>>>>>> 1.) Can we remove the tree? How?
>>>>>>>>>>>>>> 2.) Can we do this now, or only after all clients are running Luminous?
>>>>>>>>>>>>>> 3.) Is it enough to enable the mgr balancer module?
>>>>>>>>>>>
>>>>>>>>>>> First,
>>>>>>>>>>>
>>>>>>>>>>>   ceph osd crush weight-set create-compat
>>>>>>>>>>>
>>>>>>>>>>> then, for each osd,
>>>>>>>>>>>
>>>>>>>>>>>   ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
>>>>>>>>>>>   ceph osd crush reweight <osd> <target-weight>
>>>>>>>>>>>
>>>>>>>>>>> That won't move any data but will keep your current optimized weights in
>>>>>>>>>>> the compat weight-set where they belong.
>>>>>>>>>>>
>>>>>>>>>>> Then you can remove the *-target-weight buckets. For each osd,
>>>>>>>>>>>
>>>>>>>>>>>   ceph osd crush rm <osd> <ancestor>-target-weight
>>>>>>>>>>>
>>>>>>>>>>> and then, for each remaining bucket,
>>>>>>>>>>>
>>>>>>>>>>>   ceph osd crush rm <foo>-target-weight
>>>>>>>>>>>
>>>>>>>>>>> Finally, turn on the balancer (or test it to see what it wants to do
>>>>>>>>>>> with the optimize command).
>>>>>>>>>>>
>>>>>>>>>>> HTH!
>>>>>>>>>>> sage
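
For completeness, the removal of the duplicate *-target-weight tree from the
quoted instructions can be scripted roughly as below. The host list, the root
bucket name, and the use of 'ceph osd ls-tree' to enumerate the OSDs under
each duplicate bucket are my own assumptions, not part of the instructions
above.

  #!/bin/bash
  # Sketch only: adjust HOSTS and the root bucket name to your own map.
  HOSTS="node1 node2 node3 node4"

  for host in ${HOSTS}; do
      # Unlink every OSD from the duplicate host bucket ...
      for id in $(ceph osd ls-tree "${host}-target-weight"); do
          ceph osd crush rm "osd.${id}" "${host}-target-weight"
      done
      # ... then remove the now-empty duplicate host bucket itself.
      ceph osd crush rm "${host}-target-weight"
  done

  # Finally remove the remaining root-level bucket of the duplicate tree,
  # e.g. if it was rooted at "default-target-weight":
  ceph osd crush rm default-target-weight

After that, 'ceph osd crush tree' should only show the real hosts.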
Attachment:
maps.tar.gz
Description: application/gzip