On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> OK it wasn't the balancer.
>
> It happens after executing all the reweight and crush compat commands.
>
> And even on a much bigger cluster it's 6% again. Some rounding issue? I
> migrated with a script so it's not a typo.

Maybe... can you narrow down which command it is? I'm guessing that one
of the 'ceph osd crush weight-set reweight-compat ...' commands does it,
but it would be nice to confirm whether it is a rounding issue or if
something is broken!

sage

>
> Stefan
>
> On 11.01.2018 at 21:21, Stefan Priebe - Profihost AG wrote:
> > Hi,
> >
> > On 11.01.2018 at 21:10, Sage Weil wrote:
> >> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>> On 11.01.2018 at 20:58, Sage Weil wrote:
> >>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>> Hi Sage,
> >>>>>
> >>>>> this did not work as expected. I tested it in another, smaller
> >>>>> cluster and it resulted in about 6% misplaced objects.
> >>>>
> >>>> Can you narrow down at what stage the misplaced objects happened?
> >>>
> >>> Ouch, I saw this:
> >>> # ceph balancer status
> >>> {
> >>>     "active": true,
> >>>     "plans": [
> >>>         "auto_2018-01-11_19:52:28"
> >>>     ],
> >>>     "mode": "crush-compat"
> >>> }
> >>>
> >>> So might it be the balancer being executed while I was modifying the
> >>> tree? Can I stop it and re-execute it manually?
> >>
> >> You can always 'ceph balancer off'. And I probably wouldn't turn it on
> >> until after you've cleaned this up, because it will balance with the
> >> current weights being the 'target' weights (when in your case they're
> >> not (yet)).
> >>
> >> To manually see what the balancer would do, you can
> >>
> >>   ceph balancer optimize foo
> >>   ceph balancer show foo
> >>   ceph balancer eval foo   # (see numerical analysis)
> >>
> >> and if it looks good
> >>
> >>   ceph balancer execute foo
> >>
> >> to actually apply the changes.
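
A minimal sketch of driving that manual plan workflow from a script, assuming
the Luminous 'ceph balancer' commands quoted above; the plan name
'stefan-manual' and the confirmation prompt are only illustrative:

#!/usr/bin/env python3
# Sketch: build a balancer plan, inspect it, and only execute it after an
# explicit confirmation.  Uses the Luminous-era 'ceph balancer' CLI; the
# plan name is just an example.
import subprocess

PLAN = 'stefan-manual'  # example plan name

def ceph(*args):
    """Run a ceph CLI command and return its stdout as text."""
    return subprocess.check_output(('ceph',) + args).decode()

ceph('balancer', 'optimize', PLAN)      # build the plan; nothing is applied yet
print(ceph('balancer', 'show', PLAN))   # proposed weight changes
print(ceph('balancer', 'eval', PLAN))   # numerical analysis of the plan

if input('execute plan %s? [y/N] ' % PLAN).strip().lower() == 'y':
    ceph('balancer', 'execute', PLAN)   # actually apply the changes
else:
    ceph('balancer', 'rm', PLAN)        # discard the unused plan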
> >
> > ok thanks but it seems there are still leftovers somewhere:
> >
> > [expo-office-node1 ~]# ceph balancer optimize stefan
> > Error EINVAL: Traceback (most recent call last):
> >   File "/usr/lib/ceph/mgr/balancer/module.py", line 303, in handle_command
> >     self.optimize(plan)
> >   File "/usr/lib/ceph/mgr/balancer/module.py", line 596, in optimize
> >     return self.do_crush_compat(plan)
> >   File "/usr/lib/ceph/mgr/balancer/module.py", line 658, in do_crush_compat
> >     orig_ws = self.get_compat_weight_set_weights()
> >   File "/usr/lib/ceph/mgr/balancer/module.py", line 837, in get_compat_weight_set_weights
> >     raise RuntimeError('could not find bucket %s' % b['bucket_id'])
> > RuntimeError: could not find bucket -6
> >
> > [expo-office-node1 ~]# ceph osd tree
> > ID CLASS WEIGHT   TYPE NAME                   STATUS REWEIGHT PRI-AFF
> > -1       14.55957 root default
> > -4        3.63989     host expo-office-node1
> >  8   ssd  0.90997         osd.8                   up  1.00000 1.00000
> >  9   ssd  0.90997         osd.9                   up  1.00000 1.00000
> > 10   ssd  0.90997         osd.10                  up  1.00000 1.00000
> > 11   ssd  0.90997         osd.11                  up  1.00000 1.00000
> > -2        3.63989     host expo-office-node2
> >  0   ssd  0.90997         osd.0                   up  1.00000 1.00000
> >  1   ssd  0.90997         osd.1                   up  1.00000 1.00000
> >  2   ssd  0.90997         osd.2                   up  1.00000 1.00000
> >  3   ssd  0.90997         osd.3                   up  1.00000 1.00000
> > -3        3.63989     host expo-office-node3
> >  4   ssd  0.90997         osd.4                   up  1.00000 1.00000
> >  5   ssd  0.90997         osd.5                   up  1.00000 1.00000
> >  6   ssd  0.90997         osd.6                   up  1.00000 1.00000
> >  7   ssd  0.90997         osd.7                   up  1.00000 1.00000
> > -5        3.63989     host expo-office-node4
> > 12   ssd  0.90997         osd.12                  up  1.00000 1.00000
> > 13   ssd  0.90997         osd.13                  up  1.00000 1.00000
> > 14   ssd  0.90997         osd.14                  up  1.00000 1.00000
> > 15   ssd  0.90997         osd.15                  up  1.00000 1.00000
> >
> > Stefan
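
Judging from the traceback, get_compat_weight_set_weights() walks the compat
weight-set and gives up when an entry's bucket_id cannot be found in the CRUSH
map; -6 does not appear in the tree above, so it is presumably one of the
removed *-target-weight buckets that still has a stale weight-set entry. A
rough diagnostic sketch for spotting such entries, assuming the 'ceph osd
crush dump' JSON exposes 'buckets' and 'choose_args' sections with the
bucket_id field the traceback shows:

#!/usr/bin/env python3
# Sketch: list compat weight-set entries whose bucket_id no longer exists
# in the CRUSH map -- the condition behind "could not find bucket -6".
# Field names are assumed from 'ceph osd crush dump' output and the
# traceback above; this is a diagnostic sketch, not the balancer's code.
import json
import subprocess

dump = json.loads(subprocess.check_output(
    ['ceph', 'osd', 'crush', 'dump', '--format', 'json']).decode())

known_ids = {b['id'] for b in dump.get('buckets', [])}

for name, entries in dump.get('choose_args', {}).items():
    for entry in entries:
        if entry['bucket_id'] not in known_ids:
            print('weight-set %r references missing bucket %d'
                  % (name, entry['bucket_id']))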
> >
> >> sage
> >>
> >>>
> >>>> sage
> >>>>
> >>>>> Any ideas?
> >>>>>
> >>>>> Stefan
> >>>>> On 11.01.2018 at 08:09, Stefan Priebe - Profihost AG wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Thanks! Can this be done while still having jewel clients?
> >>>>>>
> >>>>>> Stefan
> >>>>>>
> >>>>>> Excuse my typo sent from my mobile phone.
> >>>>>>
> >>>>>> On 10.01.2018 at 22:56, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> On 10.01.2018 at 22:23, Sage Weil wrote:
> >>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> OK,
> >>>>>>>>>>
> >>>>>>>>>> in the past we used the Python crush optimize tool to reweight the
> >>>>>>>>>> OSD usage - it inserted a second tree with $hostname-target-weight
> >>>>>>>>>> as hostnames.
> >>>>>>>>>
> >>>>>>>>> Can you attach a 'ceph osd crush tree' (or partial output) so I can
> >>>>>>>>> see what you mean?
> >>>>>>>>
> >>>>>>>> Sure - attached.
> >>>>>>>
> >>>>>>> Got it.
> >>>>>>>
> >>>>>>>>>> Now the questions are:
> >>>>>>>>>> 1.) Can we remove the tree? How?
> >>>>>>>>>> 2.) Can we do this now or only after all clients are running Luminous?
> >>>>>>>>>> 3.) Is it enough to enable the mgr balancer module?
> >>>>>>>
> >>>>>>> First,
> >>>>>>>
> >>>>>>>   ceph osd crush weight-set create-compat
> >>>>>>>
> >>>>>>> then for each osd,
> >>>>>>>
> >>>>>>>   ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
> >>>>>>>   ceph osd crush reweight <osd> <target-weight>
> >>>>>>>
> >>>>>>> That won't move any data but will keep your current optimized weights
> >>>>>>> in the compat weight-set where they belong.
> >>>>>>>
> >>>>>>> Then you can remove the *-target-weight buckets. For each osd,
> >>>>>>>
> >>>>>>>   ceph osd crush rm <osd> <ancestor>-target-weight
> >>>>>>>
> >>>>>>> and then for each remaining bucket
> >>>>>>>
> >>>>>>>   ceph osd crush rm <foo>-target-weight
> >>>>>>>
> >>>>>>> Finally, turn on the balancer (or test it to see what it wants to do
> >>>>>>> with the optimize command).
> >>>>>>>
> >>>>>>> HTH!
> >>>>>>> sage
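
Pulling the procedure above together: create the compat weight-set, move the
optimized weights into it, restore the target weights on the real tree, and
finally detach the OSDs from the *-target-weight buckets and remove those
buckets. A rough sketch of scripting it; the weight values and bucket names
below are placeholders the operator has to fill in from the existing tree,
not values taken from this cluster:

#!/usr/bin/env python3
# Sketch of the migration described above: keep the current optimized
# weights in the compat weight-set, put the target weights back on the
# real tree, then remove the leftover *-target-weight buckets.
# All names and numbers below are illustrative placeholders.
import subprocess

def ceph(*args):
    """Run a ceph CLI command, raising if it fails."""
    subprocess.check_call(('ceph',) + args)

# osd -> (optimized weight currently on the tree, original target weight)
weights = {
    'osd.0': (0.85, 0.90997),
    'osd.1': (0.95, 0.90997),
}

# osd -> the *-target-weight host bucket it still sits in (hypothetical names)
osd_target_bucket = {
    'osd.0': 'expo-office-node2-target-weight',
    'osd.1': 'expo-office-node2-target-weight',
}

# remaining *-target-weight buckets to drop once they are empty
target_weight_buckets = [
    'expo-office-node2-target-weight',
    'default-target-weight',
]

ceph('osd', 'crush', 'weight-set', 'create-compat')

for osd, (optimized, target) in weights.items():
    # preserve the optimized weight in the compat weight-set ...
    ceph('osd', 'crush', 'weight-set', 'reweight-compat', osd, str(optimized))
    # ... and restore the target weight on the tree (should not move data)
    ceph('osd', 'crush', 'reweight', osd, str(target))

# detach each osd from its *-target-weight ancestor ...
for osd, bucket in osd_target_bucket.items():
    ceph('osd', 'crush', 'rm', osd, bucket)

# ... then remove the now-empty buckets themselves
for bucket in target_weight_buckets:
    ceph('osd', 'crush', 'rm', bucket)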