On Fri, 12 Jan 2018, Stefan Priebe - Profihost AG wrote:
> On 11.01.2018 at 21:37, Sage Weil wrote:
> > On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >> OK it wasn't the balancer.
> >>
> >> It happens after executing all the reweight and crush compat commands:
> >>
> >> And even on a much bigger cluster it's 6% again. Some rounding issue?
> >> I migrated with a script, so it's not a typo.
> >
> > Maybe.. can you narrow down which command it is?  I'm guessing that one
> > of the 'ceph osd crush weight-set reweight-compat ...' commands does it,
> > but it would be nice to confirm whether it is a rounding issue or if
> > something is broken!
>
> Hi Sage,
>
> it happens while executing:
>
>  ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
>  ceph osd crush reweight <osd> <target-weight>
>
> right after the first command (reweight-compat optimized-weight) for the

It does this when the optimized-weight is *exactly* the same as the
current (normal) weight?  If it matches, it should be a no-op.  Can you
do a 'ceph osd crush tree' before and after the command so we can
compare?

(In fact, I think that first step is pointless, because when you create
the compat weight-set it is populated with the regular CRUSH weights,
which in your situation *are* the optimized weights.)

Actually, looking at this more closely, it looks like the normal
'reweight' command sets the value in the compat weight-set too, so in
reality we want to reorder those commands (and run them quickly in
succession), e.g.

 ceph osd crush reweight osd.1 <target-weight>
 ceph osd crush weight-set reweight-compat osd.1 <optimized-weight>

but, again, the before and after PG layout should match.  A 'ceph osd
crush tree' dump before, after, and in between will help sort out what
is going on.
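For example, something like this (an untested sketch; osd.1 is just a
placeholder, and <target-weight>/<optimized-weight> are the per-OSD
values from your script) would capture all three snapshots for one OSD:

 ceph osd crush tree > tree.before
 ceph osd crush reweight osd.1 <target-weight>
 ceph osd crush tree > tree.between
 ceph osd crush weight-set reweight-compat osd.1 <optimized-weight>
 ceph osd crush tree > tree.after
 diff tree.before tree.between
 diff tree.between tree.after

Any weight change in those diffs that we don't expect is where the data
movement is coming from.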
Thanks!
sage

> 1st OSD it gets:
> 0.4%
> after the second (reweight target-weight):
> 0.7%
>
> for each OSD it gets worse...
>
> Stefan
>
> > sage
> >
> >> Stefan
> >>
> >> On 11.01.2018 at 21:21, Stefan Priebe - Profihost AG wrote:
> >>> Hi,
> >>>
> >>> On 11.01.2018 at 21:10, Sage Weil wrote:
> >>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>> On 11.01.2018 at 20:58, Sage Weil wrote:
> >>>>>> On Thu, 11 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>> Hi Sage,
> >>>>>>>
> >>>>>>> this did not work as expected. I tested it in another, smaller
> >>>>>>> cluster and it resulted in about 6% misplaced objects.
> >>>>>>
> >>>>>> Can you narrow down at what stage the misplaced objects happened?
> >>>>>
> >>>>> ouch, I saw this:
> >>>>>
> >>>>> # ceph balancer status
> >>>>> {
> >>>>>     "active": true,
> >>>>>     "plans": [
> >>>>>         "auto_2018-01-11_19:52:28"
> >>>>>     ],
> >>>>>     "mode": "crush-compat"
> >>>>> }
> >>>>>
> >>>>> so might it be the balancer being executed while I was modifying
> >>>>> the tree? Can I stop it and re-execute it manually?
> >>>>
> >>>> You can always 'ceph balancer off'.  And I probably wouldn't turn
> >>>> it on until after you've cleaned this up, because it will balance
> >>>> with the current weights being the 'target' weights (when in your
> >>>> case they're not (yet)).
> >>>>
> >>>> To manually see what the balancer would do, you can run
> >>>>
> >>>>  ceph balancer optimize foo
> >>>>  ceph balancer show foo
> >>>>  ceph balancer eval foo    # (see the numerical analysis)
> >>>>
> >>>> and, if it looks good,
> >>>>
> >>>>  ceph balancer execute foo
> >>>>
> >>>> to actually apply the changes.
> >>>
> >>> ok thanks, but it seems there are still leftovers somewhere:
> >>>
> >>> [expo-office-node1 ~]# ceph balancer optimize stefan
> >>> Error EINVAL: Traceback (most recent call last):
> >>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 303, in handle_command
> >>>     self.optimize(plan)
> >>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 596, in optimize
> >>>     return self.do_crush_compat(plan)
> >>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 658, in do_crush_compat
> >>>     orig_ws = self.get_compat_weight_set_weights()
> >>>   File "/usr/lib/ceph/mgr/balancer/module.py", line 837, in get_compat_weight_set_weights
> >>>     raise RuntimeError('could not find bucket %s' % b['bucket_id'])
> >>> RuntimeError: could not find bucket -6
> >>>
> >>> [expo-office-node1 ~]# ceph osd tree
> >>> ID CLASS WEIGHT   TYPE NAME                  STATUS REWEIGHT PRI-AFF
> >>> -1       14.55957 root default
> >>> -4        3.63989     host expo-office-node1
> >>>  8   ssd  0.90997         osd.8                  up  1.00000 1.00000
> >>>  9   ssd  0.90997         osd.9                  up  1.00000 1.00000
> >>> 10   ssd  0.90997         osd.10                 up  1.00000 1.00000
> >>> 11   ssd  0.90997         osd.11                 up  1.00000 1.00000
> >>> -2        3.63989     host expo-office-node2
> >>>  0   ssd  0.90997         osd.0                  up  1.00000 1.00000
> >>>  1   ssd  0.90997         osd.1                  up  1.00000 1.00000
> >>>  2   ssd  0.90997         osd.2                  up  1.00000 1.00000
> >>>  3   ssd  0.90997         osd.3                  up  1.00000 1.00000
> >>> -3        3.63989     host expo-office-node3
> >>>  4   ssd  0.90997         osd.4                  up  1.00000 1.00000
> >>>  5   ssd  0.90997         osd.5                  up  1.00000 1.00000
> >>>  6   ssd  0.90997         osd.6                  up  1.00000 1.00000
> >>>  7   ssd  0.90997         osd.7                  up  1.00000 1.00000
> >>> -5        3.63989     host expo-office-node4
> >>> 12   ssd  0.90997         osd.12                 up  1.00000 1.00000
> >>> 13   ssd  0.90997         osd.13                 up  1.00000 1.00000
> >>> 14   ssd  0.90997         osd.14                 up  1.00000 1.00000
> >>> 15   ssd  0.90997         osd.15                 up  1.00000 1.00000
> >>>
> >>> Stefan
> >>>
> >>>> sage
> >>>>
> >>>>>>
> >>>>>> sage
> >>>>>>
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>> Stefan
> >>>>>>>
> >>>>>>> On 11.01.2018 at 08:09, Stefan Priebe - Profihost AG wrote:
> >>>>>>>> Hi,
> >>>>>>>> Thanks! Can this be done while still having jewel clients?
> >>>>>>>>
> >>>>>>>> Stefan
> >>>>>>>>
> >>>>>>>> Excuse my typos, sent from my mobile phone.
> >>>>>>>>
> >>>>>>>> On 10.01.2018 at 22:56, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>> On 10.01.2018 at 22:23, Sage Weil wrote:
> >>>>>>>>>>> On Wed, 10 Jan 2018, Stefan Priebe - Profihost AG wrote:
> >>>>>>>>>>>> OK,
> >>>>>>>>>>>>
> >>>>>>>>>>>> in the past we used the python crush optimize tool to
> >>>>>>>>>>>> reweight the osd usage - it inserted a 2nd tree with
> >>>>>>>>>>>> $hostname-target-weight as hostnames.
> >>>>>>>>>>>
> >>>>>>>>>>> Can you attach a 'ceph osd crush tree' (or partial output) so
> >>>>>>>>>>> I can see what you mean?
> >>>>>>>>>>
> >>>>>>>>>> Sure - attached.
> >>>>>>>>>
> >>>>>>>>> Got it.
> >>>>>>>>>
> >>>>>>>>>>>> Now the questions are:
> >>>>>>>>>>>> 1.) Can we remove the tree? How?
> >>>>>>>>>>>> 2.) Can we do this now, or only after all clients are
> >>>>>>>>>>>> running Luminous?
> >>>>>>>>>>>> 3.) Is it enough to enable the mgr balancer module?
> >>>>>>>>>
> >>>>>>>>> First,
> >>>>>>>>>
> >>>>>>>>>  ceph osd crush weight-set create-compat
> >>>>>>>>>
> >>>>>>>>> then, for each osd,
> >>>>>>>>>
> >>>>>>>>>  ceph osd crush weight-set reweight-compat <osd> <optimized-weight>
> >>>>>>>>>  ceph osd crush reweight <osd> <target-weight>
> >>>>>>>>>
> >>>>>>>>> That won't move any data, but will keep your current optimized
> >>>>>>>>> weights in the compat weight-set, where they belong.
> >>>>>>>>>
> >>>>>>>>> Then you can remove the *-target-weight buckets.  For each osd,
> >>>>>>>>>
> >>>>>>>>>  ceph osd crush rm <osd> <ancestor>-target-weight
> >>>>>>>>>
> >>>>>>>>> and then, for each remaining bucket,
> >>>>>>>>>
> >>>>>>>>>  ceph osd crush rm <foo>-target-weight
> >>>>>>>>>
> >>>>>>>>> Finally, turn on the balancer (or test it to see what it wants
> >>>>>>>>> to do with the optimize command).
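> >>>>>>>>>
> >>>>>>>>> Roughly, as an untested sketch (assuming a file weights.txt with
> >>>>>>>>> one "<osd> <optimized-weight> <target-weight>" triple per line;
> >>>>>>>>> the file name and format are made up, adjust to wherever your
> >>>>>>>>> script keeps the weights):
> >>>>>>>>>
> >>>>>>>>>  # populate the compat weight-set, then restore the real weights
> >>>>>>>>>  ceph osd crush weight-set create-compat
> >>>>>>>>>  while read osd opt target; do
> >>>>>>>>>      ceph osd crush weight-set reweight-compat "$osd" "$opt"
> >>>>>>>>>      ceph osd crush reweight "$osd" "$target"
> >>>>>>>>>  done < weights.txt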
> >>>>>>>>>
> >>>>>>>>> HTH!
> >>>>>>>>> sage
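P.S. About the 'could not find bucket -6' traceback above: that usually
means the compat weight-set still carries an entry for a bucket
(presumably one of the removed -target-weight buckets) that no longer
exists in the CRUSH map.  If it persists after the cleanup, something
like

 ceph osd crush weight-set rm-compat
 ceph osd crush weight-set create-compat

should drop the stale entries and rebuild the weight-set from the
current tree (untested here; you would then have to re-apply the
reweight-compat weights, since create-compat starts from the regular
CRUSH weights).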