On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>>
>> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>>> Is the score improving?
>>>>>>>
>>>>>>>     ceph balancer eval
>>>>>>>
>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>
>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>> each OSD is equal to its size in TB.
>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>> those back to 1.0.
>>>>>>
>>>>>> I reweighted them all back to their correct weight.
>>>>>>
>>>>>> Now the mgr balancer module says:
>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>
>>>>>> But as you can see it's heavily imbalanced:
>>>>>>
>>>>>> Example:
>>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>>
>>>>>> vs:
>>>>>>
>>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>>
>>>>>> 45% usage vs. 63%
>>>>>
>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>> that you have a relatively large number of empty PGs.
>>>>>
>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>> this result. (I've also observed that the num PGs gets balanced
>>>>> perfectly at the expense of the other score metrics.)
>>>>>
>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>
>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>> can be improved as I suggested?
>>>>
>>>> Yes, the PGs are perfectly distributed - but I think most people
>>>> would like to have a distribution by bytes and not by PGs.
>>>>
>>>> Is this possible? I mean in the code there is already a dict for pgs,
>>>> objects and bytes - but I don't know how to change the logic. Just
>>>> remove the pgs and objects from the dict?
>>>
>>> It's worth a try to remove the pgs and objects from this dict:
>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>
>> Do I have to change this 3 to 1 because we have only one item in the dict?
>> I'm not sure where the 3 comes from.
>>     pe.score /= 3 * len(roots)
>>
>
> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
> change that to 1.
>
> I'm trying this on our test cluster here too. The last few lines of
> output from `ceph balancer eval-verbose` will confirm that the score
> is based only on bytes.
>
> But I'm not sure this is going to work -- indeed the score here went
> from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
> better score.

Maybe this:

https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682

I'm trying with that = 'bytes'

-- dan
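
The score-weight idea from the thread (a per-metric weight on pgs, objects,
bytes) could look roughly like the sketch below. This is only an
illustration under stated assumptions, not the code in module.py: the
function name weighted_score, the METRICS tuple and the input layout are
hypothetical, and it assumes the overall score is the sum of per-root,
per-metric scores normalized the same way as `pe.score /= 3 * len(roots)`.

```python
from typing import Dict

# The three metrics named in the thread. The tuple itself is a
# hypothetical helper, not a name taken from the balancer module.
METRICS = ("pgs", "objects", "bytes")


def weighted_score(score_by_root: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> float:
    """Combine per-root, per-metric scores into one weighted score.

    score_by_root maps a CRUSH root name to {metric: score}.
    weights maps a metric name to a non-negative weight; a metric
    missing from weights is treated as weight 0 (ignored).
    """
    if not score_by_root:
        raise ValueError("need at least one root")
    total_weight = sum(weights.get(m, 0.0) for m in METRICS)
    if total_weight <= 0:
        raise ValueError("at least one metric needs a positive weight")

    total = 0.0
    for scores in score_by_root.values():
        for metric in METRICS:
            w = weights.get(metric, 0.0)
            if w:
                # Only metrics with a non-zero weight have to be present.
                total += w * scores[metric]

    # With all weights equal to 1 the divisor is 3 * len(roots);
    # with only 'bytes' weighted it becomes 1 * len(roots).
    return total / (total_weight * len(score_by_root))


if __name__ == "__main__":
    # Made-up scores for a single root, for illustration only.
    example = {"default": {"pgs": 0.0, "objects": 0.03, "bytes": 0.09}}
    print(weighted_score(example, {"pgs": 1, "objects": 1, "bytes": 1}))
    print(weighted_score(example, {"bytes": 1}))
```

With equal weights this reduces to the existing normalization by 3 metrics;
weighting only bytes corresponds to the "change the 3 to 1" experiment
above. Whether the optimizer can then actually find a better score is a
separate question, as noted in the thread.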