On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
<s.priebe@xxxxxxxxxxxx> wrote:
>
> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>> Is the score improving?
>>>>>>
>>>>>> ceph balancer eval
>>>>>>
>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>
>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>> each OSD is equal to its size in TB.
>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>> those back to 1.0.
>>>>>
>>>>> I reweighted them all back to their correct weight.
>>>>>
>>>>> Now the mgr balancer module says:
>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>
>>>>> But as you can see it's heavily imbalanced:
>>>>>
>>>>>
>>>>> Example:
>>>>> 49 ssd 0.84000 1.00000 864G 546G 317G 63.26 1.13 49
>>>>>
>>>>> vs:
>>>>>
>>>>> 48 ssd 0.84000 1.00000 864G 397G 467G 45.96 0.82 49
>>>>>
>>>>> 45% usage vs. 63%
>>>>
>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>> that you have a relatively large number of empty PGs.
>>>>
>>>> But regardless, this is annoying and I expect lots of operators to get
>>>> this result. (I've also observed that the num PGs gets balanced
>>>> perfectly at the expense of the other score metrics.)
>>>>
>>>> I was thinking of a patch around here [1] that lets operators add a
>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>
>>>> Spandan: you were the last to look at this function. Do you think it
>>>> can be improved as I suggested?
>>>
>>> Yes, the PGs are perfectly distributed - but I think most people
>>> would like to have a distribution by bytes and not PGs.
>>>
>>> Is this possible? I mean in the code there is already a dict for pgs,
>>> objects and bytes - but I don't know how to change the logic. Just
>>> remove the pgs and objects from the dict?
>>
>> It's worth a try to remove the pgs and objects from this dict:
>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>
> Do I have to change this 3 to 1 because we have only one item in the dict?
> I'm not sure where the 3 comes from.
> pe.score /= 3 * len(roots)
>

I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
change that to 1.

I'm trying this on our test cluster here too. The last few lines of
output from `ceph balancer eval-verbose` will confirm that the score
is based only on bytes.

But I'm not sure this is going to work -- indeed the score here went
from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
better score.

-- Dan
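
A minimal sketch of the weighting idea discussed in this thread, not the
actual balancer code. It assumes the per-metric scores (pgs, objects,
bytes) for each CRUSH root are summed and then divided by the number of
metrics times the number of roots, which is what `pe.score /= 3 * len(roots)`
suggests but has not been checked against module.py here. The function
name `combined_score`, its arguments, and the sample numbers are all
hypothetical.

```python
def combined_score(score_by_root, weights):
    """Weighted average of per-metric scores over all roots.

    score_by_root: {root_name: {metric_name: score}}  (hypothetical layout)
    weights:       {metric_name: non-negative weight}

    With every weight equal to 1 and three metrics, this reduces to
    sum(scores) / (3 * len(roots)), mirroring the division in the thread.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one metric needs a positive weight")
    if not score_by_root:
        raise ValueError("no roots to score")

    total = 0.0
    for metrics in score_by_root.values():
        for metric, weight in weights.items():
            total += weight * metrics[metric]
    return total / (total_weight * len(score_by_root))


# Made-up numbers: PGs perfectly balanced, bytes clearly not.
example = {"default": {"pgs": 0.0, "objects": 0.03, "bytes": 0.06}}

# Equal weights: the perfect PG score pulls the total down.
equal = combined_score(example, {"pgs": 1, "objects": 1, "bytes": 1})

# Bytes only: equivalent to removing pgs and objects from the dict
# and changing the 3 to 1.
bytes_only = combined_score(example, {"pgs": 0, "objects": 0, "bytes": 1})

print(equal, bytes_only)
```

With these sample values the bytes-only score comes out higher than the
equal-weight one, which is consistent with the score rising once PGs and
objects stop contributing. A higher score alone does not mean the
optimizer can find a better placement.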