On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>
>>> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>>>> Is the score improving?
>>>>>>>>
>>>>>>>> ceph balancer eval
>>>>>>>>
>>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>>
>>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>>> each OSD is equal to its size in TB.
>>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>>> those back to 1.0.
>>>>>>>
>>>>>>> I reweighted them all back to their correct weight.
>>>>>>>
>>>>>>> Now the mgr balancer module says:
>>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>>
>>>>>>> But as you can see it's heavily imbalanced:
>>>>>>>
>>>>>>> Example:
>>>>>>> 49 ssd 0.84000 1.00000 864G 546G 317G 63.26 1.13 49
>>>>>>>
>>>>>>> vs:
>>>>>>>
>>>>>>> 48 ssd 0.84000 1.00000 864G 397G 467G 45.96 0.82 49
>>>>>>>
>>>>>>> 45% usage vs. 63%
>>>>>>
>>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>>> that you have a relatively large number of empty PGs.
>>>>>>
>>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>>> this result. (I've also observed that the num PGs gets balanced
>>>>>> perfectly at the expense of the other score metrics.)
>>>>>>
>>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>>
>>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>>> can be improved as I suggested?
>>>>>
>>>>> Yes, the PGs are perfectly distributed, but I think most people
>>>>> would like to have a distribution by bytes and not PGs.
>>>>>
>>>>> Is this possible? I mean, in the code there is already a dict for pgs,
>>>>> objects and bytes, but I don't know how to change the logic. Just
>>>>> remove the pgs and objects from the dict?
>>>>
>>>> It's worth a try to remove the pgs and objects from this dict:
>>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>>
>>> Do I have to change this 3 to 1 because we have only one item in the dict?
>>> I'm not sure where the 3 comes from.
>>> pe.score /= 3 * len(roots)
>>
>> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
>> change that to 1.
>>
>> I'm trying this on our test cluster here too. The last few lines of
>> output from `ceph balancer eval-verbose` will confirm that the score
>> is based only on bytes.
>>
>> But I'm not sure this is going to work -- indeed the score here went
>> from ~0.02 to 0.08, but the do_crush_compat doesn't manage to find a
>> better score.
>
> Maybe this:
>
> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682
>
> I'm trying with that set to 'bytes'.

That seems to be working.

I sent this PR as a start:
https://github.com/ceph/ceph/pull/20665

I'm not sure we need to mess with the score function, on second thought.

--
dan
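The two `ceph osd df` rows Stefan quotes show why a PG-count metric can report "balanced" while disk usage is not. The sketch below assumes the rows follow the luminous-era column layout (ID, CLASS, WEIGHT, REWEIGHT, SIZE, USE, AVAIL, %USE, VAR, PGS). `parse_row` and `spread` are hypothetical helpers, and `spread` (largest relative deviation from the mean) is a simple stand-in, not the balancer's actual scoring formula.

```python
# Hypothetical helpers for illustration; not part of Ceph.
# Columns assumed to match the luminous-era `ceph osd df` plain-text layout:
# ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
COLUMNS = ["id", "class", "weight", "reweight", "size", "use", "avail",
           "pct_use", "var", "pgs"]

# The two rows quoted in the thread.
ROWS = [
    "49 ssd 0.84000 1.00000 864G 546G 317G 63.26 1.13 49",
    "48 ssd 0.84000 1.00000 864G 397G 467G 45.96 0.82 49",
]


def parse_gib(value: str) -> float:
    """Convert a size such as '546G' to a float number of GiB (G suffix only)."""
    if not value.endswith("G"):
        raise ValueError(f"unsupported unit in {value!r}")
    return float(value[:-1])


def parse_row(line: str) -> dict:
    """Map one whitespace-separated `ceph osd df` row onto the assumed columns."""
    fields = dict(zip(COLUMNS, line.split()))
    weight = float(fields["weight"])
    return {
        "id": int(fields["id"]),
        "pgs": int(fields["pgs"]),
        # Usage per unit of CRUSH weight: the thread notes the balancer
        # assumes each OSD's CRUSH weight reflects its size.
        "gib_per_weight": parse_gib(fields["use"]) / weight,
    }


def spread(values: list) -> float:
    """Largest relative deviation from the mean; 0.0 means perfectly even."""
    mean = sum(values) / len(values)
    return max(abs(v - mean) / mean for v in values)


osds = [parse_row(row) for row in ROWS]
pg_spread = spread([o["pgs"] for o in osds])
byte_spread = spread([o["gib_per_weight"] for o in osds])

print(f"PG-count spread: {pg_spread:.3f}")
print(f"bytes spread:    {byte_spread:.3f}")
```

With the two quoted rows this prints a PG-count spread of 0.000 and a bytes spread of about 0.158: both OSDs hold 49 PGs, yet one stores roughly 150G more, consistent with Dan's point that many PGs must be nearly empty.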
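Dan's first idea was per-metric score weights, and the thread explains the `3` in `pe.score /= 3 * len(roots)` as the number of metrics (pgs, objects, bytes). A minimal sketch of that weighting idea, assuming per-root, per-metric scores where 0.0 means balanced, might look like the following. `combined_score` and the example numbers are hypothetical and made up for illustration; this is not the balancer module's code, and Dan's PR ultimately left the score function alone.

```python
from typing import Dict

# Hypothetical sketch, not the balancer module's code. Each root maps to
# per-metric imbalance scores, where 0.0 means perfectly balanced.
Scores = Dict[str, Dict[str, float]]


def combined_score(per_root: Scores, weights: Dict[str, float]) -> float:
    """Weighted average of per-metric scores across all roots.

    With weights {'pgs': 1, 'objects': 1, 'bytes': 1} the divisor is
    3 * len(roots), matching the normalization quoted in the thread;
    with only {'bytes': 1} it becomes 1 * len(roots).
    """
    if not per_root or not weights:
        raise ValueError("need at least one root and one weighted metric")
    total = 0.0
    for metrics in per_root.values():
        for name, weight in weights.items():
            total += weight * metrics[name]
    return total / (sum(weights.values()) * len(per_root))


# Made-up per-metric scores for a single root, for illustration only.
example = {"default": {"pgs": 0.0, "objects": 0.02, "bytes": 0.06}}

all_three = combined_score(example, {"pgs": 1, "objects": 1, "bytes": 1})
bytes_only = combined_score(example, {"bytes": 1})
print(f"pgs+objects+bytes: {all_three:.4f}")
print(f"bytes only:        {bytes_only:.4f}")
```

Here a perfectly balanced PG metric dilutes the bytes imbalance (0.0267 vs. 0.0600), so restricting the score to bytes makes it rise, in line with the increase Dan reports after removing pgs and objects.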