Nice, thanks, I will try that soon. Can you tell me how to change the log
level to info for the balancer module?

On 01.03.2018 at 11:30, Dan van der Ster wrote:
> On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>>> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>
>>>> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>>>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>>> Hi,
>>>>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>>>>> Is the score improving?
>>>>>>>>>
>>>>>>>>>     ceph balancer eval
>>>>>>>>>
>>>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>>>
>>>>>>>>> You mentioned some crush optimization code at the beginning... how
>>>>>>>>> did that leave your cluster? The mgr balancer assumes that the crush
>>>>>>>>> weight of each OSD is equal to its size in TB.
>>>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>>>> those back to 1.0.
>>>>>>>>
>>>>>>>> I reweighted them all back to their correct weight.
>>>>>>>>
>>>>>>>> Now the mgr balancer module says:
>>>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>>>
>>>>>>>> But as you can see, it's heavily imbalanced.
>>>>>>>>
>>>>>>>> Example:
>>>>>>>> 49   ssd 0.84000 1.00000  864G  546G  317G 63.26 1.13  49
>>>>>>>>
>>>>>>>> vs.:
>>>>>>>>
>>>>>>>> 48   ssd 0.84000 1.00000  864G  397G  467G 45.96 0.82  49
>>>>>>>>
>>>>>>>> That is 45% usage vs. 63%.
>>>>>>>
>>>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>>>> that you have a relatively large number of empty PGs.
>>>>>>>
>>>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>>>> this result. (I've also observed that the num PGs gets balanced
>>>>>>> perfectly at the expense of the other score metrics.)
>>>>>>>
>>>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>>>> score weight on pgs, objects, and bytes so we can balance how we like.
>>>>>>>
>>>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>>>> can be improved as I suggested?
>>>>>>
>>>>>> Yes, the PGs are perfectly distributed, but I think most people would
>>>>>> like to have a distribution by bytes and not by PGs.
>>>>>>
>>>>>> Is this possible? I mean, in the code there is already a dict for pgs,
>>>>>> objects, and bytes, but I don't know how to change the logic. Just
>>>>>> remove the pgs and objects from the dict?
>>>>>
>>>>> It's worth a try to remove the pgs and objects from this dict:
>>>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>>>
>>>> Do I have to change this 3 to 1 because we have only one item in the
>>>> dict? I'm not sure where the 3 comes from.
>>>>
>>>>     pe.score /= 3 * len(roots)
>>>>
>>> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
>>> change that to 1.
>>>
>>> I'm trying this on our test cluster here too. The last few lines of
>>> output from `ceph balancer eval-verbose` will confirm that the score
>>> is based only on bytes.
>>>
>>> But I'm not sure this is going to work -- indeed the score here went
>>> from ~0.02 to 0.08, but do_crush_compat doesn't manage to find a
>>> better score.
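For reference, the arithmetic behind that 3-to-1 change: the balancer
averages a per-root deviation over its three metrics, which is where the 3
in `pe.score /= 3 * len(roots)` comes from. A minimal sketch, with
illustrative names rather than the actual module.py code:

    # Sketch only, not the real balancer implementation. Assumes
    # dev[metric][root] holds a normalized deviation in [0, 1].
    def combined_score(dev, roots, metrics=('pgs', 'objects', 'bytes')):
        total = sum(dev[m][r] for m in metrics for r in roots)
        # With all three metrics this divisor is 3 * len(roots).
        return total / (len(metrics) * len(roots))

    # Dropping 'pgs' and 'objects' leaves a single metric, so the divisor
    # correspondingly becomes 1 * len(roots):
    # bytes_only = combined_score(dev, roots, metrics=('bytes',))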
>> Maybe this:
>>
>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682
>>
>> I'm trying with that = 'bytes'.
>
> That seems to be working. I sent this PR as a start:
> https://github.com/ceph/ceph/pull/20665
>
> I'm not sure we need to mess with the score function, on second thought.
>
> -- dan
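A practical way to check whether such a change helps before any data moves
is the balancer's plan workflow (these subcommands exist in luminous;
"myplan" is just an arbitrary plan name):

    ceph balancer eval              # score of the current distribution
    ceph balancer optimize myplan   # compute a plan under the current mode
    ceph balancer eval myplan       # projected score if the plan were applied
    ceph balancer show myplan       # inspect the proposed changes
    ceph balancer execute myplan    # apply it

Regarding the log-level question at the top: the balancer logs through
ceph-mgr, so raising debug_mgr should surface its info messages, e.g. with
`ceph tell mgr injectargs '--debug_mgr 4/5'` or debug_mgr in ceph.conf
(assuming the usual mapping of Python INFO to mgr debug level 4 holds on
your version).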