On Thu, Mar 1, 2018 at 10:40 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>
>>> On 01.03.2018 at 09:58, Dan van der Ster wrote:
>>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> On 01.03.2018 at 09:42, Dan van der Ster wrote:
>>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>> On 01.03.2018 at 09:03, Dan van der Ster wrote:
>>>>>>>> Is the score improving?
>>>>>>>>
>>>>>>>>     ceph balancer eval
>>>>>>>>
>>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>>
>>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight
>>>>>>>> of each OSD is equal to its size in TB.
>>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>>> those back to 1.0.
>>>>>>>
>>>>>>> I reweighted them all back to their correct weight.
>>>>>>>
>>>>>>> Now the mgr balancer module says:
>>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>>
>>>>>>> But as you can see it's heavily imbalanced.
>>>>>>>
>>>>>>> Example:
>>>>>>> 49   ssd 0.84000  1.00000  864G  546G  317G 63.26 1.13  49
>>>>>>>
>>>>>>> vs:
>>>>>>>
>>>>>>> 48   ssd 0.84000  1.00000  864G  397G  467G 45.96 0.82  49
>>>>>>>
>>>>>>> 45% usage vs. 63%
>>>>>>
>>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>>> that you have a relatively large number of empty PGs.
>>>>>>
>>>>>> But regardless, this is annoying and I expect lots of operators to
>>>>>> get this result. (I've also observed that the num PGs gets balanced
>>>>>> perfectly at the expense of the other score metrics.)
>>>>>>
>>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>>
>>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>>> can be improved as I suggested?
>>>>>
>>>>> Yes, the PGs are perfectly distributed - but I think most people
>>>>> would like to have a distribution by bytes and not PGs.
>>>>>
>>>>> Is this possible? I mean, in the code there is already a dict for
>>>>> pgs, objects and bytes - but I don't know how to change the logic.
>>>>> Just remove the pgs and objects from the dict?
>>>>
>>>> It's worth a try to remove the pgs and objects from this dict:
>>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>>
>>> Do I have to change this 3 to 1, because we have only one item in the
>>> dict? I'm not sure where the 3 comes from.
>>>
>>> pe.score /= 3 * len(roots)
>>
>> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
>> change that to 1.
>>
>> I'm trying this on our test cluster here too. The last few lines of
>> output from `ceph balancer eval-verbose` will confirm that the score
>> is based only on bytes.
>>
>> But I'm not sure this is going to work -- indeed the score here went
>> from ~0.02 to 0.08, but do_crush_compat doesn't manage to find a
>> better score.
>
> Maybe this:
>
> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682
>
> I'm trying with that = 'bytes'.

That seems to be working.
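To make the 3-vs-1 normalization concrete, here is a toy, self-contained
sketch of the scoring shape -- not the actual module.py code, the names
and structure are invented for illustration. With all three metrics the
final score is divided by 3 * len(roots); with bytes alone the divisor
drops to 1 * len(roots):

    # Toy sketch only -- NOT the real mgr/balancer code. It just mirrors
    # the shape being discussed: per root, per metric, score the spread
    # of per-OSD utilization, then normalize by len(metrics) * len(roots).

    def eval_score(usage_by_root, metrics=('bytes',)):
        """usage_by_root: {root: {metric: {osd_id: amount}}}.

        With metrics=('pgs', 'objects', 'bytes') the divisor is
        3 * len(roots), as in the stock balancer; with ('bytes',) it
        becomes 1 * len(roots) and only byte imbalance counts.
        """
        score = 0.0
        for root, per_metric in usage_by_root.items():
            for metric in metrics:
                amounts = list(per_metric[metric].values())
                avg = sum(amounts) / len(amounts)
                if avg == 0:
                    continue
                # mean relative deviation -- a stand-in for the variance
                # the balancer drives toward zero
                score += sum(abs(a - avg) / avg for a in amounts) / len(amounts)
        return score / (len(metrics) * len(usage_by_root))

    # The osd df rows quoted above: PG counts are identical but bytes are
    # not, so a pg-based term hides the imbalance that bytes-only exposes.
    usage = {'default': {
        'pgs':   {48: 49, 49: 49},
        'bytes': {48: 397 << 30, 49: 546 << 30},
    }}
    print(eval_score(usage, metrics=('pgs', 'bytes')))   # ~0.079
    print(eval_score(usage, metrics=('bytes',)))         # ~0.158

That arithmetic may also explain why the score jumped from ~0.02 to 0.08
after the change: with fewer terms in the average, the bytes term stops
being diluted by the near-zero pg and object terms, not that the cluster
actually got worse.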
I sent this PR as a start: https://github.com/ceph/ceph/pull/20665

I'm not sure we need to mess with the score function, on second thought.

-- dan