Re: ceph mgr balancer bad distribution

On Thu, Mar 1, 2018 at 10:38 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> On Thu, Mar 1, 2018 at 10:24 AM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>>
>> Am 01.03.2018 um 09:58 schrieb Dan van der Ster:
>>> On Thu, Mar 1, 2018 at 9:52 AM, Stefan Priebe - Profihost AG
>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> Am 01.03.2018 um 09:42 schrieb Dan van der Ster:
>>>>> On Thu, Mar 1, 2018 at 9:31 AM, Stefan Priebe - Profihost AG
>>>>> <s.priebe@xxxxxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>> Am 01.03.2018 um 09:03 schrieb Dan van der Ster:
>>>>>>> Is the score improving?
>>>>>>>
>>>>>>>     ceph balancer eval
>>>>>>>
>>>>>>> It should be decreasing over time as the variances drop toward zero.
>>>>>>>
>>>>>>> You mentioned a crush optimize code at the beginning... how did that
>>>>>>> leave your cluster? The mgr balancer assumes that the crush weight of
>>>>>>> each OSD is equal to its size in TB.
>>>>>>> Do you have any osd reweights? crush-compat will gradually adjust
>>>>>>> those back to 1.0.
>>>>>>
>>>>>> I reweighted them all back to their correct weight.
>>>>>>
>>>>>> Now the mgr balancer module says:
>>>>>> mgr[balancer] Failed to find further optimization, score 0.010646
>>>>>>
>>>>>> But as you can see it's heavily imbalanced:
>>>>>>
>>>>>>
>>>>>> Example:
>>>>>> 49   ssd 0.84000  1.00000   864G   546G   317G 63.26 1.13  49
>>>>>>
>>>>>> vs:
>>>>>>
>>>>>> 48   ssd 0.84000  1.00000   864G   397G   467G 45.96 0.82  49
>>>>>>
>>>>>> 45% usage vs. 63%
>>>>>
>>>>> Ahh... but look, the num PGs are perfectly balanced, which implies
>>>>> that you have a relatively large number of empty PGs.
>>>>>
>>>>> But regardless, this is annoying and I expect lots of operators to get
>>>>> this result. (I've also observed that the num PGs gets balanced
>>>>> perfectly at the expense of the other score metrics.)
>>>>>
>>>>> I was thinking of a patch around here [1] that lets operators add a
>>>>> score weight on pgs, objects, bytes so we can balance how we like.
>>>>>
>>>>> Spandan: you were the last to look at this function. Do you think it
>>>>> can be improved as I suggested?
>>>>
>>>> Yes, the PGs are perfectly distributed - but I think most people
>>>> would like to have a distribution by bytes and not PGs.
>>>>
>>>> Is this possible? I mean, in the code there is already a dict for pgs,
>>>> objects and bytes - but I don't know how to change the logic. Just
>>>> remove the pgs and objects from the dict?
>>>
>>> It's worth a try to remove the pgs and objects from this dict:
>>> https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L552
>>
>> Do I have to change this 3 to 1, because we then have only one item in
>> the dict? I'm not sure where the 3 comes from.
>>         pe.score /= 3 * len(roots)
>>
>
> I'm pretty sure that 3 is just for our 3 metrics. Indeed you can
> change that to 1.
>
> I'm trying this on our test cluster here too. The last few lines of
> output from `ceph balancer eval-verbose` will confirm that the score
> is based only on bytes.
>
> But I'm not sure this is going to work -- indeed the score here went
> from ~0.02 to 0.08, but do_crush_compat doesn't manage to find a
> better score.
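
The divisor question above can be sketched as follows. This is only a
minimal illustration, not the code in module.py: total_score,
score_by_root and the numbers are hypothetical, and it assumes the total
is a plain average of the per-metric scores over all roots. Dividing by
the number of metrics actually kept avoids hard-coding 3 or 1.

```python
def total_score(score_by_root, metrics):
    """Average the selected per-metric scores over all roots.

    Hypothetical helper for illustration; not the balancer's own function.
    """
    total = 0.0
    for root_scores in score_by_root.values():
        for metric in metrics:
            total += root_scores[metric]
    # Divide by (metrics kept) * (number of roots) instead of a fixed 3.
    return total / (len(metrics) * len(score_by_root))


# Made-up per-root scores: PGs perfectly balanced, bytes not.
score_by_root = {"default": {"pgs": 0.0, "objects": 0.03, "bytes": 0.09}}

all_three = total_score(score_by_root, ("pgs", "objects", "bytes"))
bytes_only = total_score(score_by_root, ("bytes",))
```

With these made-up inputs the three-metric average is about 0.04 and the
bytes-only score is 0.09, so a perfectly balanced pgs term pulls the
average down. That would be consistent with the score rising once only
bytes are counted.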

Maybe this:

https://github.com/ceph/ceph/blob/luminous/src/pybind/mgr/balancer/module.py#L682

I'm trying with that = 'bytes'
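
To see why the choice of metric matters, here is a small sketch using
the two OSDs quoted earlier in the thread (49 PGs each, 397G vs. 546G
used). It is an illustration only: deviation_from_mean is a hypothetical
helper, not something in the balancer, and it assumes both OSDs should
hold an equal share since their size and crush weight are the same.

```python
# Per-OSD figures copied from the `ceph osd df` lines quoted in this thread.
osds = {
    48: {"pgs": 49, "bytes": 397},  # used space in GB
    49: {"pgs": 49, "bytes": 546},
}


def deviation_from_mean(osds, key):
    """Return each OSD's relative deviation from the mean value of `key`."""
    mean = sum(stats[key] for stats in osds.values()) / len(osds)
    return {osd_id: (stats[key] - mean) / mean for osd_id, stats in osds.items()}


by_pgs = deviation_from_mean(osds, "pgs")      # what a PG-based view sees
by_bytes = deviation_from_mean(osds, "bytes")  # what a bytes-based view sees
```

Measured by pgs, both OSDs sit exactly on the mean, so there is nothing
left to optimize; measured by bytes, they are roughly 16% below and above
it.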

-- dan