Re: ceph balancer: further optimizations?

On 20.08.2018 at 22:38, Dan van der Ster wrote:
> On Mon, Aug 20, 2018 at 10:19 PM Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>>
>>
>> On 20.08.2018 at 21:52, Sage Weil wrote:
>>> On Mon, 20 Aug 2018, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> since Loic seems to have left Ceph development and his wonderful crush
>>>> optimization tool isn't working anymore, I'm trying to get a good
>>>> distribution with the ceph balancer.
>>>>
>>>> Sadly, it does not work as well as I want.
>>>>
>>>> # ceph osd df | sort -k8
>>>>
>>>> shows 75% to 83% usage, a spread of 8 percentage points, which is
>>>> too much for me. I'm optimizing by bytes.
>>>>
>>>> # ceph balancer eval
>>>> current cluster score 0.005420 (lower is better)
>>>>
>>>> # ceph balancer eval $OPT_NAME
>>>> plan spriebe_2018-08-20_19:36 final score 0.005456 (lower is better)
>>>>
>>>> I'm unable to optimize further ;-( Is there any chance to optimize
>>>> further, even at the cost of more rebalancing?
>>>
>>> The scoring that the balancer module is doing is currently a hybrid of pg
>>> count, bytes, and object count.  Picking a single metric might help a bit
>>> (as those 3 things are not always perfectly aligned).
>>
>> Hi,
>>
>> OK, I found a bug in the balancer code (lines 861-862 of the module)
>> which seems to be present in all releases.
>>
>>     best_ws = next_ws
>>     best_ow = next_ow
>>
>> should be:
>>
>>     best_ws = copy.deepcopy(next_ws)
>>     best_ow = copy.deepcopy(next_ow)
>>
>> Otherwise it does not use the best result but the last one.
> 
> Interesting... does that change improve things?

It fixes the following (mgr debug output):
2018-08-20 22:33:46.078525 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001180, misplacing 0.000912
2018-08-20 22:33:46.078574 7f2fbc3b6700  0 mgr[balancer] Score got
worse, taking another step
2018-08-20 22:33:46.078770 7f2fbc3b6700  0 mgr[balancer] Balancing root
default (pools ['cephstor2']) by bytes
2018-08-20 22:33:46.156326 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001180, misplacing 0.000912
2018-08-20 22:33:46.156374 7f2fbc3b6700  0 mgr[balancer] Score got
worse, taking another step
2018-08-20 22:33:46.156581 7f2fbc3b6700  0 mgr[balancer] Balancing root
default (pools ['cephstor2']) by bytes
2018-08-20 22:33:46.233818 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001180, misplacing 0.000912
2018-08-20 22:33:46.233868 7f2fbc3b6700  0 mgr[balancer] Score got
worse, taking another step
2018-08-20 22:33:46.234043 7f2fbc3b6700  0 mgr[balancer] Balancing root
default (pools ['cephstor2']) by bytes
2018-08-20 22:33:46.313212 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001180, misplacing 0.000912
2018-08-20 22:33:46.313714 7f2fbc3b6700  0 mgr[balancer] Score got
worse, trying smaller step 0.000244
2018-08-20 22:33:46.313887 7f2fbc3b6700  0 mgr[balancer] Balancing root
default (pools ['cephstor2']) by bytes
2018-08-20 22:33:46.391586 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001152, misplacing 0.001141
2018-08-20 22:33:46.393374 7f2fbc3b6700  0 mgr[balancer] Balancing root
default (pools ['cephstor2']) by bytes
2018-08-20 22:33:46.473956 7f2fbc3b6700  0 mgr[balancer] Step result
score 0.001152 -> 0.001180, misplacing 0.000912
2018-08-20 22:33:46.474001 7f2fbc3b6700  0 mgr[balancer] Score got
worse, taking another step
2018-08-20 22:33:46.474046 7f2fbc3b6700  0 mgr[balancer] Success, score
0.001155 -> 0.001152

BUT:
# ceph balancer eval myplan
plan myplan final score 0.001180 (lower is better)

So the final plan does NOT contain the expected optimization. The
deepcopy fixes it.

After:
# ceph balancer eval myplan
plan myplan final score 0.001152 (lower is better)
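
To illustrate why the plain assignment can keep the last result instead
of the best one, here is a minimal Python sketch. It is not the balancer
code: the search() helper, the OSD names, and the weights are made up,
and it assumes that next_ws is a single dict that gets modified in place
on every step. Only the scores are taken from the log above.

```python
import copy

def search(candidates, use_deepcopy):
    """Toy search loop; next_ws is ONE dict, mutated in place each step
    (an assumption about how the aliasing shows up, not the real code)."""
    next_ws = {}
    best_ws, best_score = None, float("inf")
    for score, weights in candidates:
        # Overwrite the working weight set in place for this step.
        next_ws.clear()
        next_ws.update(weights)
        if score < best_score:
            best_score = score
            # Plain assignment only binds another name to the same dict;
            # deepcopy takes a snapshot that later steps cannot change.
            best_ws = copy.deepcopy(next_ws) if use_deepcopy else next_ws
    return best_score, best_ws

# Hypothetical steps: the best score is found second, a worse one is tried last.
candidates = [
    (0.001155, {"osd.0": 1.00, "osd.1": 1.00}),
    (0.001152, {"osd.0": 0.98, "osd.1": 1.02}),  # best step
    (0.001180, {"osd.0": 0.95, "osd.1": 1.05}),  # worse step, tried last
]

score_alias, ws_alias = search(candidates, use_deepcopy=False)
score_copy, ws_copy = search(candidates, use_deepcopy=True)

print("alias:   ", score_alias, ws_alias)  # best score, but last weights
print("deepcopy:", score_copy, ws_copy)    # best score with matching weights
```

In both runs the loop reports the best score, but without the deepcopy
the returned weights belong to the last step tried. That matches the
symptom above: the log says "Success" with 0.001152 while the plan
evaluates to 0.001180.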

> 
> Also, if most of your data is in one pool you can try ceph balancer
> eval <pool-name>

Already tried this; it doesn't help much.

Greets,
Stefan


> -- dan
> 
>>
>> I'm also using this one:
>> https://github.com/ceph/ceph/pull/20665/commits/c161a74ad6cf006cd9b33b40fd7705b67c170615
>>
>> to optimize by bytes only.
>>
>> Greets,
>> Stefan


