Re: Balancer module not balancing perfectly

Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> · Tue, 30 Oct 2018 19:39:53 +0000

I had a difficult time getting debug logs from the active mgr, but I
finally got them. Apparently injecting debug_mgr at runtime has no
effect, even though the new value shows up when you query the running
config. Modifying the config file and restarting the mgr got it to log
for me.

Now that I have some debug logging, I think I may see the problem.

'ceph config-key dump'
...
    "mgr/balancer/active": "1",
    "mgr/balancer/max_misplaced": "1",
    "mgr/balancer/mode": "upmap",
    "mgr/balancer/upmap_max_deviation": "0.0001",
    "mgr/balancer/upmap_max_iterations": "1000"

Mgr log excerpt:
2018-10-30 13:25:52.523117 7f08b47ff700  4 mgr[balancer] Optimize plan upmap-balance
2018-10-30 13:25:52.523135 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/mode
2018-10-30 13:25:52.523141 7f08b47ff700 10 ceph_config_get mode found: upmap
2018-10-30 13:25:52.523144 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/max_misplaced
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get max_misplaced found: 1
2018-10-30 13:25:52.523178 7f08b47ff700  4 mgr[balancer] Mode upmap, max misplaced 1.000000
2018-10-30 13:25:52.523241 7f08b47ff700 20 mgr[balancer] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0
2018-10-30 13:25:52.523288 7f08b47ff700  4 mgr[balancer] do_upmap
2018-10-30 13:25:52.523296 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_iterations
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get upmap_max_iterations not found
2018-10-30 13:25:52.523301 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_deviation
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get upmap_max_deviation not found
2018-10-30 13:25:52.523339 7f08b47ff700  4 mgr[balancer] pools ['rbd-data']
2018-10-30 13:25:52.523350 7f08b47ff700 10 osdmap_calc_pg_upmaps osdmap 0x7f08b1884280 inc 0x7f0898bda800 max_deviation 0.01 max_iterations 10 pools 3
2018-10-30 13:25:52.579669 7f08bbffc700  4 mgr ms_dispatch active mgrdigest v1
2018-10-30 13:25:52.579671 7f08bbffc700  4 mgr ms_dispatch mgrdigest v1
2018-10-30 13:25:52.579673 7f08bbffc700 10 mgr handle_mgr_digest 1364
2018-10-30 13:25:52.579674 7f08bbffc700 10 mgr handle_mgr_digest 501
2018-10-30 13:25:52.579677 7f08bbffc700 10 mgr notify_all notify_all: notify_all mon_status
2018-10-30 13:25:52.579681 7f08bbffc700 10 mgr notify_all notify_all: notify_all health
2018-10-30 13:25:52.579683 7f08bbffc700 10 mgr notify_all notify_all: notify_all pg_summary
2018-10-30 13:25:52.579684 7f08bbffc700 10 mgr handle_mgr_digest done.
2018-10-30 13:25:52.603867 7f08b47ff700 10 osdmap_calc_pg_upmaps r = 0
2018-10-30 13:25:52.603982 7f08b47ff700  4 mgr[balancer] prepared 0/10 changes

The mgr claims that mgr/balancer/upmap_max_iterations and
mgr/balancer/upmap_max_deviation aren't found in the config even though
they have been set and appear in the config-key dump. It seems to be
picking up the other config options correctly. Am I doing something
wrong? I feel like I must have a typo or something, but I'm not seeing
it.
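
One way to cross-check this is to compare the option names the mgr logs
as "not found" against the keys actually present in the config-key
store. The following is only a minimal sketch: it assumes the log format
shown in the excerpt above, with each log entry on a single line, and
the function name missing_but_set and the sample inputs are hypothetical
helpers, not part of Ceph.

```python
import re

# Matches entries such as "... ceph_config_get upmap_max_iterations not found".
NOT_FOUND = re.compile(r"ceph_config_get\s+(\S+)\s+not found")


def missing_but_set(log_text, config_keys, prefix="mgr/balancer/"):
    """Return option names logged as 'not found' although the
    config-key store holds a value for prefix + name."""
    reported = NOT_FOUND.findall(log_text)
    return sorted({name for name in reported if prefix + name in config_keys})


# Sample inputs copied from the log excerpt and config-key dump above.
sample_log = """\
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get max_misplaced found: 1
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get upmap_max_iterations not found
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get upmap_max_deviation not found
"""
sample_keys = {
    "mgr/balancer/active": "1",
    "mgr/balancer/max_misplaced": "1",
    "mgr/balancer/mode": "upmap",
    "mgr/balancer/upmap_max_deviation": "0.0001",
    "mgr/balancer/upmap_max_iterations": "1000",
}

result = missing_but_set(sample_log, sample_keys)
print(result)
```

Any name this returns is set in the store but was not picked up by the
module, which is the mismatch described above.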

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 

If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.

On Tue, 2018-10-30 at 10:11 -0600, Steve Taylor wrote:
> I had played with those settings some already, but I just tried again
> with max_deviation set to 0.0001 and max_iterations set to 1000. Same
> result. Thanks for the suggestion though.
> 
> On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> 
> From the balancer module's code for v 12.2.7 I noticed [1] these
> lines which reference [2] these 2 config options for upmap. You might
> try using more max iterations or a smaller max deviation to see if
> you can get a better balance in your cluster. I would try to start
> with [3] these commands/values and see if it improves your balance
> and/or allows you to generate a better map.
> 
> [1] https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <
> steve.taylor@xxxxxxxxxxxxxxxx> wrote:
> 
> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are identical (4TB) and are weighted with
> default weights (3.73).
> 
> Initially, and not surprisingly, the PG distribution was all over the
> place with PG counts per OSD ranging from 40 to 83. I enabled the
> balancer module in upmap mode and let it work its magic, which reduced
> the range of the per-OSD PG counts to 56-61.
> 
> While 56-61 is obviously a whole lot better than 40-83, with upmap I
> expected the range to be 56-57. If I run 'ceph balancer optimize
> <plan>' again to attempt to create a new plan I get 'Error EALREADY:
> Unable to find further optimization,or distribution is already
> perfect.' I set the balancer's max_misplaced value to 1 in case that
> was preventing further optimization, but I still get the same error.
> 
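The 56-57 expectation follows from simple arithmetic: each PG in a k=8,
m=2 pool places 10 shards, so 2048 PGs put 20480 shards on the pool's
360 equally weighted OSDs. A minimal sketch of that calculation (the
function name ideal_pg_range is only illustrative, and it assumes equal
OSD weights as described above):

```python
def ideal_pg_range(pg_count, k, m, osd_count):
    """Return (low, high) per-OSD PG counts for a perfectly even spread
    across equally weighted OSDs."""
    shards = pg_count * (k + m)  # every EC PG maps to k + m OSDs
    low, remainder = divmod(shards, osd_count)
    high = low + 1 if remainder else low
    return low, high


low, high = ideal_pg_range(2048, 8, 2, 360)
print(low, high)  # 56 57
```
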
> I'm sure I'm missing some config option or something that will allow it
> to do better, but thus far I haven't been able to find anything in the
> docs, mailing list archives, or balancer source code that helps. Any
> ideas?
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com