Re: Balancer module not balancing perfectly

I think I pretty well have things figured out at this point, but I'm not sure how to proceed.

The config-key settings were not effective because I had not restarted the active mgr after setting them. Once I restarted the mgr the settings became effective.
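(For reference, by "restart" I just mean the standard systemd restart of the ceph-mgr unit on the host running the active mgr; the unit name below is the usual one, but adjust the mgr ID for your deployment.)

    ceph mgr dump | grep active_name
    systemctl restart ceph-mgr@<mgr-id>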

Once I had the config-key settings working I quickly discovered that they didn't make any difference, so I downloaded an osdmap and started trying to use osdmaptool offline to see if it would behave differently. It didn't, but when I specified '--debug-osd 20' on the osdmaptool command line things got interesting.
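For anyone who wants to reproduce the offline test, the commands were roughly the following. The pool name and option values are from my setup, and the upmap flags are the ones osdmaptool advertises in Luminous, so double-check them against your version.

    ceph osd getmap -o ./osdmap
    osdmaptool ./osdmap --upmap out.txt --upmap-pool rbd-data \
        --upmap-max 1000 --upmap-deviation 0.0001 --debug-osd 20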

It looks like osdmaptool generates lists of overfull and underfull OSDs and then uses those lists to move PGs in order to achieve a perfect balance. In my case the expected PG count range per OSD is 56-57, but the actual range is 56-61. The problem seems to lie in the fact that all of my OSDs have at least 56 PGs and are therefore not considered underfull. The debug output from osdmaptool shows a decent list of overfull OSDs and an empty list of underfull OSDs, then says there is nothing to be done.
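For context, the 56-57 expectation is just arithmetic: 2048 PGs x 10 shards each (k=8, m=2) is 20480 PG shards spread across the pool's 360 OSDs, or about 56.9 per OSD, so a perfect distribution would put 56 or 57 PGs on every OSD.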

Perhaps the next step is to modify osdmaptool so that an OSD that isn't underfull can still receive a PG, as long as the move doesn't make it overfull? That seems like it should be the expected behavior in this scenario.


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.

 

-----Original Message-----
From: Steve Taylor 
Sent: Tuesday, October 30, 2018 1:40 PM
To: drakonstein@xxxxxxxxx
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Balancer module not balancing perfectly

I was having a difficult time getting debug logs from the active mgr, but I finally got it. Apparently injecting debug_mgr doesn't take effect, even though the change is reflected when you query the running config.
Modifying the config file and restarting the mgr got it logging for me.
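For anyone else fighting the same thing, the change was along these lines in ceph.conf on the active mgr host (I'm assuming the [mgr] section is the right place; it worked for me, and 20/20 is what produced the excerpt below), followed by a restart of the mgr:

    [mgr]
        debug mgr = 20/20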

Now that I have some debug logging, I think I may see the problem.

'ceph config-key dump'
...
    "mgr/balancer/active": "1",
    "mgr/balancer/max_misplaced": "1",
    "mgr/balancer/mode": "upmap",
    "mgr/balancer/upmap_max_deviation": "0.0001",
    "mgr/balancer/upmap_max_iterations": "1000"

Mgr log excerpt:
2018-10-30 13:25:52.523117 7f08b47ff700  4 mgr[balancer] Optimize plan upmap-balance
2018-10-30 13:25:52.523135 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/mode
2018-10-30 13:25:52.523141 7f08b47ff700 10 ceph_config_get mode found: upmap
2018-10-30 13:25:52.523144 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/max_misplaced
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get max_misplaced found: 1
2018-10-30 13:25:52.523178 7f08b47ff700  4 mgr[balancer] Mode upmap, max misplaced 1.000000
2018-10-30 13:25:52.523241 7f08b47ff700 20 mgr[balancer] unknown 0.000000 degraded 0.000000 inactive 0.000000 misplaced 0
2018-10-30 13:25:52.523288 7f08b47ff700  4 mgr[balancer] do_upmap
2018-10-30 13:25:52.523296 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_iterations
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get upmap_max_iterations not found
2018-10-30 13:25:52.523301 7f08b47ff700  4 mgr get_config get_configkey: mgr/balancer/upmap_max_deviation
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get upmap_max_deviation not found
2018-10-30 13:25:52.523339 7f08b47ff700  4 mgr[balancer] pools ['rbd-data']
2018-10-30 13:25:52.523350 7f08b47ff700 10 osdmap_calc_pg_upmaps osdmap 0x7f08b1884280 inc 0x7f0898bda800 max_deviation 0.01 max_iterations 10 pools 3
2018-10-30 13:25:52.579669 7f08bbffc700  4 mgr ms_dispatch active mgrdigest v1
2018-10-30 13:25:52.579671 7f08bbffc700  4 mgr ms_dispatch mgrdigest v1
2018-10-30 13:25:52.579673 7f08bbffc700 10 mgr handle_mgr_digest 1364
2018-10-30 13:25:52.579674 7f08bbffc700 10 mgr handle_mgr_digest 501
2018-10-30 13:25:52.579677 7f08bbffc700 10 mgr notify_all notify_all: notify_all mon_status
2018-10-30 13:25:52.579681 7f08bbffc700 10 mgr notify_all notify_all: notify_all health
2018-10-30 13:25:52.579683 7f08bbffc700 10 mgr notify_all notify_all: notify_all pg_summary
2018-10-30 13:25:52.579684 7f08bbffc700 10 mgr handle_mgr_digest done.
2018-10-30 13:25:52.603867 7f08b47ff700 10 osdmap_calc_pg_upmaps r = 0
2018-10-30 13:25:52.603982 7f08b47ff700  4 mgr[balancer] prepared 0/10 changes

The mgr claims that mgr/balancer/upmap_max_iterations and mgr/balancer/upmap_max_deviation aren't found in the config even though they have been set and appear in the config-key dump. It seems to be picking up the other config options correctly. Am I doing something wrong? I feel like I must have a typo or something, but I'm not seeing it.
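For what it's worth, the keys can also be read back individually to rule out a typo in the names. This is just the standard config-key get; the dump above already shows them, but it's an easy double check:

    ceph config-key get mgr/balancer/upmap_max_deviation
    ceph config-key get mgr/balancer/upmap_max_iterations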

On Tue, 2018-10-30 at 10:11 -0600, Steve Taylor wrote:
> I had played with those settings some already, but I just tried again 
> with max_deviation set to 0.0001 and max_iterations set to 1000. Same 
> result. Thanks for the suggestion though.
> 
> On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> 
> From the balancer module's code for v 12.2.7 I noticed [1] these lines 
> which reference [2] these 2 config options for upmap. You might try 
> using more max iterations or a smaller max deviation to see if you can 
> get a better balance in your cluster. I would try to start with [3] 
> these commands/values and see if it improves your balance and/or 
> allows you to generate a better map.
> 
> [1] https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <steve.taylor@xxxxxxxxxxxxxxxx> wrote:
> 
> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are identical (4TB) and are weighted with
> default weights (3.73).
> 
> Initially, and not surprisingly, the PG distribution was all over the 
> place with PG counts per OSD ranging from 40 to 83. I enabled the 
> balancer module in upmap mode and let it work its magic, which reduced 
> the range of the per-OSD PG counts to 56-61.
> 
> While 56-61 is obviously a whole lot better than 40-83, with upmap I
> expected the range to be 56-57. If I run 'ceph balancer optimize
> <plan>' again to attempt to create a new plan I get 'Error EALREADY:
> Unable to find further optimization, or distribution is already
> perfect.' I set the balancer's max_misplaced value to 1 in case that
> was preventing further optimization, but I still get the same error.
> 
> I'm sure I'm missing some config option or something that will allow
> it to do better, but thus far I haven't been able to find anything in
> the docs, mailing list archives, or balancer source code that helps.
> Any ideas?
> 
> 
> Steve Taylor | Senior Software Engineer | StorageCraft Technology 
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
> 
> If you are not the intended recipient of this message or received it 
> erroneously, please notify the sender and delete it, together with any 
> attachments, and be advised that any dissemination or copying of this 
> message is prohibited.
> 
> 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



