Hi Tarek,

that's good news, glad my hunch was correct :-).

On 29.05.19 at 19:31, Tarek Zegar wrote:
> Hi Oliver
>
> Here is the output of the active mgr log after I toggled the balancer off / on; I grep'd out only "balancer" as it was far too verbose (see below). When I look at ceph osd df, I see it optimized :)
> I would like to understand two things, however: why is "prepared 0/10 changes" zero if it actually did something, and what could I have looked for in the log, before I toggled it, that said basically "hey, the balancer isn't going to work because I still think min-client-compat-level < luminous"?

Sadly, I cannot answer the first question - maybe somebody else on the list can - but I can at least answer the second one.
Since I did not remember the exact wording of the message we saw in October last year, I checked the sources:
https://github.com/ceph/ceph/blob/5111f6df16b106e4e7105e88b88c6eeceb770c4f/src/pybind/mgr/balancer/module.py#L420

So you should find something like the following in the mgr log:

  min_compat_client "%s" < "luminous", which is required for pg-upmap. Try "ceph osd set-require-min-compat-client luminous" before enabling this mode

So the message by itself is very helpful, it's just very well hidden in the mgr logs ;-).
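To make the behaviour clearer, here is my rough paraphrase of the check behind that link - from memory, not a verbatim copy of the module, and the function name is mine:

# Sketch of the guard in upmap mode (paraphrased, names approximate):
# it bails out early, with nothing but a log message, if the osdmap
# does not yet require luminous-or-newer clients.
def upmap_allowed(osdmap_dump, log):
    min_compat_client = osdmap_dump.get('require_min_compat_client', '')
    if min_compat_client < 'luminous':  # plain string comparison
        log.warning('min_compat_client "%s" < "luminous", which is required '
                    'for pg-upmap. Try "ceph osd set-require-min-compat-client '
                    'luminous" before enabling this mode' % min_compat_client)
        return False  # only this log line - no error reaches the CLI
    return True

The crucial point is the early return: the module just logs and gives up, so nothing ever shows up on the terminal or in the health status.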
The "prepared x/y changes" message is also generated here:
https://github.com/ceph/ceph/blob/5111f6df16b106e4e7105e88b88c6eeceb770c4f/src/pybind/mgr/balancer/module.py#L940
but I do not understand why it shows 0 in your case. Maybe somebody else on this list can explain ;-).
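In case it helps somebody to dig further, my rough reading of the code around that line is the following (again paraphrased from memory, names approximate, so take it with a grain of salt):

# "x" is the number of upmap entries calc_pg_upmaps() added to the
# incremental osdmap across all pools, "y" is the upmap_max_iterations
# budget (default 10).
def do_upmap_sketch(osdmap, inc, pools, max_iterations, max_deviation, log):
    total_did = 0
    left = max_iterations
    for pool in pools:
        # calc_pg_upmaps() returns how many upmap entries it added to
        # the incremental osdmap "inc" for this pool
        did = osdmap.calc_pg_upmaps(inc, max_deviation, left, [pool])
        total_did += did
        left -= did
        if left <= 0:
            break
    log.info('prepared %d/%d changes' % (total_did, max_iterations))

If that reading is right, "prepared 0/10 changes" would mean that calc_pg_upmaps() found nothing left to improve for the "rbd" pool at that moment. One guess would be that the actual optimization already happened in an earlier pass, and the run you caught in the log merely confirmed the distribution was fine - but that is speculation on my part.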
Cheers,
	Oliver

> Thanks for helping me in getting this working!
>
>
> root@hostmonitor1:/var/log/ceph# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL   %USE  VAR  PGS
>  1   hdd 0.00980        0    0 B     0 B     0 B      0    0   0
>  3   hdd 0.00980  1.00000 10 GiB 5.3 GiB 4.7 GiB  53.25 0.97 150
>  6   hdd 0.00980  1.00000 10 GiB 5.6 GiB 4.4 GiB  56.07 1.03 150
>  0   hdd 0.00980        0    0 B     0 B     0 B      0    0   0
>  5   hdd 0.00980  1.00000 10 GiB 5.7 GiB 4.3 GiB  56.97 1.04 151
>  7   hdd 0.00980  1.00000 10 GiB 5.2 GiB 4.8 GiB  52.35 0.96 149
>  2   hdd 0.00980        0    0 B     0 B     0 B      0    0   0
>  4   hdd 0.00980  1.00000 10 GiB 5.5 GiB 4.5 GiB  55.25 1.01 150
>  8   hdd 0.00980  1.00000 10 GiB 5.4 GiB 4.6 GiB  54.07 0.99 150
>                    TOTAL 70 GiB  34 GiB  36 GiB  54.66
> MIN/MAX VAR: 0.96/1.04  STDDEV: 1.60
>
>
> 2019-05-29 17:06:49.324 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11262 192.168.0.12:0/4104979884' entity='client.admin' cmd=[{"prefix": "balancer off", "target": ["mgr", ""]}]: dispatch
> *2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'*
> *2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'*
> *2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'*
> *2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer off'*
> 2019-05-29 17:06:49.324 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer off', 'target': ['mgr', '']}'
> 2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
> 2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
> 2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
> 2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
> 2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
> 2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
> 2019-05-29 17:06:54.279 7f40ce42a700 4 mgr.server handle_command prefix=balancer on
> 2019-05-29 17:06:54.279 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11268 192.168.0.12:0/1339099349' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mgr", ""]}]: dispatch
> *2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'*
> *2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'*
> *2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'*
> 2019-05-29 17:06:54.279 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer on', 'target': ['mgr', '']}'
> 2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/active:1
> 2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
> 2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
> 2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
> 2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
> 2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
> 2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
> *2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Optimize plan auto_2019-05-29_17:06:54*
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/mode
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/max_misplaced
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Mode upmap, max misplaced 0.500000
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] do_upmap
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation
> 2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] pools ['rbd']
> *2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] prepared 0/10 changes*
> From: Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
> To: Tarek Zegar <tzegar@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Date: 05/29/2019 11:59 AM
> Subject: [EXTERNAL] Re: Balancer: uneven OSDs
>
> Hi Tarek,
>
> On 29.05.19 at 18:49, Tarek Zegar wrote:
>> Hi Oliver,
>>
>> Thank you for the response. I did ensure that the min-client-compat-level is indeed Luminous (see below), I have no kernel-mapped rbd clients, and "ceph versions" reports Mimic. Also below is the output of ceph balancer status. One thing to note: I enabled the balancer only after I had already filled the cluster, not from the onset. I had hoped that it wouldn't matter, though your comment "if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles" leads me to believe that it will *not* work this way. Please confirm, and let me know what message to look for in /var/log/ceph.
>
> it should also work well on existing clusters - we have also used it on a Luminous cluster after it was already half-filled, and it worked well - that's what it was made for ;-).
> The only issue we encountered was that the client-compat-level needed to be set to Luminous before enabling the balancer plugin, but since you can always disable and re-enable a plugin,
> this is not a "blocker".
>
> Do you see anything in the logs of the active mgr when disabling and re-enabling the balancer plugin?
> That's how we initially found the message telling us that we needed to raise the client-compat-level.
>
> Cheers,
> Oliver
>
>> Thank you!
>>
>> root@hostadmin:~# ceph balancer status
>> {
>>     "active": true,
>>     "plans": [],
>>     "mode": "upmap"
>> }
>>
>> root@hostadmin:~# ceph features
>> {
>>     "mon": [
>>         {
>>             "features": "0x3ffddff8ffacfffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     ],
>>     "osd": [
>>         {
>>             "features": "0x3ffddff8ffacfffb",
>>             "release": "luminous",
>>             "num": 7
>>         }
>>     ],
>>     "client": [
>>         {
>>             "features": "0x3ffddff8ffacfffb",
>>             "release": "luminous",
>>             "num": 1
>>         }
>>     ],
>>     "mgr": [
>>         {
>>             "features": "0x3ffddff8ffacfffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     ]
>> }
>>
>> From: Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
>> To: ceph-users@xxxxxxxxxxxxxx
>> Date: 05/29/2019 11:13 AM
>> Subject: [EXTERNAL] Re: Balancer: uneven OSDs
>> Sent by: "ceph-users" <ceph-users-bounces@xxxxxxxxxxxxxx>
>>
>> Hi Tarek,
>>
>> what's the output of "ceph balancer status"?
>> In case you are using "upmap" mode, you must make sure to have a min-client-compat-level of at least Luminous:
>> http://docs.ceph.com/docs/mimic/rados/operations/upmap/
>> Of course, please be aware that your clients must be recent enough (this especially concerns kernel clients).
>>
>> Sadly, if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles,
>> but no error on the terminal when activating the balancer, nor any other kind of error or health condition.
>>
>> Cheers,
>> Oliver
>>
>> On 29.05.19 at 17:52, Tarek Zegar wrote:
>>> Can anyone help with this? Why can't I optimize this cluster? The PG counts and data distribution are way off.
>>> __________________
>>>
>>> I enabled the balancer plugin and even tried to invoke it manually, but it won't make any changes. Looking at ceph osd df, it's not even at all. Thoughts?
>>>
>>> root@hostadmin:~# ceph osd df
>>> ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL    %USE  VAR  PGS
>>>  1   hdd 0.00980        0    0 B     0 B      0 B      0    0   0
>>>  3   hdd 0.00980  1.00000 10 GiB 8.3 GiB  1.7 GiB  82.83 1.14 156
>>>  6   hdd 0.00980  1.00000 10 GiB 8.4 GiB  1.6 GiB  83.77 1.15 144
>>>  0   hdd 0.00980        0    0 B     0 B      0 B      0    0   0
>>>  5   hdd 0.00980  1.00000 10 GiB 9.0 GiB 1021 MiB  90.03 1.23 159
>>>  7   hdd 0.00980  1.00000 10 GiB 7.7 GiB  2.3 GiB  76.57 1.05 141
>>>  2   hdd 0.00980  1.00000 10 GiB 5.5 GiB  4.5 GiB  55.42 0.76  90
>>>  4   hdd 0.00980  1.00000 10 GiB 5.9 GiB  4.1 GiB  58.78 0.81  99
>>>  8   hdd 0.00980  1.00000 10 GiB 6.3 GiB  3.7 GiB  63.12 0.87 111
>>>                     TOTAL 90 GiB  53 GiB   37 GiB  72.93
>>> MIN/MAX VAR: 0.76/1.23  STDDEV: 12.67
>>>
>>> root@hostadmin:~# osdmaptool om --upmap out.txt --upmap-pool rbd
>>> osdmaptool: osdmap file 'om'
>>> writing upmap command output to: out.txt
>>> checking for upmap cleanups
>>> upmap, max-count 100, max deviation 0.01 *<--- really? It's not even close to 1% across the drives*
>>> limiting to pools rbd (1)
>>> *no upmaps proposed*
>>>
>>> ceph balancer optimize myplan
>>> Error EALREADY: Unable to find further optimization, or distribution is already perfect
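P.S., mostly for the archives: if somebody wants to verify that the balancer actually injected upmap entries (and what compat level the upmap check sees), the osdmap can be inspected directly. A quick sketch - it assumes the "ceph" CLI is in the PATH with admin credentials, and the field names are from memory:

#!/usr/bin/env python
# List the pg-upmap exceptions currently stored in the osdmap, plus the
# compat level that the balancer's upmap mode checks for.
import json
import subprocess

dump = json.loads(subprocess.check_output(
    ['ceph', 'osd', 'dump', '--format', 'json']))

print('require_min_compat_client: %s'
      % dump.get('require_min_compat_client'))
for item in dump.get('pg_upmap_items', []):
    print('pg %s remapped: %s' % (item['pgid'], item['mappings']))

An empty list after a balancer run in upmap mode would indicate that no upmaps were ever injected.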