Re: Balancer: uneven OSDs

Hi Oliver,

Here is the output of the active mgr log after I toggled the balancer off / on; I grepped for "balancer" only, as the full log was far too verbose (see below). When I look at ceph osd df, I see it optimized :)
I would like to understand two things, however: why does the log report "prepared 0/10 changes" if the balancer actually did something, and what earlier log message should I have looked for that said, in effect, "the balancer isn't going to work because min-compat-client is still below luminous"?
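For reference, here is what I am checking now (a sketch; the mgr log path and the exact log wording are assumptions that may vary by deployment and release):

# What minimum client level does the cluster currently require?
ceph osd dump | grep min_compat_client

# Raise it if needed (the command refuses while pre-luminous clients are connected)
ceph osd set-require-min-compat-client luminous

# Grep the active mgr's log for the balancer's complaint about the compat level
grep -i min_compat_client /var/log/ceph/ceph-mgr.*.log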

Thanks for helping me in getting this working!



root@hostmonitor1:/var/log/ceph# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE     AVAIL   %USE  VAR  PGS
 1   hdd 0.00980        0    0 B     0 B     0 B     0    0    0
 3   hdd 0.00980  1.00000 10 GiB 5.3 GiB 4.7 GiB 53.25 0.97  150
 6   hdd 0.00980  1.00000 10 GiB 5.6 GiB 4.4 GiB 56.07 1.03  150
 0   hdd 0.00980        0    0 B     0 B     0 B     0    0    0
 5   hdd 0.00980  1.00000 10 GiB 5.7 GiB 4.3 GiB 56.97 1.04  151
 7   hdd 0.00980  1.00000 10 GiB 5.2 GiB 4.8 GiB 52.35 0.96  149
 2   hdd 0.00980        0    0 B     0 B     0 B     0    0    0
 4   hdd 0.00980  1.00000 10 GiB 5.5 GiB 4.5 GiB 55.25 1.01  150
 8   hdd 0.00980  1.00000 10 GiB 5.4 GiB 4.6 GiB 54.07 0.99  150
              TOTAL    70 GiB  34 GiB  36 GiB 54.66
MIN/MAX VAR: 0.96/1.04  STDDEV: 1.60


2019-05-29 17:06:49.324 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11262 192.168.0.12:0/4104979884' entity='client.admin' cmd=[{"prefix": "balancer off", "target": ["mgr", ""]}]: dispatch
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'
2019-05-29 17:06:49.324 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer off'
2019-05-29 17:06:49.324 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer off', 'target': ['mgr', '']}'
2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
2019-05-29 17:06:49.388 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
2019-05-29 17:06:49.539 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
2019-05-29 17:06:54.279 7f40ce42a700 4 mgr.server handle_command prefix=balancer on
2019-05-29 17:06:54.279 7f40ce42a700 0 log_channel(audit) log [DBG] : from='client.11268 192.168.0.12:0/1339099349' entity='client.admin' cmd=[{"prefix": "balancer on", "target": ["mgr", ""]}]: dispatch
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer status'
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer mode'
2019-05-29 17:06:54.279 7f40ce42a700 1 mgr.server handle_command pyc_prefix: 'balancer on'
2019-05-29 17:06:54.279 7f40cec2b700 1 mgr[balancer] Handling command: '{'prefix': 'balancer on', 'target': ['mgr', '']}'
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/active:1
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/max_misplaced:.50
2019-05-29 17:06:54.287 7f40d747a700 4 mgr[py] Loaded module_config entry mgr/balancer/mode:upmap
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/active
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/begin_time
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/end_time
2019-05-29 17:06:54.299 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/sleep_interval
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Optimize plan auto_2019-05-29_17:06:54
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/mode
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/max_misplaced
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] Mode upmap, max misplaced 0.500000
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] do_upmap
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_iterations
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr get_config get_config key: mgr/balancer/upmap_max_deviation
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] pools ['rbd']
2019-05-29 17:06:54.327 7f40cd3e8700 4 mgr[balancer] prepared 0/10 changes
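To double-check that upmaps were actually installed, I also looked at the osdmap (a sketch; pg_upmap_items entries only appear once the balancer has applied some):

# Each pg_upmap_items line is one PG remapping installed by the balancer
ceph osd dump | grep pg_upmap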



From: Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
To: Tarek Zegar <tzegar@xxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Date: 05/29/2019 11:59 AM
Subject: [EXTERNAL] Re: Balancer: uneven OSDs





Hi Tarek,

Am 29.05.19 um 18:49 schrieb Tarek Zegar:
> Hi Oliver,
>
> Thank you for the response. I did ensure that min-compat-client is indeed luminous (see below), and I have no kernel-mapped RBD clients; ceph versions reports mimic. The output of ceph balancer status is also below. One thing to note: I enabled the balancer after I had already filled the cluster, not from the onset. I had hoped that wouldn't matter, but your comment that "if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles" leads me to believe that it will *not* work when done this way. Please confirm, and let me know what message to look for in /var/log/ceph.

it should also work well on existing clusters - we used it on a Luminous cluster after it was already half-filled, and it worked well; that's what it was made for ;-).
The only issue we encountered was that min-compat-client needed to be set to luminous before enabling the balancer plugin, but since you can always disable and re-enable a plugin, this is not a "blocker".

Do you see anything in the logs of the active mgr when disabling and re-enabling the balancer plugin?
That's how we initially found the message telling us to raise min-compat-client.
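Roughly like this (a sketch; the log file name depends on your mgr's id and deployment):

ceph balancer off
ceph balancer on
# in a second terminal on the active mgr host:
tail -f /var/log/ceph/ceph-mgr.<id>.log | grep -i balancer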

Cheers,
Oliver

>
> Thank you!
>
> root@hostadmin:~# ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
>
>
>
> root@hostadmin:~# ceph features
> {
>     "mon": [
>         {
>             "features": "0x3ffddff8ffacfffb",
>             "release": "luminous",
>             "num": 3
>         }
>     ],
>     "osd": [
>         {
>             "features": "0x3ffddff8ffacfffb",
>             "release": "luminous",
>             "num": 7
>         }
>     ],
>     "client": [
>         {
>             "features": "0x3ffddff8ffacfffb",
>             "release": "luminous",
>             "num": 1
>         }
>     ],
>     "mgr": [
>         {
>             "features": "0x3ffddff8ffacfffb",
>             "release": "luminous",
>             "num": 3
>         }
>     ]
> }
>
>
>
>
>
> From: Oliver Freyermuth <freyermuth@xxxxxxxxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Date: 05/29/2019 11:13 AM
> Subject: [EXTERNAL] Re: Balancer: uneven OSDs
> Sent by: "ceph-users" <ceph-users-bounces@xxxxxxxxxxxxxx>
>
>
>
>
> Hi Tarek,
>
> what's the output of "ceph balancer status"?
> In case you are using "upmap" mode, you must make sure min-compat-client is set to at least luminous:
>
> http://docs.ceph.com/docs/mimic/rados/operations/upmap/
> Of course, please be aware that your clients must be recent enough (especially for kernel clients).
>
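> Raising it is a one-liner (a sketch; note the command refuses to proceed while pre-luminous clients are still connected):
>
> ceph osd set-require-min-compat-client luminous
> ceph osd dump | grep min_compat_client   # verify the new level
>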
> Sadly, if the compat-level is too old for upmap, you'll only find a small warning about that in the logfiles,
> but no error on the terminal when activating the balancer, nor any other kind of error or health condition.
>
> Cheers,
> Oliver
>
> Am 29.05.19 um 17:52 schrieb Tarek Zegar:
>> Can anyone help with this? Why can't I optimize this cluster? The PG counts and data distribution are way off.
>> __________________
>>
>> I enabled the balancer plugin and even tried to invoke it manually, but it won't make any changes. Looking at ceph osd df, the distribution is not even at all. Thoughts?
>>
>> root@hostadmin:~# ceph osd df
>> ID CLASS WEIGHT  REWEIGHT SIZE   USE      AVAIL    %USE  VAR  PGS
>>  1   hdd 0.00980        0    0 B      0 B      0 B      0    0    0
>>  3   hdd 0.00980  1.00000 10 GiB  8.3 GiB  1.7 GiB 82.83 1.14  156
>>  6   hdd 0.00980  1.00000 10 GiB  8.4 GiB  1.6 GiB 83.77 1.15  144
>>  0   hdd 0.00980        0    0 B      0 B      0 B      0    0    0
>>  5   hdd 0.00980  1.00000 10 GiB  9.0 GiB 1021 MiB 90.03 1.23  159
>>  7   hdd 0.00980  1.00000 10 GiB  7.7 GiB  2.3 GiB 76.57 1.05  141
>>  2   hdd 0.00980  1.00000 10 GiB  5.5 GiB  4.5 GiB 55.42 0.76   90
>>  4   hdd 0.00980  1.00000 10 GiB  5.9 GiB  4.1 GiB 58.78 0.81   99
>>  8   hdd 0.00980  1.00000 10 GiB  6.3 GiB  3.7 GiB 63.12 0.87  111
>>               TOTAL    90 GiB   53 GiB   37 GiB 72.93
>> MIN/MAX VAR: 0.76/1.23  STDDEV: 12.67
>>
>>
>> root@hostadmin:~# osdmaptool om --upmap out.txt --upmap-pool rbd
>> osdmaptool: osdmap file 'om'
>> writing upmap command output to: out.txt
>> checking for upmap cleanups
>> upmap, max-count 100, max deviation 0.01   <--- really? It's not even close to 1% across the drives
>> limiting to pools rbd (1)
>> no upmaps proposed
>>
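>> For completeness, the full sequence I ran was roughly this (a sketch; "om" is just my chosen file name for the exported osdmap):
>>
>> ceph osd getmap -o om                            # export the current osdmap
>> osdmaptool om --upmap out.txt --upmap-pool rbd   # compute upmap proposals offline
>> cat out.txt   # would hold the proposed "ceph osd pg-upmap-items" commands, but it is empty here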
>>
>> ceph balancer optimize myplan
>> Error EALREADY: Unable to find further optimization,or distribution is already perfect
>>
>>
>




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
