ceph-mgr balancer getting started

Hi ceph-users,

I am trying to figure out how to get the ceph balancer to do its magic, as I currently have a pretty unbalanced distribution across OSDs, both SSD and HDD.

The cluster is 12.2.4 on Ubuntu 16.04.
All OSDs have been migrated to BlueStore.

Specifically, the HDDs are the main driver for wanting to run the balancer, as I have a near-full HDD.

ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
 4   hdd 7.28450  1.00000 7459G 4543G  2916G 60.91 0.91 126
21   hdd 7.28450  1.00000 7459G 4626G  2833G 62.02 0.92 130
 0   hdd 7.28450  1.00000 7459G 4869G  2589G 65.28 0.97 133
 5   hdd 7.28450  1.00000 7459G 4866G  2592G 65.24 0.97 136
14   hdd 7.28450  1.00000 7459G 4829G  2629G 64.75 0.96 138
 8   hdd 7.28450  1.00000 7459G 4829G  2629G 64.75 0.96 139
 7   hdd 7.28450  1.00000 7459G 4959G  2499G 66.49 0.99 141
23   hdd 7.28450  1.00000 7459G 5159G  2299G 69.17 1.03 142
 2   hdd 7.28450  1.00000 7459G 5042G  2416G 67.60 1.01 144
 1   hdd 7.28450  1.00000 7459G 5292G  2167G 70.95 1.06 145
10   hdd 7.28450  1.00000 7459G 5441G  2018G 72.94 1.09 146
19   hdd 7.28450  1.00000 7459G 5125G  2333G 68.72 1.02 146
 9   hdd 7.28450  1.00000 7459G 5123G  2335G 68.69 1.02 146
18   hdd 7.28450  1.00000 7459G 5187G  2271G 69.54 1.04 149
22   hdd 7.28450  1.00000 7459G 5369G  2089G 71.98 1.07 150
12   hdd 7.28450  1.00000 7459G 5375G  2083G 72.07 1.07 152
17   hdd 7.28450  1.00000 7459G 5498G  1961G 73.71 1.10 152
11   hdd 7.28450  1.00000 7459G 5621G  1838G 75.36 1.12 154
15   hdd 7.28450  1.00000 7459G 5576G  1882G 74.76 1.11 154
20   hdd 7.28450  1.00000 7459G 5797G  1661G 77.72 1.16 158
 6   hdd 7.28450  1.00000 7459G 5951G  1508G 79.78 1.19 164
 3   hdd 7.28450  1.00000 7459G 5960G  1499G 79.90 1.19 166
16   hdd 7.28450  1.00000 7459G 6161G  1297G 82.60 1.23 169
13   hdd 7.28450  1.00000 7459G 6678G   780G 89.54 1.33 184

I sorted this on PGS, and you can see that PG count tracks actual disk usage fairly closely; since the balancer attempts to distribute PGs more evenly, that should give me a more even distribution of usage.
Hopefully that passes the sanity check.
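
(For reference, I produced that sorted view with something along these lines; the exact grep/sort invocation is just what I happened to use, so treat it as approximate:)

$ ceph osd df | grep hdd | sort -nk10    # sort numerically on the PGS column (field 10)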

ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL  %USE  VAR  PGS
49   ssd 1.76109  1.00000 1803G  882G   920G 48.96 0.73 205
72   ssd 1.76109  1.00000 1803G  926G   876G 51.38 0.77 217
30   ssd 1.76109  1.00000 1803G  950G   852G 52.73 0.79 222
48   ssd 1.76109  1.00000 1803G  961G   842G 53.29 0.79 225
54   ssd 1.76109  1.00000 1803G  980G   823G 54.36 0.81 230
63   ssd 1.76109  1.00000 1803G  985G   818G 54.62 0.81 230
35   ssd 1.76109  1.00000 1803G  997G   806G 55.30 0.82 233
45   ssd 1.76109  1.00000 1803G 1002G   801G 55.58 0.83 234
67   ssd 1.76109  1.00000 1803G 1004G   799G 55.69 0.83 234
42   ssd 1.76109  1.00000 1803G 1006G   796G 55.84 0.83 235
52   ssd 1.76109  1.00000 1803G 1009G   793G 56.00 0.83 238
61   ssd 1.76109  1.00000 1803G 1014G   789G 56.24 0.84 238
68   ssd 1.76109  1.00000 1803G 1021G   782G 56.62 0.84 238
32   ssd 1.76109  1.00000 1803G 1021G   781G 56.67 0.84 240
65   ssd 1.76109  1.00000 1803G 1024G   778G 56.83 0.85 240
26   ssd 1.76109  1.00000 1803G 1022G   780G 56.72 0.84 241
59   ssd 1.76109  1.00000 1803G 1031G   771G 57.20 0.85 241
47   ssd 1.76109  1.00000 1803G 1035G   767G 57.42 0.86 242
37   ssd 1.76109  1.00000 1803G 1036G   767G 57.46 0.86 243
28   ssd 1.76109  1.00000 1803G 1043G   760G 57.85 0.86 245
40   ssd 1.76109  1.00000 1803G 1047G   755G 58.10 0.87 245
41   ssd 1.76109  1.00000 1803G 1046G   756G 58.06 0.86 245
62   ssd 1.76109  1.00000 1803G 1050G   752G 58.25 0.87 245
39   ssd 1.76109  1.00000 1803G 1051G   751G 58.30 0.87 246
56   ssd 1.76109  1.00000 1803G 1050G   752G 58.27 0.87 246
70   ssd 1.76109  1.00000 1803G 1041G   761G 57.75 0.86 246
73   ssd 1.76109  1.00000 1803G 1057G   746G 58.63 0.87 247
44   ssd 1.76109  1.00000 1803G 1056G   746G 58.58 0.87 248
38   ssd 1.76109  1.00000 1803G 1059G   743G 58.75 0.87 249
51   ssd 1.76109  1.00000 1803G 1063G   739G 58.99 0.88 249
33   ssd 1.76109  1.00000 1803G 1067G   736G 59.18 0.88 250
36   ssd 1.76109  1.00000 1803G 1071G   731G 59.41 0.88 251
55   ssd 1.76109  1.00000 1803G 1066G   737G 59.11 0.88 251
27   ssd 1.76109  1.00000 1803G 1078G   724G 59.81 0.89 252
31   ssd 1.76109  1.00000 1803G 1079G   724G 59.84 0.89 252
69   ssd 1.76109  1.00000 1803G 1075G   727G 59.63 0.89 252
46   ssd 1.76109  1.00000 1803G 1082G   721G 60.00 0.89 253
58   ssd 1.76109  1.00000 1803G 1081G   721G 59.98 0.89 253
66   ssd 1.76109  1.00000 1803G 1081G   722G 59.96 0.89 253
34   ssd 1.76109  1.00000 1803G 1091G   712G 60.52 0.90 255
43   ssd 1.76109  1.00000 1803G 1089G   713G 60.42 0.90 256
64   ssd 1.76109  1.00000 1803G 1097G   705G 60.87 0.91 257
24   ssd 1.76109  1.00000 1803G 1113G   690G 61.72 0.92 260
25   ssd 1.76109  1.00000 1803G 1146G   656G 63.58 0.95 269
29   ssd 1.76109  1.00000 1803G 1146G   656G 63.59 0.95 269
71   ssd 1.76109  1.00000 1803G 1151G   651G 63.88 0.95 269
57   ssd 1.76109  1.00000 1803G 1183G   619G 65.63 0.98 278
60   ssd 1.76109  1.00000 1803G 1183G   620G 65.60 0.98 278
53   ssd 1.76109  1.00000 1803G 1220G   583G 67.67 1.01 286
50   ssd 1.76109  1.00000 1803G 1283G   519G 71.19 1.06 303

The SSDs are roughly the same in that PG distribution matches usage, so I don’t expect a bunch of empty PGs or anything like that.

So, looking at the balancer, I tried to create a plan and execute it, but nothing appears to be happening.
I’m assuming I should see backfills take place once it starts re-balancing the PGs (and thus data).
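
(To confirm nothing is actually moving, I’ve been watching the cluster status for backfilling/remapped PGs with something like the following; the grep pattern is just my own habit, and nothing shows up:)

$ ceph -s | grep -E 'backfill|remapped'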

$ ceph balancer eval
current cluster score 0.024025 (lower is better)

$ ceph balancer optimize 180412.plan1

$ ceph balancer status
{
    "active": true,
    "plans": [
        "180412.plan1"
    ],
    "mode": "crush-compat"
}

$ ceph balancer eval 180412.plan1
plan 180412.plan1 final score 0.024025 (lower is better)

$ ceph balancer show 180412.plan1
# starting osdmap epoch 89751
# starting crush version 250
# mode crush-compat
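
(For completeness, the execute step I ran was, as best I can reconstruct from my shell history, something like this, and I don’t recall it returning any error:)

$ ceph balancer execute 180412.plan1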

So maybe I’m not giving it specific parameters?

Here is a pastebin dump of ceph balancer dump $plan: https://pastebin.com/S6JwtY5Q

In another ML thread I found someone whose balancer config showed more keys than what I have here:
$ ceph config-key dump
{
    "mgr/balancer/active": "1",
    "mgr/balancer/mode": "crush-compat",
    "mgr/influx/hostname": "",
    "mgr/influx/password": "",
    "mgr/influx/username": ""
}

And this is what they had posted in the other thread:
$ ceph config-key dump
{
    "mgr/balancer/active": "1",
    "mgr/balancer/begin_time": "0830",
    "mgr/balancer/end_time": "1600",
    "mgr/balancer/max_misplaced": "0.01",
    "mgr/balancer/mode": "crush-compat"
}
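
(If those extra keys are the missing piece, I assume they get set with plain config-key commands along these lines; the values here are just the other poster’s, not anything I have tried yet:)

$ ceph config-key set mgr/balancer/max_misplaced 0.01
$ ceph config-key set mgr/balancer/begin_time 0830
$ ceph config-key set mgr/balancer/end_time 1600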

So I figure there is some less-than-perfectly-documented step that I am missing, and that it is not in fact “turn it on and forget it” as Sage mentioned in his presentation, at least in its current form.
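
For what it’s worth, my understanding of the intended “fire and forget” sequence, pieced together from the docs and that talk (so please correct me if I have this wrong), is roughly:

$ ceph mgr module enable balancer    # already enabled in my case, per balancer status
$ ceph balancer mode crush-compat
$ ceph balancer on

...after which the module is supposed to start moving PGs on its own, within the max_misplaced limit.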

Appreciate the help,

Reed

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
