On 9/2/22 15:55, Oebele Drijfhout wrote:
Hello,
I'm new to Ceph and I recently inherited a 4-node cluster with 32 OSDs and
about 116TB of raw space. It shows low available space, which I'm trying to
increase by enabling the balancer and lowering the reweight of the most-used
OSDs. My questions are: is what I did correct given the current state of the
cluster, can I do more to speed up rebalancing, and will we actually make
more space available this way?
Yes. When the cluster is perfectly balanced, each OSD's utilization should
approach the cluster-wide %RAW USED.
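A rough way to watch it converge (a sketch using the standard CLI; substitute your cluster name):
ceph --cluster xxx osd df | tail -n 2
As balancing progresses, MIN/MAX VAR should move toward 1.00/1.00 and STDDEV toward 0, with each OSD's %USE approaching the ~59.7 %RAW USED.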
Some background info: Earlier this week MAX AVAIL on the cluster was 0 and
it was only then that we noticed something was wrong. We removed an unused
rbd image (about 3TB) and now we have a little under 1TB in available
space. We are adding about 75GB per day on this cluster.
[xxx@ceph02 ~]$ sudo ceph --cluster xxx df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 116 TiB 47 TiB 69 TiB 69 TiB 59.69
TOTAL 116 TiB 47 TiB 69 TiB 69 TiB 59.69
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
xxx-pool 1 1024 130 B 3 192 KiB 0 992 GiB
yyy_data 6 128 23 TiB 12.08M 69 TiB 95.98 992 GiB
yyy_metadata 7 128 5.6 GiB 2.22M 6.1 GiB 0.21 992 GiB
Cluster status:
[xxx@ceph02 ~]$ sudo ceph --cluster xxx -s
cluster:
id: 91ba1ea6-bfec-4ddb-a8b5-9faf842f22c3
health: HEALTH_WARN
1 backfillfull osd(s)
1 nearfull osd(s)
3 pool(s) backfillfull
Low space hindering backfill (add storage if this doesn't resolve itself): 6 pgs backfill_toofull
services:
mon: 5 daemons, quorum a,b,c,d,e (age 5d)
mgr: b(active, since 22h), standbys: a, c, d, e
mds: registration_docs:1 {0=b=up:active} 3 up:standby
osd: 32 osds: 32 up (since 19M), 32 in (since 3y); 36 remapped pgs
task status:
scrub status:
mds.b: idle
data:
pools: 3 pools, 1280 pgs
objects: 14.31M objects, 23 TiB
usage: 70 TiB used, 47 TiB / 116 TiB avail
pgs: 2587772/42925071 objects misplaced (6.029%)
1244 active+clean
17 active+remapped+backfilling
13 active+remapped+backfill_wait
4 active+remapped+backfill_toofull
2 active+remapped+backfill_wait+backfill_toofull
io:
client: 331 KiB/s wr, 0 op/s rd, 0 op/s wr
recovery: 141 MiB/s, 65 keys/s, 84 objects/s
Versions:
[xxx@ceph02 ~]$ rpm -qa | grep ceph
ceph-common-14.2.13-0.el7.x86_64
ceph-mds-14.2.13-0.el7.x86_64
ceph-osd-14.2.13-0.el7.x86_64
ceph-base-14.2.13-0.el7.x86_64
libcephfs2-14.2.13-0.el7.x86_64
python-ceph-argparse-14.2.13-0.el7.x86_64
ceph-selinux-14.2.13-0.el7.x86_64
ceph-mgr-14.2.13-0.el7.x86_64
ceph-14.2.13-0.el7.x86_64
python-cephfs-14.2.13-0.el7.x86_64
ceph-mon-14.2.13-0.el7.x86_64
It looks like the cluster is severely unbalanced, and I guess that's
expected because the balancer was set to "off":
[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 894 MiB 3.6 GiB 1.8 TiB 49.24 0.82 122 up
1 hdd 3.63869 1.00000 3.6 TiB 1.1 TiB 1.1 TiB 581 MiB 2.4 GiB 2.5 TiB 31.07 0.52 123 up
2 hdd 3.63869 1.00000 3.6 TiB 2.2 TiB 2.2 TiB 632 MiB 4.1 GiB 1.5 TiB 60.01 1.00 121 up
3 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 672 MiB 5.5 GiB 975 GiB 73.84 1.24 122 up
4 hdd 3.63869 0.94983 3.6 TiB 2.9 TiB 2.9 TiB 478 MiB 5.2 GiB 794 GiB 78.69 1.32 111 up
5 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 900 MiB 4.7 GiB 1.1 TiB 69.52 1.16 122 up
6 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 468 MiB 5.5 GiB 929 GiB 75.08 1.26 125 up
7 hdd 3.63869 1.00000 3.6 TiB 1.6 TiB 1.6 TiB 731 MiB 3.2 GiB 2.0 TiB 44.54 0.75 122 up
8 hdd 3.63869 1.00000 3.6 TiB 1.3 TiB 1.3 TiB 626 MiB 2.6 GiB 2.4 TiB 35.41 0.59 120 up
9 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 953 MiB 4.8 GiB 1.1 TiB 69.61 1.17 122 up
10 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 526 MiB 3.9 GiB 1.6 TiB 55.64 0.93 121 up
11 hdd 3.63869 0.94983 3.6 TiB 3.4 TiB 3.4 TiB 476 MiB 6.2 GiB 242 GiB 93.50 1.57 101 up
12 hdd 3.63869 1.00000 3.6 TiB 1.4 TiB 1.4 TiB 688 MiB 3.0 GiB 2.2 TiB 39.44 0.66 117 up
13 hdd 3.63869 1.00000 3.6 TiB 1.3 TiB 1.3 TiB 738 MiB 2.8 GiB 2.3 TiB 35.98 0.60 124 up
14 hdd 3.63869 1.00000 3.6 TiB 2.8 TiB 2.8 TiB 582 MiB 5.1 GiB 879 GiB 76.40 1.28 123 up
15 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 566 MiB 4.6 GiB 1.1 TiB 68.81 1.15 124 up
16 hdd 3.63869 1.00000 3.6 TiB 1.5 TiB 1.5 TiB 625 MiB 3.1 GiB 2.2 TiB 40.23 0.67 121 up
17 hdd 3.63869 0.94983 3.6 TiB 3.2 TiB 3.2 TiB 704 MiB 6.1 GiB 427 GiB 88.55 1.48 112 up
18 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 143 MiB 3.6 GiB 1.7 TiB 54.12 0.91 124 up
19 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 522 MiB 5.0 GiB 977 GiB 73.79 1.24 126 up
20 hdd 3.63869 1.00000 3.6 TiB 2.4 TiB 2.4 TiB 793 MiB 4.5 GiB 1.2 TiB 66.79 1.12 119 up
21 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 609 MiB 3.6 GiB 1.8 TiB 49.50 0.83 122 up
22 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 600 MiB 5.0 GiB 979 GiB 73.73 1.23 122 up
23 hdd 3.63869 1.00000 3.6 TiB 953 GiB 950 GiB 579 MiB 2.4 GiB 2.7 TiB 25.57 0.43 118 up
24 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 491 MiB 3.4 GiB 1.8 TiB 49.82 0.83 121 up
25 hdd 3.63869 1.00000 3.6 TiB 2.1 TiB 2.1 TiB 836 MiB 4.5 GiB 1.5 TiB 59.07 0.99 121 up
26 hdd 3.63869 0.94983 3.6 TiB 2.9 TiB 2.9 TiB 467 MiB 5.2 GiB 794 GiB 78.69 1.32 104 up
27 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 861 MiB 3.8 GiB 1.7 TiB 54.09 0.91 123 up
28 hdd 3.63869 1.00000 3.6 TiB 1.9 TiB 1.9 TiB 262 MiB 3.6 GiB 1.8 TiB 51.00 0.85 121 up
29 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 998 MiB 5.1 GiB 937 GiB 74.86 1.25 123 up
30 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 1.1 GiB 3.6 GiB 1.8 TiB 50.77 0.85 122 up
31 hdd 3.63869 1.00000 3.6 TiB 2.3 TiB 2.3 TiB 689 MiB 4.4 GiB 1.3 TiB 63.96 1.07 121 up
TOTAL 116 TiB 70 TiB 69 TiB 20 GiB 134 GiB 47 TiB 59.73
MIN/MAX VAR: 0.43/1.57 STDDEV: 16.78
I enabled the balancer this morning, about 4 hours ago:
[xxx@ceph02 ~]$ sudo ceph --cluster xxx balancer status
{
"last_optimize_duration": "0:00:01.119296",
"plans": [],
"mode": "crush-compat",
"active": true,
"optimize_result": "Optimization plan created successfully",
"last_optimize_started": "Fri Sep 2 14:57:03 2022"
}
...and lowered the reweight to 0.85 for the most-used OSDs (however, it looks
like it's slowly drifting back toward 1?):
There are two balancer modes: crush-compat and upmap. When you have a modern
cluster (Luminous and newer) and relatively recent clients, you can switch to
upmap; your clients should support it. Using upmap is a more efficient way to
balance the cluster. While the built-in Ceph balancer will definitely
optimize the cluster, Jonas Jelten's ceph-balancer currently does a better
(and more efficient) job: https://github.com/TheJJ/ceph-balancer.
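If you decide to switch, the sequence would look something like this (a sketch, untested against your cluster; check client compatibility first):
ceph features
ceph osd set-require-min-compat-client luminous
ceph balancer off
ceph balancer mode upmap
ceph balancer on
ceph balancer status
"ceph features" shows which releases your connected clients report; set-require-min-compat-client will refuse to change if older clients are still connected, unless you force it.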
If you want to improve backfill/recovery speed, you can increase the number
of concurrent backfill and recovery operations:
ceph tell 'osd.*' injectargs '--osd_max_backfills 3'
ceph tell 'osd.*' injectargs '--osd_recovery_max_active 3'
This might impact clients, so increase these parameters slowly and watch how
it goes.
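To see what an OSD is currently running with, you can query its admin socket on the host it lives on, for example:
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
(If I recall correctly, the Nautilus defaults are 1 and 3 respectively.)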
You might want to decrease "osd_recovery_sleep_hdd" as well; check the
current value for osd.0:
ceph daemon osd.0 config get osd_recovery_sleep_hdd
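and, if that still shows the default (0.1 s on Nautilus, if I recall correctly), you could lower it for all OSDs with something like:
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0.05'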
Gr. Stefan