On 9/2/22 15:55, Oebele Drijfhout wrote:
Hello,
I'm new to Ceph and I recently inherited a 4-node cluster with 32 OSDs and
about 116TB of raw space. It shows low available space, which I'm trying to
increase by enabling the balancer and lowering the reweight of the most-used
OSDs. My questions are: is what I did correct given the current state of the
cluster, can I do more to speed up rebalancing, and will we actually make
more space available this way?
Yes. When the cluster is perfectly balanced, each OSD's utilization should
approach the cluster-wide %RAW USED.
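A rough way to watch it converge (a sketch using the standard CLI; substitute your cluster name):
ceph --cluster xxx osd df | tail -n 2
As balancing progresses, MIN/MAX VAR should move toward 1.00/1.00 and STDDEV toward 0, with each OSD's %USE approaching the ~59.7 %RAW USED.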
Some background info: Earlier this week MAX AVAIL on the cluster was 0 and
it was only then that we noticed something was wrong. We removed an unused
rbd image (about 3TB) and now we have a little under 1TB in available
space. We are adding about 75GB per day on this cluster.
[xxx@ceph02 ~]$ sudo ceph --cluster xxx df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 116 TiB 47 TiB 69 TiB 69 TiB 59.69
TOTAL 116 TiB 47 TiB 69 TiB 69 TiB 59.69
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
xxx-pool 1 1024 130 B 3 192 KiB 0 992 GiB
yyy_data 6 128 23 TiB 12.08M 69 TiB 95.98 992 GiB
yyy_metadata 7 128 5.6 GiB 2.22M 6.1 GiB 0.21 992 GiB
Cluster status:
[xxx@ceph02 ~]$ sudo ceph --cluster xxx -s
cluster:
id: 91ba1ea6-bfec-4ddb-a8b5-9faf842f22c3
health: HEALTH_WARN
1 backfillfull osd(s)
1 nearfull osd(s)
3 pool(s) backfillfull
Low space hindering backfill (add storage if this doesn't resolve itself): 6 pgs backfill_toofull
services:
mon: 5 daemons, quorum a,b,c,d,e (age 5d)
mgr: b(active, since 22h), standbys: a, c, d, e
mds: registration_docs:1 {0=b=up:active} 3 up:standby
osd: 32 osds: 32 up (since 19M), 32 in (since 3y); 36 remapped pgs
task status:
scrub status:
mds.b: idle
data:
pools: 3 pools, 1280 pgs
objects: 14.31M objects, 23 TiB
usage: 70 TiB used, 47 TiB / 116 TiB avail
pgs: 2587772/42925071 objects misplaced (6.029%)
1244 active+clean
17 active+remapped+backfilling
13 active+remapped+backfill_wait
4 active+remapped+backfill_toofull
2 active+remapped+backfill_wait+backfill_toofull
io:
client: 331 KiB/s wr, 0 op/s rd, 0 op/s wr
recovery: 141 MiB/s, 65 keys/s, 84 objects/s
Versions:
[xxx@ceph02 ~]$ rpm -qa | grep ceph
ceph-common-14.2.13-0.el7.x86_64
ceph-mds-14.2.13-0.el7.x86_64
ceph-osd-14.2.13-0.el7.x86_64
ceph-base-14.2.13-0.el7.x86_64
libcephfs2-14.2.13-0.el7.x86_64
python-ceph-argparse-14.2.13-0.el7.x86_64
ceph-selinux-14.2.13-0.el7.x86_64
ceph-mgr-14.2.13-0.el7.x86_64
ceph-14.2.13-0.el7.x86_64
python-cephfs-14.2.13-0.el7.x86_64
ceph-mon-14.2.13-0.el7.x86_64
It looks like the cluster is severely unbalanced, and I guess that's
expected because the balancer was set to "off":
[xxx@ceph02 ~]$ sudo ceph --cluster xxx osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 894 MiB 3.6 GiB 1.8 TiB 49.24 0.82 122 up
1 hdd 3.63869 1.00000 3.6 TiB 1.1 TiB 1.1 TiB 581 MiB 2.4 GiB 2.5 TiB 31.07 0.52 123 up
2 hdd 3.63869 1.00000 3.6 TiB 2.2 TiB 2.2 TiB 632 MiB 4.1 GiB 1.5 TiB 60.01 1.00 121 up
3 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 672 MiB 5.5 GiB 975 GiB 73.84 1.24 122 up
4 hdd 3.63869 0.94983 3.6 TiB 2.9 TiB 2.9 TiB 478 MiB 5.2 GiB 794 GiB 78.69 1.32 111 up
5 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 900 MiB 4.7 GiB 1.1 TiB 69.52 1.16 122 up
6 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 468 MiB 5.5 GiB 929 GiB 75.08 1.26 125 up
7 hdd 3.63869 1.00000 3.6 TiB 1.6 TiB 1.6 TiB 731 MiB 3.2 GiB 2.0 TiB 44.54 0.75 122 up
8 hdd 3.63869 1.00000 3.6 TiB 1.3 TiB 1.3 TiB 626 MiB 2.6 GiB 2.4 TiB 35.41 0.59 120 up
9 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 953 MiB 4.8 GiB 1.1 TiB 69.61 1.17 122 up
10 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 526 MiB 3.9 GiB 1.6 TiB 55.64 0.93 121 up
11 hdd 3.63869 0.94983 3.6 TiB 3.4 TiB 3.4 TiB 476 MiB 6.2 GiB 242 GiB 93.50 1.57 101 up
12 hdd 3.63869 1.00000 3.6 TiB 1.4 TiB 1.4 TiB 688 MiB 3.0 GiB 2.2 TiB 39.44 0.66 117 up
13 hdd 3.63869 1.00000 3.6 TiB 1.3 TiB 1.3 TiB 738 MiB 2.8 GiB 2.3 TiB 35.98 0.60 124 up
14 hdd 3.63869 1.00000 3.6 TiB 2.8 TiB 2.8 TiB 582 MiB 5.1 GiB 879 GiB 76.40 1.28 123 up
15 hdd 3.63869 1.00000 3.6 TiB 2.5 TiB 2.5 TiB 566 MiB 4.6 GiB 1.1 TiB 68.81 1.15 124 up
16 hdd 3.63869 1.00000 3.6 TiB 1.5 TiB 1.5 TiB 625 MiB 3.1 GiB 2.2 TiB 40.23 0.67 121 up
17 hdd 3.63869 0.94983 3.6 TiB 3.2 TiB 3.2 TiB 704 MiB 6.1 GiB 427 GiB 88.55 1.48 112 up
18 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 143 MiB 3.6 GiB 1.7 TiB 54.12 0.91 124 up
19 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 522 MiB 5.0 GiB 977 GiB 73.79 1.24 126 up
20 hdd 3.63869 1.00000 3.6 TiB 2.4 TiB 2.4 TiB 793 MiB 4.5 GiB 1.2 TiB 66.79 1.12 119 up
21 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 609 MiB 3.6 GiB 1.8 TiB 49.50 0.83 122 up
22 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 600 MiB 5.0 GiB 979 GiB 73.73 1.23 122 up
23 hdd 3.63869 1.00000 3.6 TiB 953 GiB 950 GiB 579 MiB 2.4 GiB 2.7 TiB 25.57 0.43 118 up
24 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 491 MiB 3.4 GiB 1.8 TiB 49.82 0.83 121 up
25 hdd 3.63869 1.00000 3.6 TiB 2.1 TiB 2.1 TiB 836 MiB 4.5 GiB 1.5 TiB 59.07 0.99 121 up
26 hdd 3.63869 0.94983 3.6 TiB 2.9 TiB 2.9 TiB 467 MiB 5.2 GiB 794 GiB 78.69 1.32 104 up
27 hdd 3.63869 1.00000 3.6 TiB 2.0 TiB 2.0 TiB 861 MiB 3.8 GiB 1.7 TiB 54.09 0.91 123 up
28 hdd 3.63869 1.00000 3.6 TiB 1.9 TiB 1.9 TiB 262 MiB 3.6 GiB 1.8 TiB 51.00 0.85 121 up
29 hdd 3.63869 1.00000 3.6 TiB 2.7 TiB 2.7 TiB 998 MiB 5.1 GiB 937 GiB 74.86 1.25 123 up
30 hdd 3.63869 1.00000 3.6 TiB 1.8 TiB 1.8 TiB 1.1 GiB 3.6 GiB 1.8 TiB 50.77 0.85 122 up
31 hdd 3.63869 1.00000 3.6 TiB 2.3 TiB 2.3 TiB 689 MiB 4.4 GiB 1.3 TiB 63.96 1.07 121 up
TOTAL 116 TiB 70 TiB 69 TiB 20 GiB 134 GiB 47 TiB 59.73
MIN/MAX VAR: 0.43/1.57 STDDEV: 16.78
I enabled the balancer this morning, about 4 hours ago:
[xxx@ceph02 ~]$ sudo ceph --cluster xxx balancer status
{
"last_optimize_duration": "0:00:01.119296",
"plans": [],
"mode": "crush-compat",
"active": true,
"optimize_result": "Optimization plan created successfully",
"last_optimize_started": "Fri Sep 2 14:57:03 2022"
}
...and lowered the reweight to 0.85 for the most-used OSDs (however, it looks
like it's slowly drifting back toward 1?):
There are two balancer modes: crush-compat and upmap. When you have a modern
cluster (Luminous and newer) and relatively recent clients, you can switch to
upmap; your clients should support it. Using upmap is a more efficient way to
balance the cluster. While the built-in Ceph balancer will definitely
optimize the cluster, Jonas Jelten's ceph-balancer currently does a better
(and more efficient) job: https://github.com/TheJJ/ceph-balancer.
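If you decide to switch, the sequence would look something like this (a sketch, untested against your cluster; check client compatibility first):
ceph features
ceph osd set-require-min-compat-client luminous
ceph balancer off
ceph balancer mode upmap
ceph balancer on
ceph balancer status
"ceph features" shows which releases your connected clients report; set-require-min-compat-client will refuse to change if older clients are still connected, unless you force it.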
If you want to improve backfill/recovery speed, you can increase the number
of concurrent backfill and recovery operations:
ceph tell 'osd.*' injectargs '--osd_max_backfills 3'
ceph tell 'osd.*' injectargs '--osd_recovery_max_active 3'
This might impact clients, so increase these parameters slowly and watch how
it goes.
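To see what an OSD is currently running with, you can query its admin socket on the host it lives on, for example:
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active
(If I recall correctly, the Nautilus defaults are 1 and 3 respectively.)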
You might want to decrease "osd_recovery_sleep_hdd" as well; check the
current value for osd.0:
ceph daemon osd.0 config get osd_recovery_sleep_hdd
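and, if that still shows the default (0.1 s on Nautilus, if I recall correctly), you could lower it for all OSDs with something like:
ceph tell 'osd.*' injectargs '--osd_recovery_sleep_hdd 0.05'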
Gr. Stefan