Ceph cluster out of balance after adding OSDs

We set up a small Ceph cluster about 6 months ago with just 6x 200 GB OSDs
and a single EC 4+2 (k=4, m=2) pool. When we created that pool, we enabled
pg_autoscale, and the OSDs stayed pretty well balanced.
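For reference, the pool and autoscaler were set up roughly along these lines
(paraphrased; the EC profile name below is illustrative, not the exact one we
used):

# EC profile: 4 data chunks + 2 coding chunks, one chunk per host
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
# erasure-coded bucket data pool using that profile
ceph osd pool create charlotte.rgw.buckets.data erasure ec42
# let the autoscaler manage pg_num for this pool
ceph osd pool set charlotte.rgw.buckets.data pg_autoscale_mode on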

After our developers released a new "feature" that ballooned storage usage to
over 80%, we added another 6x 200 GB OSDs. While doing that, we looked at the
PG counts and found there was only 1 PG each for the rgw.data and rgw.log
pools, and "ceph osd pool autoscale-status" returns nothing, so it looks like
the autoscaler hasn't actually been working. The rebalance was extremely slow
and wasn't draining osd.0, so we bumped the PGs for the rgw.data pool up to
16. All the OSDs except osd.0 balanced out quickly, but that one OSD's
utilization keeps climbing, and the number of misplaced objects is increasing
rather than decreasing. We also set noscrub and nodeep-scrub so scrubbing
wouldn't slow down the recovery; the commands we ran are below.
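
For completeness, the changes were approximately (paraphrased; pool name as it
appears in the ceph df output further down):

# check what the autoscaler thinks (this returns no output for us)
ceph osd pool autoscale-status
# raise the PG count on the bucket data pool from 1 to 16
ceph osd pool set charlotte.rgw.buckets.data pg_num 16
# pause scrubbing while recovery/backfill runs
ceph osd set noscrub
ceph osd set nodeep-scrub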

At this point, I don't want to do any more tuning to this cluster until we
can get it back to a healthy state, but it's not fixing itself. I'm open to
any ideas.
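
In case it's useful to whoever replies, I believe the fullness thresholds
behind the backfillfull / backfill_toofull warnings can be checked with
something like:

# cluster-wide thresholds: full_ratio, backfillfull_ratio, nearfull_ratio
ceph osd dump | grep ratio
# per-OSD utilization and variance (output pasted further down)
ceph osd df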

Here's the output of ceph -s:
  cluster:
    id:     159d23e4-2a36-11ed-8b6e-fd27d573fa65
    health: HEALTH_WARN
            1 pools have many more objects per pg than average
            noscrub,nodeep-scrub flag(s) set
            1 backfillfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 12 pgs backfill_toofull
            7 pool(s) backfillfull

  services:
    mon: 3 daemons, quorum ceph3,ceph5,ceph6 (age 6h)
    mgr: ceph5.ksxevx(active, since 23h), standbys: ceph4.frkyyl, ceph6.slvpzl
    osd: 12 osds: 12 up (since 11h), 12 in (since 11h); 12 remapped pgs
         flags noscrub,nodeep-scrub
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   7 pools, 161 pgs
    objects: 28.61M objects, 211 GiB
    usage:   1.5 TiB used, 834 GiB / 2.3 TiB avail
    pgs:     91779228/171665865 objects misplaced (53.464%)
             149 active+clean
             12  active+remapped+backfill_toofull

  io:
    client:   11 KiB/s rd, 61 KiB/s wr, 11 op/s rd, 27 op/s wr

  progress:
    Global Recovery Event (23h)
      [=========================...] (remaining: 115m)

ceph df:
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    2.3 TiB  834 GiB  1.5 TiB   1.5 TiB      65.24
TOTAL  2.3 TiB  834 GiB  1.5 TiB   1.5 TiB      65.24

--- POOLS ---
POOL                         ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                          1    1  897 KiB        2  2.6 MiB   0.18    479 MiB
.rgw.root                     2   32  7.1 KiB       18  204 KiB   0.01    479 MiB
charlotte.rgw.log             3   32   27 KiB      347  2.0 MiB   0.14    479 MiB
charlotte.rgw.control         4   32      0 B        9      0 B      0    479 MiB
charlotte.rgw.meta            5   32  9.7 KiB       16  167 KiB   0.01    479 MiB
charlotte.rgw.buckets.data    6   16  734 GiB   28.61M  1.1 TiB  99.87    958 MiB
charlotte.rgw.buckets.index   7   16   16 GiB      691   47 GiB  97.12    479 MiB

ceph osd tree:
ID   CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
 -1         2.34357  root default
 -3         0.39059      host ceph1
  0    ssd  0.19530          osd.0       up   0.89999  1.00000
  1    ssd  0.19530          osd.1       up   1.00000  1.00000
 -5         0.39059      host ceph2
  6    ssd  0.19530          osd.6       up   1.00000  1.00000
  7    ssd  0.19530          osd.7       up   1.00000  1.00000
 -7         0.39059      host ceph3
  2    ssd  0.19530          osd.2       up   1.00000  1.00000
  8    ssd  0.19530          osd.8       up   1.00000  1.00000
 -9         0.39059      host ceph4
  3    ssd  0.19530          osd.3       up   1.00000  1.00000
  9    ssd  0.19530          osd.9       up   1.00000  1.00000
-11         0.39059      host ceph5
  4    ssd  0.19530          osd.4       up   1.00000  1.00000
 10    ssd  0.19530          osd.10      up   1.00000  1.00000
-13         0.39059      host ceph6
  5    ssd  0.19530          osd.5       up   1.00000  1.00000
 11    ssd  0.19530          osd.11      up   1.00000  1.00000

ceph osd df:
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    ssd  0.19530   0.89999  200 GiB  190 GiB  130 GiB   12 GiB   48 GiB   10 GiB  94.94  1.46   52      up
 1    ssd  0.19530   1.00000  200 GiB  7.3 GiB  9.8 MiB  6.4 GiB  858 MiB  193 GiB   3.64  0.06   42      up
 6    ssd  0.19530   1.00000  200 GiB  148 GiB   97 GiB   14 GiB   38 GiB   52 GiB  74.06  1.14   51      up
 7    ssd  0.19530   1.00000  200 GiB  133 GiB   97 GiB    2 KiB   35 GiB   67 GiB  66.47  1.02   43      up
 2    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB   12 KiB   37 GiB   66 GiB  66.94  1.03   40      up
 8    ssd  0.19530   1.00000  200 GiB  136 GiB   97 GiB  2.2 GiB   36 GiB   64 GiB  67.85  1.04   40      up
 3    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB    4 KiB   37 GiB   66 GiB  66.95  1.03   41      up
 9    ssd  0.19530   1.00000  200 GiB  138 GiB   97 GiB  5.2 GiB   36 GiB   62 GiB  69.19  1.06   49      up
 4    ssd  0.19530   1.00000  200 GiB  137 GiB   97 GiB  4.3 GiB   36 GiB   63 GiB  68.62  1.05   42      up
10    ssd  0.19530   1.00000  200 GiB  139 GiB   97 GiB  5.5 GiB   36 GiB   61 GiB  69.31  1.06   48      up
 5    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB    7 KiB   38 GiB   66 GiB  67.13  1.03   34      up
11    ssd  0.19530   1.00000  200 GiB  136 GiB   97 GiB  2.2 GiB   36 GiB   64 GiB  67.80  1.04   49      up
                       TOTAL  2.3 TiB  1.5 TiB  1.1 TiB   52 GiB  414 GiB  834 GiB  65.24
MIN/MAX VAR: 0.06/1.46  STDDEV: 19.95

Thanks in advance if anyone has any suggestions.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


