We set up a small Ceph cluster about six months ago with just 6x 200 GB OSDs and a single EC 4+2 pool. When we created that pool, we enabled pg_autoscale, and the OSDs stayed pretty well balanced. After our developers released a new "feature" that ballooned storage usage to over 80%, we added another 6x 200 GB OSDs. At that point we checked the PG counts and found that the rgw.data and rgw.log pools each had only 1 PG, and "ceph osd pool autoscale-status" returns nothing, so it looks like the autoscaler has never actually been working.

The rebalance operation was extremely slow and wasn't draining osd.0, so we bumped the rgw.data pool up to 16 PGs. All the OSDs except osd.0 balanced out quickly, but that one OSD's utilization keeps climbing, and the number of misplaced objects is increasing rather than decreasing. We set the noscrub and nodeep-scrub flags so scrubbing wouldn't slow down the backfill.

At this point, I don't want to do any more tuning to this cluster until we can get it back to a healthy state, but it's not fixing itself. I'm open to any ideas.
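For reference, these are roughly the commands we ran for the steps above (pool name as it appears in the ceph df output below); exact invocations reconstructed from memory:

```shell
# Bump the PG count on the RGW data pool (was 1)
ceph osd pool set charlotte.rgw.buckets.data pg_num 16

# Pause scrubbing so it doesn't compete with backfill
ceph osd set noscrub
ceph osd set nodeep-scrub

# Check the autoscaler -- on this cluster it prints nothing at all
ceph osd pool autoscale-status
```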
Here's the output of ceph -s:

  cluster:
    id:     159d23e4-2a36-11ed-8b6e-fd27d573fa65
    health: HEALTH_WARN
            1 pools have many more objects per pg than average
            noscrub,nodeep-scrub flag(s) set
            1 backfillfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 12 pgs backfill_toofull
            7 pool(s) backfillfull

  services:
    mon: 3 daemons, quorum ceph3,ceph5,ceph6 (age 6h)
    mgr: ceph5.ksxevx(active, since 23h), standbys: ceph4.frkyyl, ceph6.slvpzl
    osd: 12 osds: 12 up (since 11h), 12 in (since 11h); 12 remapped pgs
         flags noscrub,nodeep-scrub
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    pools:   7 pools, 161 pgs
    objects: 28.61M objects, 211 GiB
    usage:   1.5 TiB used, 834 GiB / 2.3 TiB avail
    pgs:     91779228/171665865 objects misplaced (53.464%)
             149 active+clean
             12  active+remapped+backfill_toofull

  io:
    client: 11 KiB/s rd, 61 KiB/s wr, 11 op/s rd, 27 op/s wr

  progress:
    Global Recovery Event (23h)
      [=========================...] (remaining: 115m)

ceph df:

--- RAW STORAGE ---
CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
ssd    2.3 TiB  834 GiB  1.5 TiB  1.5 TiB   65.24
TOTAL  2.3 TiB  834 GiB  1.5 TiB  1.5 TiB   65.24

--- POOLS ---
POOL                         ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                          1    1  897 KiB        2  2.6 MiB   0.18    479 MiB
.rgw.root                     2   32  7.1 KiB       18  204 KiB   0.01    479 MiB
charlotte.rgw.log             3   32   27 KiB      347  2.0 MiB   0.14    479 MiB
charlotte.rgw.control         4   32      0 B        9      0 B      0    479 MiB
charlotte.rgw.meta            5   32  9.7 KiB       16  167 KiB   0.01    479 MiB
charlotte.rgw.buckets.data    6   16  734 GiB   28.61M  1.1 TiB  99.87    958 MiB
charlotte.rgw.buckets.index   7   16   16 GiB      691   47 GiB  97.12    479 MiB

ceph osd tree:

ID   CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
 -1         2.34357  root default
 -3         0.39059      host ceph1
  0    ssd  0.19530          osd.0       up   0.89999  1.00000
  1    ssd  0.19530          osd.1       up   1.00000  1.00000
 -5         0.39059      host ceph2
  6    ssd  0.19530          osd.6       up   1.00000  1.00000
  7    ssd  0.19530          osd.7       up   1.00000  1.00000
 -7         0.39059      host ceph3
  2    ssd  0.19530          osd.2       up   1.00000  1.00000
  8    ssd  0.19530          osd.8       up   1.00000  1.00000
 -9         0.39059      host ceph4
  3    ssd  0.19530          osd.3       up   1.00000  1.00000
  9    ssd  0.19530          osd.9       up   1.00000  1.00000
-11         0.39059      host ceph5
  4    ssd  0.19530          osd.4       up   1.00000  1.00000
 10    ssd  0.19530          osd.10      up   1.00000  1.00000
-13         0.39059      host ceph6
  5    ssd  0.19530          osd.5       up   1.00000  1.00000
 11    ssd  0.19530          osd.11      up   1.00000  1.00000

ceph osd df:

ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 0    ssd  0.19530   0.89999  200 GiB  190 GiB  130 GiB   12 GiB   48 GiB   10 GiB  94.94  1.46   52      up
 1    ssd  0.19530   1.00000  200 GiB  7.3 GiB  9.8 MiB  6.4 GiB  858 MiB  193 GiB   3.64  0.06   42      up
 6    ssd  0.19530   1.00000  200 GiB  148 GiB   97 GiB   14 GiB   38 GiB   52 GiB  74.06  1.14   51      up
 7    ssd  0.19530   1.00000  200 GiB  133 GiB   97 GiB    2 KiB   35 GiB   67 GiB  66.47  1.02   43      up
 2    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB   12 KiB   37 GiB   66 GiB  66.94  1.03   40      up
 8    ssd  0.19530   1.00000  200 GiB  136 GiB   97 GiB  2.2 GiB   36 GiB   64 GiB  67.85  1.04   40      up
 3    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB    4 KiB   37 GiB   66 GiB  66.95  1.03   41      up
 9    ssd  0.19530   1.00000  200 GiB  138 GiB   97 GiB  5.2 GiB   36 GiB   62 GiB  69.19  1.06   49      up
 4    ssd  0.19530   1.00000  200 GiB  137 GiB   97 GiB  4.3 GiB   36 GiB   63 GiB  68.62  1.05   42      up
10    ssd  0.19530   1.00000  200 GiB  139 GiB   97 GiB  5.5 GiB   36 GiB   61 GiB  69.31  1.06   48      up
 5    ssd  0.19530   1.00000  200 GiB  134 GiB   97 GiB    7 KiB   38 GiB   66 GiB  67.13  1.03   34      up
11    ssd  0.19530   1.00000  200 GiB  136 GiB   97 GiB  2.2 GiB   36 GiB   64 GiB  67.80  1.04   49      up
                     TOTAL    2.3 TiB  1.5 TiB  1.1 TiB   52 GiB  414 GiB  834 GiB  65.24
MIN/MAX VAR: 0.06/1.46  STDDEV: 19.95

Thanks in advance if anyone has any suggestions.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx