Hi everyone. I'm still learning how to run Ceph properly in production. I have a cluster (Reef 18.2.4) with 10 nodes (8 × 15 TB NVMe drives each). There are 2 production pools: one for RGW (3× replication) and one for CephFS (EC, k=8 m=2). Everything was fine, but once users started storing more data I began seeing:

1. A very high number of misplaced PGs.
2. OSDs very unbalanced, with some getting 90% full.

```
ceph -s
  cluster:
    id:     7805xxxe-6ba7-11ef-9cda-0xxxcxxx0
    health: HEALTH_WARN
            Low space hindering backfill (add storage if this doesn't resolve itself): 195 pgs backfill_toofull
            150 pgs not deep-scrubbed in time
            150 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum host01,host02,host03,host04,host05 (age 7w)
    mgr: host01.bwqkna(active, since 7w), standbys: host02.dycdqe
    mds: 5/5 daemons up, 6 standby
    osd: 80 osds: 80 up (since 7w), 80 in (since 4M); 323 remapped pgs
    rgw: 30 daemons active (10 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 1394 pgs
    objects: 159.65M objects, 279 TiB
    usage:   696 TiB used, 421 TiB / 1.1 PiB avail
    pgs:     230137879/647342099 objects misplaced (35.551%)
             1033 active+clean
             180  active+remapped+backfill_toofull
             123  active+remapped+backfill_wait
             28   active+clean+scrubbing
             15   active+remapped+backfill_wait+backfill_toofull
             10   active+clean+scrubbing+deep
             5    active+remapped+backfilling

  io:
    client:   668 MiB/s rd, 11 MiB/s wr, 1.22k op/s rd, 1.15k op/s wr
    recovery: 479 MiB/s, 283 objects/s

  progress:
    Global Recovery Event (5w)
      [=====================.......] (remaining: 11d)
```

I've been trying to rebalance the OSDs manually, since the balancer refuses to run:

```
"optimize_result": "Too many objects (0.355160 > 0.050000) are misplaced; try again later",
```

I manually re-weighted the 10 most-used OSDs, and the number of misplaced objects is going down, but very slowly; at the current rate it could take many weeks. There is almost 40% free space in total, yet the RGW pool is almost full at ~94%, which I think is a consequence of the OSD imbalance.

```
ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    1.1 PiB  421 TiB  697 TiB   697 TiB      62.34
TOTAL  1.1 PiB  421 TiB  697 TiB   697 TiB      62.34

--- POOLS ---
POOL                        ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                         1     1   69 MiB       15  207 MiB      0     13 TiB
.nfs                         2    32  172 KiB       43  574 KiB      0     13 TiB
.rgw.root                    3    32  2.7 KiB        6   88 KiB      0     13 TiB
default.rgw.log              4    32  2.1 MiB      209  7.0 MiB      0     13 TiB
default.rgw.control          5    32      0 B        8      0 B      0     13 TiB
default.rgw.meta             6    32   97 KiB      280  3.5 MiB      0     13 TiB
default.rgw.buckets.index    7    32   16 GiB    2.41k   47 GiB   0.11     13 TiB
default.rgw.buckets.data    10  1024  197 TiB  133.75M  592 TiB  93.69     13 TiB
default.rgw.buckets.non-ec  11    32   78 MiB    1.43M   17 GiB   0.04     13 TiB
cephfs.cephfs01.data        12   144   83 TiB   23.99M  103 TiB  72.18     32 TiB
cephfs.cephfs01.metadata    13     1  952 MiB  483.14k  3.7 GiB      0     10 TiB
```

I also tried raising the backfill limits, but the change does not seem to take effect:

```
# ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
osd_max_backfills = 1
osd_recovery_max_active = 0
osd_recovery_max_active_hdd = 3
osd_recovery_max_active_ssd = 10
osd_recovery_op_priority = 3

# ceph config set osd osd_max_backfills 10

# ceph-conf --show-config | egrep "osd_recovery_max_active|osd_recovery_op_priority|osd_max_backfills"
osd_max_backfills = 1
osd_recovery_max_active = 0
osd_recovery_max_active_hdd = 3
osd_recovery_max_active_ssd = 10
osd_recovery_op_priority = 3
```
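As far as I can tell, `ceph-conf --show-config` only prints the compiled-in defaults merged with the local ceph.conf, so it will never reflect anything stored with `ceph config set`. On top of that, I've read that Reef schedules recovery with mClock by default, and that mClock ignores changes to `osd_max_backfills` and `osd_recovery_max_active` unless an override flag is set. Based on my reading of the docs, this is what I'm planning to try next (please correct me if I've misunderstood; `osd.0` is just an example daemon):

```
# Check what the OSDs are actually running with (the mon config store
# and a daemon's runtime values, not the compiled-in defaults):
ceph config get osd osd_max_backfills
ceph config show osd.0 | egrep "osd_max_backfills|osd_mclock"

# Allow osd_max_backfills / osd_recovery_max_active to take effect
# under the mClock scheduler, then raise the limit:
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 10

# ...or alternatively let the built-in profile favour recovery I/O:
ceph config set osd osd_mclock_profile high_recovery_ops
```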
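For the imbalance itself, I assume the per-OSD utilization spread is the right thing to watch, and that the built-in dry run is a safer way to pick reweight candidates than my manual "top 10" approach (the threshold of 110 below is just an example, meaning OSDs more than 10% above the mean utilization):

```
# Per-OSD utilization; the MIN/MAX VAR summary at the bottom shows
# how unbalanced the cluster is:
ceph osd df tree

# Dry run of utilization-based reweighting; prints what would change
# without touching anything:
ceph osd test-reweight-by-utilization 110
```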
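On the backfill_toofull PGs: if I understand correctly, they are stuck because the fullest OSDs are already past the default backfillfull ratio of 0.90. Would it be reasonable to raise that ratio a little until the data drains? Something like this (0.92 is only an illustration; I understand it has to stay below the 0.95 full ratio and should be reverted once the cluster has rebalanced):

```
# Current full / backfillfull / nearfull ratios:
ceph osd dump | grep ratio

# Temporarily allow backfill onto OSDs up to 92% full:
ceph osd set-backfillfull-ratio 0.92
```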
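And rather than my manual reweights (which, as I understand it, are themselves a source of remapped PGs), I'm wondering whether I should just let the upmap balancer take over, along these lines (assuming 0.07 is a sane threshold for my cluster):

```
# Let the balancer run even though more than 5% of objects are misplaced:
ceph config set mgr target_max_misplaced_ratio 0.07

# upmap moves individual PGs and requires Luminous-or-newer clients:
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status
```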
My questions:

1. Why did I end up with so many misplaced PGs when nothing changed on the cluster (same number of OSDs, hosts, etc.)?
2. Is it OK to raise target_max_misplaced_ratio above 0.05 (as in the balancer sketch above), so the balancer can do its job and I don't have to keep rebalancing the OSDs manually?
3. Is there a way to speed up the rebalance?
4. Any other recommendations that could help make my cluster healthy again?

Thank you!
Bruno