Hello list,

We have a Ceph cluster (v17.2.6 Quincy) with 3 admin nodes and 6 storage
nodes. Each storage node is connected to a JBOD enclosure housing 28 HDDs
of 18 TB each, for a total of 168 OSDs. The pool that holds the majority
of the data is erasure-coded (4+2).

We recently had a disk failure, which brought one OSD down:

# ceph osd tree | grep down
  2    hdd   16.49579          osd.2      down         0  1.00000

This OSD is out of the cluster, but we haven't physically replaced it yet.

The problem we are facing is that the cluster was not in the best shape
when this OSD failed. Currently we have the following:

################
  cluster:
    id:     <redacted>
    health: HEALTH_ERR
            1026 scrub errors
            Possible data damage: 18 pgs inconsistent
            2122 pgs not deep-scrubbed in time
            2122 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum xyz-admin1,xyz-admin2,xyz-osd1,xyz-osd2,xyz-osd3 (age 17M)
    mgr: xyz-admin2.sipadf(active, since 17M), standbys: xyz-admin1.nwaovh
    mds: 2/2 daemons up, 2 standby
    osd: 168 osds: 167 up (since 40h), 167 in (since 6w); 226 remapped pgs

  data:
    volumes: 2/2 healthy
    pools:   9 pools, 2122 pgs
    objects: 448.54M objects, 1.0 PiB
    usage:   1.6 PiB used, 1.1 PiB / 2.7 PiB avail
    pgs:     133905796/2676514497 objects misplaced (5.003%)
             1880 active+clean
             201  active+remapped+backfilling
             23   active+remapped+backfill_wait
             16   active+clean+inconsistent
             1    active+remapped+inconsistent+backfill_wait
             1    active+remapped+inconsistent+backfilling

  io:
    recovery: 703 MiB/s, 281 objects/s

  progress:
    Global Recovery Event (6w)
      [=========================...] (remaining: 5d)
################

I have noticed the number of active+clean PGs increasing and the count of
misplaced objects decreasing, though very slowly.

My question is: should I wait until recovery is complete, then repair the
18 inconsistent PGs, and only then replace the disk? My thinking is that
replacing the disk now would trigger additional backfilling, which would
slow the recovery down even more.

A second question: should I disable scrubbing until the recovery has
finished? (I have put the exact commands I had in mind in a P.S. below.)

Thank you for any insights you may be able to provide!

- Gustavo
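P.S. For concreteness, here is what I was planning to run, pieced together
from the docs; corrections are very welcome. To pause scrubbing while the
recovery runs, I would set the two cluster-wide flags and unset them once
the cluster is healthy again:

# ceph osd set noscrub
# ceph osd set nodeep-scrub
  ... wait for recovery to complete ...
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub

As far as I can tell, these flags only stop new scrubs from being
scheduled; scrubs already in flight will still finish.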
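For the 18 inconsistent PGs, my reading of the docs is that I would list
the affected PG IDs from the health report, optionally inspect what is
actually inconsistent, and then ask each PG to repair itself:

# ceph health detail | grep inconsistent
# rados list-inconsistent-obj <pgid> --format=json-pretty
# ceph pg repair <pgid>

where <pgid> is a placeholder for each PG ID that "ceph health detail"
reports.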
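And for the eventual disk replacement, assuming a cephadm-managed
deployment (the hostname and device path below are placeholders), I
believe the sequence that preserves the OSD ID is roughly:

# ceph orch osd rm 2 --replace
# ceph orch device zap xyz-osd1 /dev/sdX --force

The first command marks osd.2 as destroyed so the replacement drive can
reuse ID 2; the second wipes the new disk after the physical swap so the
orchestrator can redeploy onto it. Since osd.2 is already down and out,
my hope is that this adds little backfill beyond what refilling the new
drive requires anyway.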