Hello all,

We have an EC (4+2) pool for RGW data, with HDDs plus SSDs for WAL/DB. This pool spans 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a server and we've removed its OSDs from the cluster. Ceph started to remap and backfill as expected, but the process has been getting slower and slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All the remaining unclean PGs are backfilling:

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 14497 pgs
    objects: 192.38M objects, 380 TiB
    usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
    pgs:     771559/1065561630 objects degraded (0.072%)
             1215899/1065561630 objects misplaced (0.114%)
             14428 active+clean
             50    active+undersized+degraded+remapped+backfilling
             18    active+remapped+backfilling
             1     active+clean+scrubbing+deep

We've checked the health of the remaining servers, and everything looks fine (CPU/RAM/network/disks).

Any hints on what could be happening?

Thank you,
Gauvain
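
P.S. In case it helps with the diagnosis, below is a rough sketch of how we are dumping the recovery/backfill throttles. This assumes a release with the centralized config database (ceph config) and, for the mclock line, that the mclock scheduler is in use (the default from Quincy on); osd.0 is just a placeholder daemon id.

  # Cluster-wide defaults for the backfill/recovery throttles
  ceph config get osd osd_max_backfills
  ceph config get osd osd_recovery_max_active
  ceph config get osd osd_recovery_sleep_hdd

  # With the mclock scheduler, the active profile caps recovery bandwidth
  ceph config get osd osd_mclock_profile

  # Running value on one OSD (osd.0 is only an example)
  ceph config show osd.0 osd_max_backfills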