Which Ceph version are you running, and is mClock active?

Joachim

___________________________________
Clyso GmbH - Ceph Foundation Member

On 21.03.23 at 06:53, Gauvain Pocentek wrote:
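For anyone following along, the version and scheduler questions can be answered like this (a minimal sketch; exact availability of the `osd_op_queue`/`osd_mclock_profile` options depends on the release, mClock became the default scheduler in Quincy):

```shell
# Show installed/running Ceph versions across all daemons
ceph versions

# Which op queue scheduler the OSDs use (mclock_scheduler vs wpq)
ceph config get osd osd_op_queue

# If mClock is active, which profile is in effect
# (e.g. balanced, high_client_ops, high_recovery_ops)
ceph config get osd osd_mclock_profile
```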
Hello all,

We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This pool spans 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a server and removed its OSDs from the cluster. Ceph started to remap and backfill as expected, but the process has been getting slower and slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All the remaining unclean PGs are backfilling:

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 14497 pgs
    objects: 192.38M objects, 380 TiB
    usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
    pgs:     771559/1065561630 objects degraded (0.072%)
             1215899/1065561630 objects misplaced (0.114%)
             14428 active+clean
             50    active+undersized+degraded+remapped+backfilling
             18    active+remapped+backfilling
             1     active+clean+scrubbing+deep

We've checked the health of the remaining servers, and everything looks fine (CPU/RAM/network/disks). Any hints on what could be happening?

Thank you,
Gauvain
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
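These are not the original poster's commands, but a sketch of throttles commonly inspected when backfill keeps slowing down. The option names assume a recent release; with mClock active, some of these are governed by the profile rather than the raw settings:

```shell
# Current backfill/recovery throttles as one OSD sees them (osd.0 as an example)
ceph config show osd.0 osd_max_backfills
ceph config show osd.0 osd_recovery_max_active

# With mClock, prefer switching profiles over tuning individual knobs
ceph config set osd osd_mclock_profile high_recovery_ops

# On releases where mClock ignores manual backfill/sleep settings,
# this override must be enabled before raising them by hand
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 3
```

Raising `osd_max_backfills` increases concurrent backfill reservations per OSD at the cost of client I/O, so it is usually bumped gradually while watching `ceph -s`.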