Hi Eugen,

We tried that already: osd_max_backfills is set to 24 and osd_recovery_max_active is set to 20.

On Mon, Dec 12, 2022 at 3:47 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> there are many threads discussing recovery throughput, have you tried
> any of the solutions? The first thing to try is to increase
> osd_recovery_max_active and osd_max_backfills. What are the current
> values in your cluster?
>
>
> Zitat von Monish Selvaraj <monish@xxxxxxxxxxxxxxx>:
>
> > Hi,
> >
> > Our Ceph cluster consists of 20 hosts and 240 OSDs.
> >
> > We use an erasure-coded pool with a cache pool in front of it.
> >
> > Some time back, 2 hosts went down and the PGs became degraded. We got
> > the 2 hosts back up after a while. The PGs then started recovering, but
> > it is taking a very long time (months). At that point the cluster held
> > 664.4 M objects and 987 TB of data. The recovery status has not
> > changed; it remains at 88 pgs degraded.
> >
> > During this period, we increased the pg count of the data pool (the
> > erasure-coded pool) from 256 to 512.
> >
> > We have also observed (over one week) that recovery is very slow; the
> > current recovery rate is around 750 MiB/s.
> >
> > Is there any way to increase this recovery throughput?
> >
> > Ceph version: Quincy
> >
> > [image: image.png]

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
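
Since the cluster is on Quincy, it may also be worth confirming that those two values are actually in effect on the running OSDs, and which op scheduler the OSDs use: Quincy defaults to the mClock scheduler, which manages recovery/backfill limits through its profiles, so osd_max_backfills and osd_recovery_max_active may not behave as they did under the wpq scheduler. A rough sketch of the checks, assuming the standard ceph CLI is available (osd.0 is only an example daemon, substitute any OSD in the cluster):

    # values stored in the monitor config database for all OSDs
    ceph config get osd osd_max_backfills
    ceph config get osd osd_recovery_max_active

    # values a specific daemon is currently running with
    ceph config show osd.0 osd_max_backfills
    ceph config show osd.0 osd_recovery_max_active

    # which op scheduler is active; Quincy defaults to mclock_scheduler
    ceph config show osd.0 osd_op_queue

    # if mClock is in use, a profile that prioritizes recovery can be tried
    ceph config set osd osd_mclock_profile high_recovery_ops

If the running values differ from the stored ones, comparing the two usually shows whether a restart or an override at a different config level is getting in the way.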