Just to update the case for others: setting

  ceph config set osd/class:ssd osd_recovery_sleep 0.001
  ceph config set osd/class:hdd osd_recovery_sleep 0.05

had the desired effect. I'm running another massive rebalancing operation right now and these settings seem to help. It would be nice if one could use a pool name in a filter, though (osd/pool:NAME). I have 2 different pools on the same SSDs, and only objects from one of these pools require the lower sleep setting.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Joachim Kraftmayer <joachim.kraftmayer@xxxxxxxxx>
Sent: 03 December 2020 16:49:51
To: 胡 玮文; Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: Re: Increase number of objects in flight during recovery

Hi Frank,

these are the values we used to reduce the recovery impact before Luminous:

#reduce recovery impact
osd max backfills
osd recovery max active
osd recovery max single start
osd recovery op priority
osd recovery threads
osd backfill scan max
osd backfill scan min

I do not know how many OSDs and PGs you have in your cluster, but the backfill performance depends on the OSDs, the PGs and the objects per PG.

Regards, Joachim

___________________________________
Clyso GmbH

On 03.12.2020 at 12:35, 胡 玮文 wrote:
> Hi,
>
> There is an “OSD recovery priority” dialog box in the web dashboard. The configuration options it changes include:
>
> osd_max_backfills
> osd_recovery_max_active
> osd_recovery_max_single_start
> osd_recovery_sleep
>
> Tuning these options may help. The “High” priority corresponds to 4, 4, 4 and 0, respectively. Some of these also have a _ssd/_hdd variant.
>
>> On 3 December 2020, at 17:11, Frank Schilder <frans@xxxxxx> wrote:
>>
>> Hi all,
>>
>> I have the opposite problem to the one discussed in "slow down keys/s in recovery": I need to increase the number of objects in flight during rebalance. All remapped PGs are already in state backfilling, but it looks like no more than 8 objects/sec are transferred per PG at a time. The pool sits on high-performance SSDs and could easily handle 100 or more object transfers per second. Is there any way to increase the number of transfers/sec or the number of simultaneous transfers? Increasing the options osd_max_backfills and osd_recovery_max_active has no effect.
>>
>> Background: the pool in question (con-fs2-meta2) is the default data pool of a CephFS and stores exclusively the kind of metadata that goes into this pool. Its storage consumption is reported as 0, but the number of objects is huge:
>>
>> NAME             ID   USED      %USED   MAX AVAIL   OBJECTS
>> con-fs2-meta1    12   216 MiB    0.02     933 GiB    13311115
>> con-fs2-meta2    13       0 B       0     933 GiB   118389897
>> con-fs2-data     14   698 TiB   72.15     270 TiB   286826739
>>
>> Unfortunately, there were no recommendations on dimensioning PG numbers for this pool, so I used the same for con-fs2-meta1 and con-fs2-meta2. In hindsight, this was potentially a bad idea; the meta2 pool should have a much higher PG count or a much more aggressive recovery policy.
>>
>> I now need to rebalance PGs on meta2 and it is going way too slowly compared with the performance of the SSDs it sits on. Ideally, I would like to keep the PG count where it is but increase the recovery rate for this pool by a factor of 10. Please let me know what options I have.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
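A minimal sketch of the per-device-class settings from the top of this mail, plus one way to check what the OSDs actually picked up. osd.0 is just an example daemon id, and the sleep values are the ones that happened to work on this cluster, not general recommendations:

  # set the recovery sleep per device class (there is no per-pool mask such as osd/pool:NAME)
  ceph config set osd/class:ssd osd_recovery_sleep 0.001
  ceph config set osd/class:hdd osd_recovery_sleep 0.05

  # list what is stored in the config database, including the class masks
  ceph config dump | grep osd_recovery_sleep

  # show the value a specific running OSD is actually using
  ceph config show osd.0 osd_recovery_sleep

Since both pools share the same SSDs, a device-class mask is as fine-grained as it gets here.

And a CLI sketch of the four options 胡 玮文 mentions for the dashboard's “High” recovery priority (4, 4, 4 and 0); whether the dashboard applies them exactly like this, as global osd settings, is an assumption on my part:

  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4
  ceph config set osd osd_recovery_max_single_start 4
  ceph config set osd osd_recovery_sleep 0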
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx