I am in the process of expanding our cluster capacity by ~50% and have noticed some unexpected behavior during the backfill and recovery process. I'd like to understand it and find out whether there is a better configuration that would yield a faster and smoother backfill.

Pool information:
- OSDs: 243 spinning HDDs
- PGs: 1024 (yes, this is low for our number of disks)

I inherited the cluster, and it has the following settings, which seem to have been applied in an attempt to get the cluster to recover quickly:
- osd_max_backfills: 6 (default is 1)
- osd_recovery_sleep_hdd: 0.0 (default is 0.1)
- osd_recovery_max_active_hdd: 9

When watching the PGs recover, I am noticing a few things:

- All PGs seem to be backfilling at the same time, which seems to violate osd_max_backfills. I understand that there should be 6 readers and 6 writers at a time, but I'm seeing a given OSD participate in more than 6 PG backfills. Is an OSD only counted as backfilling if it is not present in both the UP and ACTING sets (i.e., it will have its data altered)?
- Some PGs are recovering much more slowly than others (some at only a few kilobytes per second), even though all the disks are of similar speed. Is there some way to dig into why that may be?
- In general, recovery is very slow (between 1 and 5 objects per second per PG). Is it possible that the settings above are too aggressive and are degrading performance through disk thrashing?
- Currently, all misplaced PGs are backfilling. If I were to change some of the settings above (specifically `osd_max_backfills`), would that simply pause the backfilling PGs, or would those backfills have to end and then start over once they are done waiting?
- Given that all PGs are backfilling simultaneously, there is no way to prioritize one PG over another (we have some disks with very high usage that we're trying to reduce). Would reducing max backfills allow for proper prioritization of PGs with force-backfill?
- We have had some OSDs restart during the process; their misplaced object count is now zero, but their recovering bytes keep incrementing. Is that expected, and is there a way to estimate when that will complete?

Thanks for the help!
-Jonathan

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
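One back-of-the-envelope way to put numbers on the completion question is linear extrapolation: assume the recovery rate observed between two readings stays constant and divide the remaining work by that rate. The Python sketch below does this. The helper names (`linear_eta_seconds`, `pg_backfill_hours`) and every input value are hypothetical and for illustration only. They are not part of any Ceph tooling, and the counters would have to be read off the cluster by hand. Real backfill rates fluctuate, so treat the result as a rough estimate, not a prediction.

```python
from typing import Optional


def linear_eta_seconds(total: float, done_before: float, done_after: float,
                       interval_s: float) -> Optional[float]:
    """Estimate seconds until `total` units are recovered, assuming the rate
    observed between two samples stays constant (linear extrapolation).

    `total`, `done_before` and `done_after` must use the same unit (objects
    or bytes); they are hypothetical inputs read from the cluster by hand.
    Returns None when no progress was observed, since no estimate is possible.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rate = (done_after - done_before) / interval_s  # units per second
    if rate <= 0:
        return None
    remaining = max(total - done_after, 0.0)
    return remaining / rate


def pg_backfill_hours(objects_in_pg: int, objects_per_second: float) -> float:
    """Hours for one PG to finish at a constant per-PG object rate."""
    return objects_in_pg / objects_per_second / 3600.0


if __name__ == "__main__":
    # Hypothetical PG of 100,000 objects at both ends of the 1-5 objects/s range.
    for rate in (1.0, 5.0):
        print(f"{rate} obj/s -> {pg_backfill_hours(100_000, rate):.1f} h")
    # Hypothetical byte counters on a recovering PG, sampled 60 s apart.
    eta = linear_eta_seconds(total=40e9, done_before=10e9, done_after=10.6e9,
                             interval_s=60)
    if eta is not None:
        print(f"ETA ~ {eta / 3600:.1f} h")
```

With the hypothetical 100,000-object PG, the observed range of 1 to 5 objects per second works out to roughly 5.6 to 27.8 hours per PG.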