Hey Josh,

Thanks for the info!

> With respect to reservations, it seems like an oversight that
> we don't reserve other shards for backfilling. We reserve all
> shards for recovery [0].

Very interesting that there is a reservation difference between backfill and recovery.

> On the other hand, overload from recovery is handled better in
> pacific and beyond with mclock-based QoS, which provides much
> more effective control of recovery traffic [1][2].

Indeed, I was wondering whether mclock was ultimately the answer here, though I'm curious how mclock behaves when a source OSD gets overloaded in the way I described. For example, will it throttle backfill too aggressively compared to having the reservation in place, which would prevent the overload in the first place?

One more question in this space: has there ever been discussion of a back-off mechanism for when one of the remote reservations is blocked? Another issue we've commonly seen is that a backfill that should be able to make progress can't, because a backfill_wait PG holds some of its reservations while waiting for others. Example (with simplified up/acting sets):

    1.1 active+remapped+backfilling   [0,2] 0 [0,1] 0
    1.2 active+remapped+backfill_wait [3,2] 3 [3,1] 3
    1.3 active+remapped+backfill_wait [3,5] 3 [3,4] 3

1.3's backfill could make progress independently of 1.1, but it is blocked behind 1.2, because 1.2 holds the local reservation on osd.3 while it waits for the remote reservation on osd.2.

Josh

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
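The head-of-line blocking in the 1.1/1.2/1.3 example can be reproduced with a toy Python model. This is a minimal sketch and not Ceph's actual reserver code. It assumes each OSD has one local and one remote backfill slot (as if osd_max_backfills = 1), that the primary is the first OSD in the acting set, that a PG takes its local reservation before its remote ones, and that scheduling is a single in-order pass. The names `PG`, `schedule`, and `back_off` are hypothetical, and the "back-off" branch is only an illustration of the mechanism being asked about, not something Ceph is known to implement.

```python
from dataclasses import dataclass

# Toy model of backfill reservations, NOT Ceph's actual reserver code.
# Assumption: every OSD has one local and one remote backfill slot,
# as if osd_max_backfills = 1.
CAPACITY = 1


@dataclass
class PG:
    pgid: str
    up: list
    acting: list

    @property
    def primary(self):
        # Assumption: the first OSD in the acting set is the primary.
        return self.acting[0]

    @property
    def backfill_targets(self):
        # OSDs in the up set that are not yet in the acting set.
        return [osd for osd in self.up if osd not in self.acting]


def schedule(pgs, back_off=False):
    """Try to start backfill for each PG in order; return pgid -> state."""
    local = {}   # osd -> pgids holding a local slot on that OSD
    remote = {}  # osd -> pgids holding a remote slot on that OSD
    states = {}
    for pg in pgs:
        local_holders = local.setdefault(pg.primary, [])
        if len(local_holders) >= CAPACITY:
            states[pg.pgid] = "backfill_wait (local)"
            continue
        local_holders.append(pg.pgid)

        acquired, blocked_on = [], None
        for osd in pg.backfill_targets:
            remote_holders = remote.setdefault(osd, [])
            if len(remote_holders) >= CAPACITY:
                blocked_on = osd
                break
            remote_holders.append(pg.pgid)
            acquired.append(osd)

        if blocked_on is None:
            states[pg.pgid] = "backfilling"
            continue
        states[pg.pgid] = f"backfill_wait (remote osd.{blocked_on})"
        if back_off:
            # Hypothetical back-off: drop every reservation this PG holds
            # so that unrelated PGs can use the freed slots.
            local_holders.remove(pg.pgid)
            for osd in acquired:
                remote[osd].remove(pg.pgid)
    return states


# The three PGs from the example: (pgid, up set, acting set).
PGS = [
    PG("1.1", up=[0, 2], acting=[0, 1]),
    PG("1.2", up=[3, 2], acting=[3, 1]),
    PG("1.3", up=[3, 5], acting=[3, 4]),
]

if __name__ == "__main__":
    for back_off in (False, True):
        print(f"back_off={back_off}:", schedule(PGS, back_off))
```

Without back-off, 1.2 keeps osd.3's local slot while waiting on osd.2, so 1.3 reports "backfill_wait (local)". With the hypothetical back-off, 1.2 releases osd.3 and 1.3 reports "backfilling". A real back-off would also have to re-queue the released PG, and it risks that PG repeatedly losing its place in line. This one-pass model does not capture that.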