On 4/21/21 9:29 AM, Josh Baergen wrote:
> Hey Josh,
>
> Thanks for the info!
>
>> With respect to reservations, it seems like an oversight that
>> we don't reserve other shards for backfilling. We reserve all
>> shards for recovery [0].
>
> Very interesting that there is a reservation difference between
> backfill and recovery.
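
Right - roughly, the asymmetry looks like this. This is only an
illustrative Python sketch with invented names, not the actual
reserver code:

    from dataclasses import dataclass, field

    @dataclass
    class PG:
        """Minimal stand-in for a placement group (invented for
        illustration)."""
        primary: int
        acting: list[int] = field(default_factory=list)
        backfill_targets: list[int] = field(default_factory=list)

    def recovery_reservations(pg: PG) -> set[int]:
        # Recovery reserves a slot on every shard, so each OSD doing
        # recovery work is accounted for.
        return set(pg.acting)

    def backfill_reservations(pg: PG) -> set[int]:
        # Backfill reserves only the primary plus the backfill
        # targets; the other shards it reads from are not reserved
        # and can end up oversubscribed.
        return {pg.primary} | set(pg.backfill_targets)

    pg = PG(primary=0, acting=[0, 1, 2], backfill_targets=[3])
    print(recovery_reservations(pg))   # {0, 1, 2}: every shard reserved
    print(backfill_reservations(pg))   # {0, 3}: osd.1 and osd.2, which
                                       # backfill may read from, are not
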
>> On the other hand, overload from recovery is handled better in
>> Pacific and beyond with mclock-based QoS, which provides much
>> more effective control of recovery traffic [1][2].
>
> Indeed, I was wondering if mclock was ultimately the answer here,
> though I wonder how mclock acts in the case where a source OSD gets
> overloaded in the way that I described. Will it throttle backfill too
> aggressively, for example, compared to having the reservation in
> place, which would prevent the overload in the first place?

I expect you'd see more backfills proceeding, each at a slower pace,
than if you had the reservations on all replicas. The total backfill
throughput would be about the same, but completing a particular backfill
would take longer.
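
(If you want to experiment with that on Pacific: the mclock scheduler
has to be enabled explicitly and a profile selected. These option
names are real, but I believe a change to osd_op_queue only takes
effect after an OSD restart.)

    # Switch the OSD op queue to the mclock scheduler, then pick one
    # of the built-in profiles: high_client_ops, balanced, or
    # high_recovery_ops.
    ceph config set osd osd_op_queue mclock_scheduler
    ceph config set osd osd_mclock_profile high_client_ops
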
> One more question in this space: has there ever been discussion about
> a back-off mechanism for when one of the remote reservations is
> blocked? Another issue that we've commonly seen is that a backfill
> that should be able to make progress can't, because a PG in
> backfill_wait holds some of its reservations while waiting for
> others. Example (with simplified up/acting sets):
>
>   PG   STATE                          UP     UP_PRIMARY  ACTING  ACTING_PRIMARY
>   1.1  active+remapped+backfilling    [0,2]  0           [0,1]   0
>   1.2  active+remapped+backfill_wait  [3,2]  3           [3,1]   3
>   1.3  active+remapped+backfill_wait  [3,5]  3           [3,4]   3
>
> 1.3's backfill could make progress independently of 1.1, but it is
> blocked behind 1.2, because the latter holds the local reservation on
> osd.3 while waiting for the remote reservation on osd.2.

Yes, the reservation mechanism is rather complex and intertwined with
the recovery state machine. There was some discussion about this
(including the idea of backoffs) before:
https://marc.info/?t=152095454200002&r=1&w=2
and summarized in this card:
https://trello.com/c/ppJKaJeT/331-osd-refactor-reserver
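
To make the back-off idea concrete, here is a toy Python sketch of
roughly what was discussed there (invented names; this is not how the
reserver behaves today): if a PG cannot obtain all of its remote
reservations, it releases everything it holds and requeues itself,
rather than pinning the local slot in backfill_wait:

    class Reserver:
        """Toy slot-based reserver, one per OSD (the real one is an
        asynchronous priority queue)."""
        def __init__(self, slots: int = 1):
            self.slots = slots
            self.held: set[str] = set()

        def try_reserve(self, pgid: str) -> bool:
            if len(self.held) < self.slots:
                self.held.add(pgid)
                return True
            return False

        def release(self, pgid: str) -> None:
            self.held.discard(pgid)

    def try_backfill(pgid: str, local: Reserver,
                     remotes: list[Reserver],
                     retry_queue: list[str]) -> bool:
        if not local.try_reserve(pgid):
            retry_queue.append(pgid)     # wait for the local slot
            return False
        got = []
        for r in remotes:
            if r.try_reserve(pgid):
                got.append(r)
            else:
                # Back off: drop every reservation we hold and retry
                # later, instead of blocking others (the 1.2 case).
                for g in got:
                    g.release(pgid)
                local.release(pgid)
                retry_queue.append(pgid)
                return False
        return True                      # all reservations held

    # Replaying the example above:
    osd = {i: Reserver() for i in range(6)}
    retry = []
    try_backfill("1.1", osd[0], [osd[2]], retry)  # True: holds osd.0, osd.2
    try_backfill("1.2", osd[3], [osd[2]], retry)  # osd.2 busy -> backs off
    try_backfill("1.3", osd[3], [osd[5]], retry)  # True: osd.3 now free
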
Josh