Tried sending this earlier, but it seems the list doesn't like PNGs. dotty or dot -Tpng will make short work of the .dot file I've attached. These are the changes to the Active state of the PG state chart needed to support recovery reservations. This is Important Stuff, so please criticize mercilessly.

Here's a prose version:

When the PG activates, it determines whether it needs to do recovery. If it does, it grabs its local reservation, then grabs a remote reservation from each replica in order of OSD ID (to prevent deadlock). Once all remotes are reserved, it starts recovering. After recovery, all remote reservations are dropped. If no backfill is necessary, the local reservation is dropped and we jump to Clean.

If we need to backfill, we request a remote backfill reservation from the replica that needs backfill. If this reservation is rejected (because that OSD is too full), we drop our local reservation and wait for a while in NotBackfilling. We then grab our local reservation again and retry the remote reservation. Once we have the remote reservation, we backfill. After Backfilling, we drop the local and remote backfill reservations and jump to Clean.
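The deadlock argument for reserving replicas in OSD ID order is the usual resource-ordering rule: if every PG requests remote slots in the same global order, no cycle of PGs can form where each holds one slot while waiting on another's. A toy model, not Ceph code, makes this concrete. It assumes a made-up reserver granting one remote slot per OSD, ignores the local reservation entirely, and uses hypothetical names (`PG`, `simulate`). Two PGs sharing OSDs 1 and 2 wedge when they ask in opposite orders and both finish when requests are sorted:

```python
from dataclasses import dataclass


@dataclass
class PG:
    name: str
    replicas: tuple[int, ...]  # OSD IDs this PG needs a remote reservation from


def simulate(pgs, sort_by_osd_id):
    """Toy model of remote recovery reservations; returns "all recovered" or "deadlock".

    Hypothetical assumptions: each OSD grants at most one remote reservation at a
    time, a PG asks for its next reservation only after the previous one is granted,
    and a PG holding all of its remote reservations recovers and then drops them all.
    """
    order = {pg.name: sorted(pg.replicas) if sort_by_osd_id else list(pg.replicas)
             for pg in pgs}
    held = {pg.name: [] for pg in pgs}  # reservations each PG currently holds
    holder = {}                         # OSD ID -> PG holding that OSD's single slot
    done = set()
    while len(done) < len(pgs):
        progress = False
        for pg in pgs:
            if pg.name in done:
                continue
            if len(held[pg.name]) == len(order[pg.name]):
                # All remotes reserved: recover, then drop every remote reservation.
                for osd in held[pg.name]:
                    del holder[osd]
                held[pg.name].clear()
                done.add(pg.name)
                progress = True
                continue
            osd = order[pg.name][len(held[pg.name])]
            if osd not in holder:  # slot free: take it, otherwise keep waiting
                holder[osd] = pg.name
                held[pg.name].append(osd)
                progress = True
        if not progress:
            # Every unfinished PG is waiting on a slot another unfinished PG holds.
            return "deadlock"
    return "all recovered"


if __name__ == "__main__":
    pgs = [PG("pg1", (1, 2)), PG("pg2", (2, 1))]
    print(simulate(pgs, sort_by_osd_id=False))  # deadlock
    print(simulate(pgs, sort_by_osd_id=True))   # all recovered
```

In the unsorted run, pg1 takes OSD 1 and pg2 takes OSD 2, and each then waits forever on the other. With sorting, pg2 simply waits for OSD 1 until pg1 has finished and released both slots.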
digraph G {
    Activating -> Clean [label="AllReplicasClean"];
    Activating -> LocalReserving [label="DoRecovery"];
    LocalReserving -> WaitRemoteRecoveryReserved [label="LocalRecoveryReserved"];
    WaitRemoteRecoveryReserved -> WaitRemoteRecoveryReserved [label="RemoteReserved"];
    WaitRemoteRecoveryReserved -> Recovering [label="AllRemotesReserved"];
    Recovering -> Clean [label="AllReplicasClean"];
    Recovering -> WaitRemoteBackfillReserved [label="RequestBackfill"];
    WaitRemoteBackfillReserved -> NotBackfilling [label="RemoteReservationRejected"];
    NotBackfilling -> WaitLocalBackfillReservation [label="RequestBackfill"];
    WaitLocalBackfillReservation -> WaitRemoteBackfillReserved [label="LocalBackfillReserved"];
    WaitRemoteBackfillReserved -> Backfilling [label="RemoteBackfillReserved"];
    Backfilling -> Clean [label="Backfilled"];
}
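For anyone who wants to poke at the chart rather than just look at it, here is a sketch that transcribes the edges of the graph into a Python transition table, replays an event sequence through it, and checks that every state has some path to Clean. The names `TRANSITIONS`, `replay` and `unable_to_reach_clean` are hypothetical. Only the edges drawn in the graph are modeled, and treating any other event as an error is an assumption of the sketch, not a claim about the real PG state machine.

```python
# (state, event) -> next state, transcribed edge by edge from the .dot graph.
TRANSITIONS = {
    ("Activating", "AllReplicasClean"): "Clean",
    ("Activating", "DoRecovery"): "LocalReserving",
    ("LocalReserving", "LocalRecoveryReserved"): "WaitRemoteRecoveryReserved",
    ("WaitRemoteRecoveryReserved", "RemoteReserved"): "WaitRemoteRecoveryReserved",
    ("WaitRemoteRecoveryReserved", "AllRemotesReserved"): "Recovering",
    ("Recovering", "AllReplicasClean"): "Clean",
    ("Recovering", "RequestBackfill"): "WaitRemoteBackfillReserved",
    ("WaitRemoteBackfillReserved", "RemoteReservationRejected"): "NotBackfilling",
    ("NotBackfilling", "RequestBackfill"): "WaitLocalBackfillReservation",
    ("WaitLocalBackfillReservation", "LocalBackfillReserved"): "WaitRemoteBackfillReserved",
    ("WaitRemoteBackfillReserved", "RemoteBackfillReserved"): "Backfilling",
    ("Backfilling", "Backfilled"): "Clean",
}


def replay(events, state="Activating"):
    """Feed events through the chart and return the list of states visited."""
    path = [state]
    for event in events:
        if (state, event) not in TRANSITIONS:
            # Sketch assumption: an event with no edge in the graph is an error.
            raise ValueError(f"{event!r} has no edge out of {state!r}")
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


def unable_to_reach_clean(target="Clean"):
    """Return the states with no path to `target` (empty if every state can finish)."""
    states = {src for src, _ in TRANSITIONS} | set(TRANSITIONS.values())
    reachable = {target}
    changed = True
    while changed:  # walk edges backwards from target until nothing new is added
        changed = False
        for (src, _), dst in TRANSITIONS.items():
            if dst in reachable and src not in reachable:
                reachable.add(src)
                changed = True
    return states - reachable


if __name__ == "__main__":
    # Recovery with two replicas, one rejected backfill reservation, a retry, success.
    print(replay([
        "DoRecovery", "LocalRecoveryReserved", "RemoteReserved", "AllRemotesReserved",
        "RequestBackfill", "RemoteReservationRejected", "RequestBackfill",
        "LocalBackfillReserved", "RemoteBackfillReserved", "Backfilled",
    ]))
    print(unable_to_reach_clean())  # set()
```

The retry path in the example visits WaitRemoteBackfillReserved twice before reaching Backfilling and then Clean, matching the prose description of a rejected backfill reservation.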