Peering algorithm questions

Balázs Kossovics <kossovics@xxxxxxxxx> · Tue, 29 Sep 2015 07:08:04 +0000

Hey!

I'm trying to understand the peering algorithm based on [1] and [2]. Some things aren't really clear to me, and I'm not sure I understood others correctly, so I'd like to ask for clarification on the points below:

1, Is it right that the primary writes an operation to the PG log immediately upon receiving it?

2, Is it possible that an operation is persisted but never acknowledged? Imagine this situation: a write to an object arrives, the operation is copied to the replicas and written to their journals, but the primary OSD dies before it can acknowledge the write to the user and never recovers. Upon the next peering, will this operation become part of the authoritative history?

3, Quote from the second step of the peering algorithm: "generate a list of past intervals since last epoch started".
If there was no peering failure, then is there exactly one past interval?

4, Quote from the same step: "the subset for which peering could have completed before the acting set changed to another set of OSDs".
Are the other intervals ignored because we can be sure that no write operations were allowed during them?

5, At any given moment, is the Up set either equal to or a strict subset of the Acting set?

6, When do OSDs re-peer? Only when an OSD goes from in to out, or also when an OSD goes down (but has not yet been automatically marked out)?

7, For what reasons can peering fail? If the OSD map changes before peering completes, does that count as a failure? If the OSD map doesn't change, is the only reason for failure not being able to contact "at least one OSD from each of past interval's acting set"?

8, up_thru is a per-OSD value in the OSD map, which is updated for the primary after it has successfully agreed on the authoritative history, but before peering completes. What about the secondaries?
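
In case it helps with questions 3 and 4, here is how I currently picture that step, as a toy Python sketch. All names (PastInterval, primary_up_thru, could_have_completed_peering) are made up by me and are not Ceph identifiers, and the up_thru comparison is only my reading of [1], so please correct me if the real rule is different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PastInterval:
    """Hypothetical model of one past interval (not a Ceph type)."""
    first_epoch: int      # first OSD map epoch with this acting set
    last_epoch: int       # last OSD map epoch with this acting set
    acting: tuple         # OSD ids in the acting set, primary first
    primary_up_thru: int  # primary's up_thru in the OSD map at last_epoch


def could_have_completed_peering(iv):
    # Assumption (my reading of [1]): peering could only have completed,
    # and writes could only have been accepted, if the primary's up_thru
    # had reached the interval's first epoch by the end of the interval.
    return iv.primary_up_thru >= iv.first_epoch


def intervals_to_consider(past_intervals):
    # Keep only the intervals during which writes might have happened.
    return [iv for iv in past_intervals if could_have_completed_peering(iv)]


# Invented example data: three intervals since last epoch started.
intervals = [
    PastInterval(10, 14, (0, 1, 2), 10),
    PastInterval(15, 16, (1, 2), 10),   # up_thru never reached epoch 15
    PastInterval(17, 20, (2, 3), 17),
]

kept = intervals_to_consider(intervals)
print([(iv.first_epoch, iv.last_epoch) for iv in kept])
```

In this picture the middle interval is dropped, because its primary never had up_thru updated to an epoch inside it. Whether that is really why such intervals can be ignored is exactly what I'm asking in question 4.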

Thanks,
Balázs Kossovics

[1] http://docs.ceph.com/docs/master/dev/peering/
[2] http://docs.ceph.com/docs/master/dev/osd_internals/last_epoch_started/
