Re: About RADOS level replication

Xuehan Xu <xxhdx1985126@xxxxxxxxx> · Thu, 3 Aug 2017 08:57:54 +0800

We didn't consider condition 2 in such detail before the CDM.

I think I confused what we considered before the CDM with what we
discussed during the CDM, and mistakenly thought we had worked out
some details beforehand, which we actually hadn't.

I'm really sorry about this.

Please forgive me. I'm really sorry

On 3 August 2017 at 08:34, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> Um... Sorry, I just read my Algorithm 17 again, and it seems that it
> doesn't include condition 2.
>
> I think I just got things confused; it was 04:00 AM and I was really
> sleepy. Please forgive me.
>
> I'll add this condition to the algorithm. Really sorry.
>
> On 3 August 2017 at 04:05, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> Hi, Sage and Joao
>>
>> I think there was something I didn't make clear just now, sorry.
>>
>> About the second issue, by "ensuring consistency of cross-object
>> operations", I mean:
>>
>>      Say an rbd write X involves two RADOS objects, A and B, and
>> turns them into A1 and B1 respectively, and a following rbd write Y
>> turns them into A2 and B2. When these two operations are replicated,
>> the result on the backup cluster can't be (A1, B2) or (A2, B1).
>>
>> So, just like Joao said, this is all about ORDERING.
>>
>> My approach is like this: since RADOS can guarantee the order of OPs
>> coming from the same client and targeting the same object, we make
>> all "repop"s within the same rbd operation go to the same
>> intermediate node, and the intermediate node forwards these "repop"s
>> to the backup cluster only when two conditions hold: 1) all "repop"s
>> within the same rbd operation have arrived at the intermediate node;
>> 2) all rbd operations whose ids are less than that of the current rbd
>> operation have been sent to the backup cluster (only sent; they need
>> not yet be replicated, or "ondisk", on the backup cluster). The id is
>> a monotonically increasing integer that uniquely identifies an rbd
>> operation; the order of the ids reflects the order in which the
>> operations were created by the librbd client.
>>
>> With these two constraints, I think we can ensure the order of rbd
>> operations. The first condition makes sure that either all "repop"s
>> of an rbd operation are sent to the backup cluster or none of them
>> are, which ensures the consistency of the resulting rbd image if the
>> master cluster crashes when only part of an rbd operation has been
>> forwarded to the intermediate node. The second condition makes sure
>> that rbd operations are replicated to the backup cluster in the order
>> in which they were created by librbd clients. And since RADOS can
>> guarantee the order of OPs coming from the same client and targeting
>> the same object, we don't have to wait until an rbd operation's
>> predecessor is replicated before sending it; it is enough that the
>> predecessor has been issued to the backup cluster. I think this
>> should preserve the throughput of the inter-cluster replication
>> procedure.
>>
>> The details of this approach are shown in Algorithm 17.
>>
>> And since we are implementing this at the RADOS level, we shouldn't
>> directly process "rbd" operations here. So, I think we should
>> introduce the concept of an "object set" to map to upper-level
>> concepts such as an "rbd image".
>>
>> I don't know if I'm thinking about this the right way, and I'm
>> looking forward to your opinions. Thanks very much :-)
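
The two forwarding conditions in the quoted message can be illustrated
with a small sketch. This is not Algorithm 17 and not Ceph code; the
names Repop and IntermediateNode are hypothetical. It also assumes that
each "repop" carries the total number of "repop"s in its rbd operation,
and that operation ids are consecutive integers starting at 1 (the
message only says they increase monotonically).

```python
from dataclasses import dataclass


@dataclass
class Repop:
    op_id: int    # id of the rbd operation this repop belongs to
    obj: str      # target RADOS object
    total: int    # repops in the whole rbd operation (assumed to be known)
    payload: str  # new object content, e.g. "A1"


class IntermediateNode:
    """Hypothetical intermediate node that buffers repops before forwarding."""

    def __init__(self, first_op_id=1):
        self.next_id = first_op_id  # smallest op id not yet sent
        self.pending = {}           # op_id -> repops received so far
        self.sent = []              # stands in for the link to the backup cluster

    def receive(self, repop):
        self.pending.setdefault(repop.op_id, []).append(repop)
        self._flush()

    def _flush(self):
        # Only the op with id next_id may be sent, so every op with a
        # smaller id has already been sent (condition 2). It is sent only
        # once all of its repops have arrived (condition 1).
        while True:
            repops = self.pending.get(self.next_id)
            if not repops or len(repops) < repops[0].total:
                return
            self.sent.extend(self.pending.pop(self.next_id))
            self.next_id += 1


node = IntermediateNode()
node.receive(Repop(2, "A", 2, "A2"))
node.receive(Repop(2, "B", 2, "B2"))  # write Y is complete but must wait for X
node.receive(Repop(1, "A", 2, "A1"))  # write X is still partial
node.receive(Repop(1, "B", 2, "B1"))  # X is complete: X is sent, then Y
print([r.payload for r in node.sent])  # ['A1', 'B1', 'A2', 'B2']
```

In this sketch an operation counts as "sent" as soon as it is handed to
the link; nothing waits for an "ondisk" acknowledgement from the backup
cluster, which matches the relaxed form of condition 2.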