By the way, the advice you gave during the CDM was very important; we'll adjust our plan based on it. Thanks :-) And apologies again.

On 3 August 2017 at 08:57, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> We didn't consider condition 2 in such detail before the CDM.
>
> I think I confused what we had considered before the CDM with what
> we discussed during the CDM, and mistakenly thought we had already
> worked out some details that we actually hadn't.
>
> I'm really sorry about this.
>
> Please forgive me. I'm really sorry.
>
> On 3 August 2017 at 08:34, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> Um... Sorry, I just read my Algorithm 17 again, and it seems that it
>> doesn't have condition 2.
>>
>> I think I just got things confused; it was 04:00 AM and I was really
>> sleepy then. Please forgive me.
>>
>> I'll add this to that algorithm. Really sorry.
>>
>> On 3 August 2017 at 04:05, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>>> Hi, Sage and Joao
>>>
>>> I think there was something I didn't make clear just now, sorry.
>>>
>>> About the second issue, by "ensuring consistency of cross-object
>>> operations", I mean:
>>>
>>> Say an rbd write X involves two RADOS objects A and B and turns
>>> them into A1 and B1 respectively, and a following rbd write Y turns
>>> them into A2 and B2. When these two operations are replicated, the
>>> result on the backup cluster can't be A1, B2 or A2, B1.
>>>
>>> So, just like Joao said, this is all about ORDERING.
>>>
>>> My approach is like this: since RADOS can guarantee the order of OPs
>>> coming from the same client and targeting the same object, we make
>>> all "repop"s within the same rbd operation get forwarded to the same
>>> intermediate node, and the intermediate node forwards these "repop"s
>>> to the backup cluster on two conditions: 1) all "repop"s within the
>>> same rbd operation have arrived at the intermediate node; 2) all rbd
>>> operations whose ids are less than that of the current rbd operation
>>> have been sent to the backup cluster (sent only; they need not yet be
>>> replicated, or "ondisk", on the backup cluster). The id is a
>>> monotonically increasing integer that uniquely identifies an rbd
>>> operation; the order of the ids indicates the order in which the
>>> operations were created by the librbd client.
>>>
>>> With these two constraints, I think we can ensure the order of rbd
>>> operations. The first condition makes sure that either all "repop"s
>>> of an operation are sent to the backup cluster or none of them are,
>>> which ensures the consistency of the resulting rbd image if the
>>> master cluster crashes when only part of an rbd operation has been
>>> forwarded to the intermediate node. The second condition makes sure
>>> that rbd operations are replicated to the backup cluster in the
>>> order in which they were created by librbd clients. And since RADOS
>>> can guarantee the order of OPs coming from the same client and
>>> targeting the same object, we don't have to wait until an rbd
>>> operation's predecessor is replicated before replicating it; waiting
>>> until the predecessor has been issued to the backup cluster is
>>> enough. I think this should preserve the throughput of the
>>> inter-cluster replication procedure.
>>>
>>> The details of this approach are shown in Algorithm 17.
>>>
>>> And since we are implementing this at the RADOS level, we shouldn't
>>> directly process "rbd" operations here. So, I think we should
>>> introduce the concept of an "object set" to map to upper-level
>>> concepts like "rbd image".
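
The two forwarding conditions described in the quoted message can be illustrated with a minimal sketch. This is not Algorithm 17 and not Ceph code: the class `IntermediateNode`, the `send` callback, and the `expected` count carried with each repop are all hypothetical names chosen for illustration. It also assumes that operation ids are consecutive integers starting at `first_id`, which is stronger than the "monotonically increasing" property stated in the message.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingOp:
    expected: int                       # assumed: total repops in this rbd operation
    repops: List[str] = field(default_factory=list)


class IntermediateNode:
    """Hypothetical buffer that releases rbd operations in id order."""

    def __init__(self, send: Callable[[int, List[str]], None], first_id: int = 1):
        self.send = send                # assumed callback that issues repops to the backup cluster
        self.next_id = first_id         # lowest op id not yet sent (ids assumed consecutive)
        self.pending: Dict[int, PendingOp] = {}

    def on_repop(self, op_id: int, expected: int, repop: str) -> None:
        op = self.pending.setdefault(op_id, PendingOp(expected))
        op.repops.append(repop)
        self._drain()

    def _drain(self) -> None:
        # Forward an operation only when (1) all of its repops have arrived
        # and (2) every operation with a smaller id has already been sent.
        while True:
            op = self.pending.get(self.next_id)
            if op is None or len(op.repops) < op.expected:
                return
            self.send(self.next_id, op.repops)
            del self.pending[self.next_id]
            self.next_id += 1


sent = []
node = IntermediateNode(lambda op_id, repops: sent.append((op_id, repops)))
node.on_repop(2, 2, "A1->A2")
node.on_repop(2, 2, "B1->B2")   # write Y is complete, but write X has not been sent yet
held_back = list(sent)          # still empty at this point
node.on_repop(1, 2, "A->A1")
node.on_repop(1, 2, "B->B1")    # write X completes, so X and then Y are forwarded
```

Note that `_drain` advances as soon as an operation has been handed to `send`; it does not wait for an acknowledgement from the backup cluster, which matches the "sent, not necessarily ondisk" wording of condition 2.
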
>>>
>>> I don't know if I'm thinking about this in the right way, and I'm
>>> looking forward to your opinions. Thanks very much :-)