Re: How does rbd-mirror preserve the order of WRITE operations that finished on the primary cluster

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The order of writes from a single client connection to a single OSD is
guaranteed. The rbd-mirror journal replay process handles one event at
a time and does not start processing the next event until the IO has
been started in-flight with librados. Therefore, even though the
replay process allows 50 - 100 IO requests to be in-flight, those IOs
are actually well-ordered in terms of updates to a single object.

Of course, such an IO request from the client application / VM would
be incorrect behavior if they didn't wait for the completion callback
before issuing the second update.

On Mon, Jun 5, 2017 at 12:05 AM, xxhdx1985126 <xxhdx1985126@xxxxxxx> wrote:
> Hi, everyone.
>
>
> Recently, I've been reading the source code of rbd-mirror. I wonder how rbd-mirror preserves the order of WRITE operations that finished on the primary cluster. As far as I can understand the code, rbd-mirror fetches I/O operations from the journal on the primary cluster, and replay them on the slave cluster without checking whether there's already any I/O operations targeting the same object that has been issued to the slave cluster and not yet finished. Since concurrent operations may finish in a different order than that in which they arrived at the OSD, the order that the WRITE operations finish on the slave cluster may be different than that on the primay cluster. For example: on the primary cluster, there are two WRITE operation targeting the same object A which are, in the order they finish on the primary cluster, "WRITE A.off data1" and "WRITE A.off data2"; while when they are replayed on the slave cluster, the order may be "WRITE A.off data2" and "WRITE A.off data1", which means that the result of the two operations on the primary cluster is A.off=data2 while, on the slave cluster, the result is A.off=data1.
>
>
> Is this possible?
>
>
>



-- 
Jason
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux