Geo-replication and Tiering - Solution for Rebalance races

Hi,

Some Background:

Geo-replication on a tiered volume is prone to races, because changelogs are processed independently for each brick while files move frequently between the cold and hot tiers.

Below is one such example:
==================================================
Brick1             Brick2
==================================================
Create file
                   (file moved due to rebalance)
                   Data file
                   Delete file
==================================================
If Brick2's changelogs are processed first, followed by Brick1's, the file may end up created.
But we expect the file to be deleted (as per the last operation).
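
A toy model of the race above, as a sketch only. The operation names and the replay() helper are made up for illustration and are not geo-replication code; the model simply applies each brick's changelog as a whole, one brick after another.

```python
def replay(changelogs_in_order):
    """Apply whole per-brick changelogs one after another.

    Returns the set of file names that exist once every log is replayed.
    """
    files = set()
    for brick_log in changelogs_in_order:
        for op, name in brick_log:
            if op == "CREATE":
                files.add(name)
            elif op == "DELETE":
                files.discard(name)
            # DATA on a missing file is ignored in this toy model.
    return files


# Hypothetical per-brick logs matching the table above.
brick1 = [("CREATE", "file")]
brick2 = [("DATA", "file"), ("DELETE", "file")]

in_order = replay([brick1, brick2])  # Brick1 first: file is deleted
raced = replay([brick2, brick1])     # Brick2 first: file is left behind

print(in_order)
print(raced)
```

The final state depends only on which brick's changelog happens to be processed first, which is the race described above.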


Solution:

Step 1.

Record all fops in the HOT tier, and record only Data/Metadata operations
in the COLD tier.

Why ?

a. If the file is placed directly in the HOT tier, all fops will be
recorded in the HOT tier.

b. If the file is *already* present in the COLD tier and any fop is
carried out on it, a linkto file is created in the HOT tier.

   Operations like UNLINK and RENAME are then captured in the HOT
   tier (by means of the linkto file).
   This way, we get both tiers' operations in the HOT tier itself.

Step 2.

From gluster volume info, figure out whether the brick belongs to the COLD subvolume.
(This is possible using gluster volume info <tiervol> --xml)

If so, IGNORE all file ops except DATA and METADATA.
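
A minimal sketch of the Step 2 filter. Assumptions: records are modelled as (type, gfid, fop) tuples with "E", "D" and "M" standing for entry, data and metadata operations, and the brick_is_cold flag is passed in by the caller. How that flag is derived from the --xml output is not shown, since the XML layout is not covered here. All names below are hypothetical.

```python
# Record types kept for a COLD tier brick: data and metadata only.
COLD_TIER_ALLOWED = {"D", "M"}


def filter_changelog(records, brick_is_cold):
    """Drop entry operations (CREATE, UNLINK, RENAME, ...) on COLD bricks.

    records: iterable of (type, gfid, fop) tuples (hypothetical layout).
    brick_is_cold: True if the brick belongs to the COLD subvolume.
    """
    if not brick_is_cold:
        # HOT tier bricks keep every recorded operation.
        return list(records)
    return [rec for rec in records if rec[0] in COLD_TIER_ALLOWED]


# Made-up records for illustration.
records = [
    ("E", "gfid-1", "CREATE"),
    ("D", "gfid-1", "WRITE"),
    ("M", "gfid-1", "SETATTR"),
    ("E", "gfid-1", "UNLINK"),
]

cold_view = filter_changelog(records, brick_is_cold=True)
hot_view = filter_changelog(records, brick_is_cold=False)
```

With this filter, entry operations for a file are taken only from the HOT tier changelogs, so they are no longer split across two independently processed bricks.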


Help from DHT:

Now, we need some help from (tiering) DHT for Step 1.

There is one issue in Step 1: if the file was created on a COLD subvolume,
we will miss the "CREATE" operation in the HOT subvolume.

So, if the linkto file is created in the HOT tier (the hashed subvolume) due to lookup alone,
the changelog xlator needs to be informed, so that it records this as a CREATE.


IIUC, there are multiple places where a linkto file is created.
So, this should be done only when lookup creates the linkto file in the HOT tier (hashed subvolume).
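
The condition above can be written down as a small predicate. This is only an illustration of the proposed rule, not DHT or changelog code; the function name, the tier strings and the trigger strings are all hypothetical.

```python
def should_record_create(linkto_created: bool, tier: str, trigger: str) -> bool:
    """Return True when a linkto creation should be logged as CREATE.

    Proposed rule: only a linkto file created in the HOT tier, and only
    when the lookup path created it.
    """
    return linkto_created and tier == "hot" and trigger == "lookup"


print(should_record_create(True, "hot", "lookup"))
print(should_record_create(True, "hot", "rebalance"))
print(should_record_create(True, "cold", "lookup"))
```

Linkto files created on other paths (the "rebalance" trigger here is just a stand-in for those) would not be reported as CREATE.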


Please provide your feedback on this.
Thanks!

Regards,
Saravana
