On Wed, Sep 2, 2015 at 12:42 PM, Aravinda <avishwan@xxxxxxxxxx> wrote:
> The Geo-replication and Sharding teams today discussed the approach
> to making Geo-replication Sharding-aware. Details are below.
>
> Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay Bellur
>
> - Both the Master and Slave Volumes should be Sharded Volumes with the
>   same configuration.
> - The Changelog records changes related to Sharded files just like any
>   regular files.
> - Sharding should allow Geo-rep to list/read/write Sharding's internal
>   Xattrs if the Client PID is gsyncd (-1).
> - Sharding should allow read/write of Sharded files (that is, files in
>   the .shards directory) if the Client PID is GSYNCD.
> - Sharding should return the actual file instead of the aggregated
>   content when the main file is requested (Client PID GSYNCD).
>
> For example, a file f1 is created with GFID G1.
>
> When the file grows it gets sharded into chunks (say 5 chunks).
>
>     f1            G1
>     .shards/G1.1  G2
>     .shards/G1.2  G3
>     .shards/G1.3  G4
>     .shards/G1.4  G5
>
> In the Changelog, this is recorded as 5 different files, as below:
>
>     CREATE G1 f1
>     DATA G1
>     META G1
>     CREATE G2 PGS/G1.1
>     DATA G2
>     META G1
>     CREATE G3 PGS/G1.2
>     DATA G3
>     META G1
>     CREATE G4 PGS/G1.3
>     DATA G4
>     META G1
>     CREATE G5 PGS/G1.4
>     DATA G5
>     META G1
>
> where PGS is the GFID of the .shards directory.
>
> Geo-rep will create these files independently in the Slave Volume and
> sync the Xattrs of G1. Data can be read only when all the chunks are
> synced to the Slave Volume. Data can be read partially if the main/first
> file and some of the chunks have synced to the Slave.

So, before replicating data to the slave, all shards need to be created
there?

> Please add anything I missed. Comments & suggestions welcome.
>
> regards
> Aravinda
>
> On 08/11/2015 04:36 PM, Aravinda wrote:
> > Hi,
> >
> > We are considering different approaches to add support in
> > Geo-replication for Sharded Gluster Volumes [1].
> >
> > Approach 1: Geo-rep: Sync the full file
> > - In the Changelog, record only the main file's details, in the same
> >   brick where it is created
> > - Record DATA in the Changelog whenever there is any addition/change
> >   to the sharded file
> > - Geo-rep rsync will checksum the full file from the mount and sync
> >   it as a new file
> > - Slave-side sharding is managed by the Slave Volume
> >
> > Approach 2: Geo-rep: Sync sharded files separately
> > - Geo-rep rsync will checksum the sharded files only
> > - Geo-rep syncs each sharded file independently as a new file
> > - [UNKNOWN] Sync the internal xattrs (file size and block count) on
> >   the main sharded file to the Slave Volume to maintain the same
> >   state as on the Master
> > - The Sharding translator should allow file creation under the
> >   .shards dir for gsyncd, that is, when the Parent GFID is the
> >   .shards directory
> > - If sharded files are modified during a Geo-rep run, we may end up
> >   with stale data on the Slave
> > - Files on the Slave Volume may not be readable until all sharded
> >   files sync to the Slave (each brick in the Master independently
> >   syncs files to the slave)
> >
> > The first approach looks cleaner, but we have to analyze the rsync
> > checksum performance on big files (sharded in the backend, accessed
> > as one big file by rsync).
> >
> > Let us know your thoughts.
> > Thanks
> >
> > Ref:
> > [1] http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
> >
> > --
> > regards
> > Aravinda
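To make sure I'm reading the record layout right, here is a rough sketch
(plain Python; the simplified "OP GFID [PARGFID/name]" record format and
the PGS placeholder come from your example above, not from gsyncd's
actual changelog parser) of how the per-shard CREATE entries could be
grouped under their main file, i.e. the set of chunks the slave needs
before G1's data is fully readable:

# Sketch only: one simplified record per line, as in the example above.
CHANGELOG = """\
CREATE G1 f1
DATA G1
META G1
CREATE G2 PGS/G1.1
DATA G2
CREATE G3 PGS/G1.2
DATA G3
"""

PGS = "PGS"  # stands in for the real GFID of the .shards directory

def shards_of(main_gfid, records):
    """Collect (shard name, shard GFID) pairs created under .shards."""
    shards = []
    for rec in records:
        parts = rec.split()
        if len(parts) == 3 and parts[0] == "CREATE":
            pargfid, _, name = parts[2].partition("/")
            if pargfid == PGS and name.startswith(main_gfid + "."):
                shards.append((name, parts[1]))
    return shards

print(shards_of("G1", CHANGELOG.splitlines()))
# -> [('G1.1', 'G2'), ('G1.2', 'G3')]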
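On the access rules: if I follow, the sharding translator's
special-casing hinges entirely on the client PID. A toy decision table
of how I understand the three bullets (again just a sketch, not the real
xlator code; the gsyncd PID value of -1 is from your mail, the function
and return values are mine):

GSYNCD_PID = -1  # client PID used by geo-rep mounts, per the mail

def shard_access(client_pid, under_dot_shards, internal_xattr):
    """Toy decision table for the three access rules listed above."""
    if client_pid == GSYNCD_PID:
        # gsyncd gets the raw pieces: internal xattrs, files under
        # .shards, and the main file without shard aggregation.
        return "raw"
    if under_dot_shards or internal_xattr:
        return "denied"      # internal namespace hidden from normal clients
    return "aggregated"      # normal clients see the stitched-together file

assert shard_access(-1, True, False) == "raw"
assert shard_access(1234, False, False) == "aggregated"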
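And for Approach 2, I read the ".shards creation for gsyncd" bullet as:
entry creation must be permitted when the parent is the .shards
directory. Roughly, in terms of the paths a gsyncd-like client would
use (the ".gfid/<pargfid>/<basename>" virtual path is my assumption
about the aux-gfid mount, not verified against the gfid-access xlator):

def shard_basename(main_gfid, index):
    """Name of the index-th shard of a file, e.g. G1.1, G1.2, ..."""
    return "%s.%d" % (main_gfid, index)

def slave_create_path(dot_shards_gfid, main_gfid, index):
    """Assumed virtual path for creating the shard entry on the slave."""
    return ".gfid/%s/%s" % (dot_shards_gfid, shard_basename(main_gfid, index))

print(slave_create_path("PGS", "G1", 1))  # -> .gfid/PGS/G1.1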
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel