On 09/03/2015 08:29 AM, Venky Shankar wrote:
On Wed, Sep 2, 2015 at 11:39 PM, Aravinda <avishwan@xxxxxxxxxx> wrote:
On 09/02/2015 11:13 PM, Shyam wrote:
On 09/02/2015 10:47 AM, Krutika Dhananjay wrote:
------------------------------------------------------------------------
*From: *"Shyam" <srangana@xxxxxxxxxx>
*To: *"Aravinda" <avishwan@xxxxxxxxxx>, "Gluster Devel"
<gluster-devel@xxxxxxxxxxx>
*Sent: *Wednesday, September 2, 2015 8:09:55 PM
*Subject: *Re: Gluster Sharding and Geo-replication
On 09/02/2015 03:12 AM, Aravinda wrote:
> The Geo-replication and Sharding teams today discussed the approach
> to make Geo-replication Sharding-aware. Details are below.
>
> Participants: Aravinda, Kotresh, Krutika, Rahul Hinduja, Vijay Bellur
>
> - Both Master and Slave Volumes should be Sharded Volumes with the
> same configuration.
If I am not mistaken, geo-rep supports replicating to a non-gluster
local FS at the slave end. Is this correct? If so, would this
limitation not make that problematic?
When you state *same configuration*, I assume you mean the sharding
configuration, not the volume graph, right?
That is correct. The only requirement is for the slave to have the shard
translator (since something needs to present the aggregated view of the
file to readers on the slave).
Also, the shard-block-size needs to be kept the same between master and
slave. The rest of the configuration (like the number of subvolumes of
DHT/AFR) can vary between master and slave.
Do we need the shard block size to be the same? I assume the file
carries an xattr that records the size it was sharded with
(trusted.glusterfs.shard.block-size), so if this is synced across, that
would do. If this is true, it would mean that "a sharded volume needs a
shard-supporting slave to geo-rep to".
Yes. The number of bricks and the replica count can differ, but the shard
block size should be the same. Only the main file carries the
trusted.glusterfs.shard.block-size xattr; Geo-rep should sync this xattr
to the Slave as well. Only gsyncd can read/write the sharded chunks; a
sharded Slave Volume is required so those chunks can be understood when
read by non-gsyncd clients.
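A minimal sketch (not gsyncd code) of what propagating that xattr could
look like, assuming a privileged gsyncd-style mount on both ends that is
allowed to read and set this trusted.* xattr; the paths and the big-endian
decoding of the value are illustrative assumptions:

    import os

    # Shard size xattr present only on the main (first) file.
    SHARD_BLOCK_SIZE_XATTR = "trusted.glusterfs.shard.block-size"

    def sync_shard_block_size(master_path, slave_path):
        """Copy the shard block-size xattr from the master's main file to
        the slave's main file so the slave's shard translator can
        interpret the chunks. Returns the size in bytes, or None if the
        file is not sharded (small files never get the xattr)."""
        try:
            value = os.getxattr(master_path, SHARD_BLOCK_SIZE_XATTR)
        except OSError:
            return None  # xattr absent: file not sharded yet
        os.setxattr(slave_path, SHARD_BLOCK_SIZE_XATTR, value)
        # Assumed encoding: 64-bit unsigned integer in network byte order.
        return int.from_bytes(value, "big")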
Even if this works, I am very much in disagreement with this mechanism
of synchronization (not that I have a working solution in my head as
of now).
Supporting a non-sharded Slave Volume should be easy. As discussed, let the
Changelog record everything, including sharded file changes (maybe flag
them as internal).
Sharding has to make sure an xattr operation is recorded on the main file
whenever any of its chunks is updated.
In Geo-rep, based on a config option (say --use-slave-sharding), decide
whether to sync chunks or the full file.
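A rough sketch of that branch on the gsyncd side; the Entry record and the
--use-slave-sharding knob are hypothetical, only meant to show the
decision, not an existing geo-rep option:

    from collections import namedtuple

    # Hypothetical parsed changelog record; only the fields used here.
    Entry = namedtuple("Entry", ["path", "is_shard_chunk"])

    def plan_sync(entry, use_slave_sharding):
        """Pick a replication strategy for one changelog entry."""
        if entry.is_shard_chunk:
            if use_slave_sharding:
                # Slave runs the shard xlator: copy the chunk as-is
                # into the slave's .shards directory.
                return ("SYNC_CHUNK", entry.path)
            # Non-sharded slave: skip chunks; the file is synced whole
            # via its main-file entry instead.
            return ("SKIP", entry.path)
        # Main file: sync only metadata/xattrs when chunks are synced
        # separately, otherwise rsync the full aggregated file.
        if use_slave_sharding:
            return ("SYNC_XATTRS", entry.path)
        return ("SYNC_FULL_FILE", entry.path)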
-Krutika
> - In the Changelog, record changes related to sharded files too, just
> like any regular file.
> - Sharding should allow Geo-rep to list/read/write Sharding-internal
> xattrs if the client PID is gsyncd's (-1).
> - Sharding should allow read/write of shard files (those under the
> .shards directory) if the client PID is GSYNCD.
> - Sharding should return the actual file instead of the aggregated
> content when the main file is requested (client PID GSYNCD).
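A conceptual model of those access rules (illustrative Python, not xlator
code); GF_CLIENT_PID_GSYNCD is the -1 client PID mentioned above:

    GF_CLIENT_PID_GSYNCD = -1  # PID gsyncd mounts with as an internal client

    def shard_allows(client_pid, path, op):
        """Model of the proposed gating: gsyncd gets raw per-chunk access
        and shard-internal xattrs; everyone else gets the aggregated view
        and no access to shard internals."""
        is_gsyncd = (client_pid == GF_CLIENT_PID_GSYNCD)
        if path.startswith(".shards/"):
            return is_gsyncd            # only gsyncd may touch chunks
        if op in ("get_shard_xattr", "set_shard_xattr"):
            return is_gsyncd            # shard-internal xattrs
        return True                     # normal, aggregated access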
>
> For example, a file f1 is created with GFID G1.
>
> When the file grows, it gets sharded into chunks (say, 5 chunks).
>
> f1 G1
> .shards/G1.1 G2
> .shards/G1.2 G3
> .shards/G1.3 G4
> .shards/G1.4 G5
>
> In Changelog, this is recorded as 5 different files as below
>
> CREATE G1 f1
> DATA G1
> META G1
> CREATE G2 PGS/G1.1
> DATA G2
> META G1
> CREATE G3 PGS/G1.2
> DATA G3
> META G1
> CREATE G4 PGS/G1.3
> DATA G4
> META G1
> CREATE G5 PGS/G1.4
> DATA G5
> META G1
>
> Where PGS is GFID of .shards directory.
>
> Geo-rep will create these files independently in the Slave Volume and
> sync the xattrs of G1. Data can be read fully only when all the chunks
> are synced to the Slave Volume; it can be read partially if the
> main/first file and some of the chunks are synced to the Slave.
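To make that concrete, a small sketch (illustrative, not gsyncd code) that
groups the records above so a worker knows which chunk GFIDs belong to
which main file, and therefore when the slave copy becomes fully readable:

    RECORDS = [
        ("CREATE", "G1", "f1"),
        ("DATA",   "G1", None),
        ("META",   "G1", None),
        ("CREATE", "G2", "PGS/G1.1"),
        ("DATA",   "G2", None),
        ("META",   "G1", None),
        # ... CREATE/DATA/META for G3, G4, G5 follow the same pattern
    ]

    def group_chunks(records):
        """Map each main-file GFID to the GFIDs of its shard chunks.
        Chunk entries are recognised by the name PGS/<main-gfid>.<index>,
        where PGS is the GFID of the .shards directory."""
        chunks = {}
        for op, gfid, name in records:
            if op == "CREATE" and name and name.startswith("PGS/"):
                main_gfid = name.split("/", 1)[1].rsplit(".", 1)[0]
                chunks.setdefault(main_gfid, []).append(gfid)
        return chunks

    # With all the records above, group_chunks() yields
    # {"G1": ["G2", "G3", "G4", "G5"]}: f1 on the Slave is fully readable
    # only once G1, its xattrs, and every chunk GFID in that list are synced.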
>
> Please add if I missed anything. Comments & suggestions welcome.
>
> regards
> Aravinda
>
> On 08/11/2015 04:36 PM, Aravinda wrote:
>> Hi,
>>
>> We are considering different approaches to adding support in
>> Geo-replication for sharded Gluster Volumes [1].
>>
>> *Approach 1: Geo-rep: Sync full file*
>> - In the Changelog, record only the main file details, in the same
>> brick where it is created
>> - Record a DATA entry in the Changelog whenever there is any
>> addition/change to the sharded file
>> - Geo-rep rsync will checksum the full file from the mount and sync
>> it as a new file
>> - Slave-side sharding is managed by the Slave Volume
>> *Approach 2: Geo-rep: Sync sharded files separately*
>> - Geo-rep rsync will checksum the sharded files only
>> - Geo-rep syncs each sharded file independently as a new file
>> - [UNKNOWN] Sync internal xattrs (file size and block count) on the
>> main sharded file to the Slave Volume to maintain the same state as
>> on the Master
>> - Sharding translator to allow file creation under the .shards dir
>> for gsyncd, that is, with the parent GFID of the .shards directory
>> - If sharded files are modified during a Geo-rep run, we may end up
>> with stale data on the Slave
>> - Files on the Slave Volume may not be readable until all sharded
>> files are synced to the Slave (each brick on the Master syncs files
>> to the Slave independently)
>>
>> The first approach looks cleaner, but we have to analyze the rsync
>> checksum performance on big files (sharded on the backend, accessed
>> as one big file by rsync).
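For a feel of the sizes involved in Approach 2, the chunk layout for a file
is easy to enumerate: the first block stays in the main file and every
further block is a separate entry under .shards. A small sketch with
illustrative numbers (the 4 MiB block size and 18 MiB file are made up, not
measured data):

    import math

    def shard_chunk_names(gfid, file_size, block_size):
        """List the .shards entries rsync/gsyncd would have to handle for
        a file of the given size; block 0 stays in the main file itself."""
        total_blocks = max(1, math.ceil(file_size / block_size))
        return [".shards/%s.%d" % (gfid, i) for i in range(1, total_blocks)]

    # Matching the example earlier in the thread: an ~18 MiB file f1
    # (GFID G1) with a 4 MiB shard-block-size occupies 5 blocks, i.e.
    # f1 itself plus .shards/G1.1 .. .shards/G1.4.
    print(shard_chunk_names("G1", 18 * 1024 * 1024, 4 * 1024 * 1024))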
>>
>> Let us know your thoughts. Thanks
>>
>> Ref:
>> [1] http://www.gluster.org/community/documentation/index.php/Features/sharding-xlator
>> --
>> regards
>> Aravinda
>>
>>
regards
Aravinda
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel