geo-replication/gsyncd questions

brs at usf.edu (Brian Smith) · Wed, 28 Sep 2011 12:50:13 -0400

I'm trying to understand exactly how Gluster's geo-replication works.  I
have a general idea, but I still have some questions.

How exactly does gsyncd's crawl determine which files to update?  I
have a filesystem with 50 million+ inodes and I'm wondering how that
crawl will scale.  I assume that when an inode is modified, some xattr
is set on each parent directory up to the root, and that gsyncd reads
this xattr so it can skip unchanged subtrees and crawl the tree
efficiently to find updates.  Am I completely wrong?
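
To make that assumption concrete, here is a minimal in-memory sketch of
the scheme I have in mind.  It is only an illustration of my guess, not
gsyncd's actual code: the names Node, touch, crawl and the integer
"xtime" stamp are all hypothetical, and a real implementation would
store the stamp in an xattr rather than in a Python object.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One directory or file; xtime is a hypothetical change stamp."""
    name: str
    xtime: int = 0
    children: dict = field(default_factory=dict)


def touch(root, path, now):
    """Modify the entry at `path` and stamp every ancestor up to the root."""
    node = root
    node.xtime = max(node.xtime, now)
    for part in path.split("/"):
        node = node.children.setdefault(part, Node(part))
        node.xtime = max(node.xtime, now)


def crawl(node, last_sync, prefix=""):
    """Yield changed leaf paths, skipping any subtree whose stamp is old."""
    if node.xtime <= last_sync:
        return  # nothing below this point changed since the last sync
    if not node.children:
        yield prefix
        return
    for name, child in sorted(node.children.items()):
        child_path = f"{prefix}/{name}" if prefix else name
        yield from crawl(child, last_sync, child_path)


root = Node("")
touch(root, "a/b/file1", now=1)
touch(root, "a/c/file2", now=1)
touch(root, "d/file3", now=1)
# Assume a sync completed at time 1, then a single file changes.
touch(root, "a/c/file2", now=2)
changed = list(crawl(root, last_sync=1))
print(changed)
```

If it works this way, the crawl only descends into a/ and a/c/ here and
never looks inside a/b/ or d/, so the cost would follow the number of
changed paths rather than the total inode count.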

My two sites will be connected via a dedicated leased line on a
non-routable address space, so I'm not concerned about having to use
SSH at the moment.  I see that gsyncd recognizes gluster volume
definitions of the form server:vol for the master.

Does it also recognize gluster vol definitions for the slave system, i.e.

gluster volume geo-replication glusterfs://master:vol glusterfs://slave:vol ...

or does it need a directory path for the slave,

... glusterfs://master:vol file:///mnt/slave_vol
... glusterfs://master:vol ssh://slave:vol
...

I assume that the ssh:// form uses SSH to start a gsyncd on the slave
and then talks to it over that SSH connection.

Is there a doc somewhere with more details on this?  The docs on the
gluster site leave a lot of questions unanswered.

Thanks,
-Brian

Brian Smith
Senior Systems Administrator
IT Research Computing, University of South Florida
4202 E. Fowler Ave. ENB308
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu