On 07/07/2011 15:25, Kaushik BV wrote:
> Hi Chaica,
>
> This primarily means that the RPC communication between the master
> gsyncd module and slave gsyncd module is broken, this could happen for
> various reasons. Check if it satisfies all the pre-requisites:
>
> - If FUSE is installed in the machine, since Geo-replication module
>   mounts the GlusterFS volume using FUSE to sync data.
> - If the Slave is a volume, check if the volume is started.
> - If the Slave is a plain directory, check if the directory has been
>   created already with the desired permissions (Not applicable in your case)
> - If Glusterfs 3.2 is not installed in the default location (in Master)
>   and has been prefixed to be installed in a custom location, configure
>   the *gluster-command* for it to point to exact location.
> - If Glusterfs 3.2 is not installed in the default location (in slave)
>   and has been prefixed to be installed in a custom location, configure
>   the *remote-gsyncd-command* for it to point to exact place where gsyncd
>   is located.
> - locate the slave log and see if it has any anomalies.
> - Passwordless SSH is set up properly between the host and the remote
>   machine (Not applicable in your case)

OK, the situation has slightly evolved. Now I do have a slave log and a clearer error message on the master:

[2011-07-07 19:53:16.258866] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-07-07 19:53:16.259073] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-07-07 19:53:16.332720] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:test-volume -> ssh://192.168.1.32::test-volume
[2011-07-07 19:53:16.343554] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061196.34 __repce_version__() ...
[2011-07-07 19:53:20.931523] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061196.34 __repce_version__ -> 1.0
[2011-07-07 19:53:20.932172] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061200.93 version() ...
[2011-07-07 19:53:20.933662] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061200.93 version -> 1.0
[2011-07-07 19:53:20.933861] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061200.93 pid() ...
[2011-07-07 19:53:20.934525] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061200.93 pid -> 10075
[2011-07-07 19:53:20.957355] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 293, in main_i
    local.connect()
  File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 379, in connect
    raise RuntimeError("command failed: " + " ".join(argv))
RuntimeError: command failed: /usr/sbin/glusterfs --xlator-option *-dht.assert-no-child-down=true -L DEBUG -l /var/log/glusterfs/geo-replication/test-volume/ssh%3A%2F%2Froot%40192.168.1.32%3Agluster%3A%2F%2F127.0.0.1%3Atest-volume.gluster.log -s localhost --volfile-id test-volume --client-pid=-1 /tmp/gsyncd-aux-mount-hy6T_w
[2011-07-07 19:53:20.960621] D [monitor(monitor):58:monitor] Monitor: worker seems to be connected (?? racy check)
[2011-07-07 19:53:21.962501] D [monitor(monitor):62:monitor] Monitor: worker died in startup phase

The glusterfs command launched here returns shell exit code 255, which I believe means the command was terminated by a signal.
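One way to check the guess about the 255 code: on POSIX, Python's subprocess module reports a child killed by signal N as return code -N, while a positive value such as 255 is a status the process returned itself. Below is a minimal sketch of that distinction; the helper names describe_returncode and run_and_describe are hypothetical and not part of gsyncd, and the sh commands only stand in for the real glusterfs invocation.

```python
import signal
import subprocess


def describe_returncode(returncode):
    """Explain a subprocess return code as Python reports it on POSIX."""
    if returncode < 0:
        # A negative value -N means the child was terminated by signal N.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    if returncode == 0:
        return "exited successfully"
    # A positive value is an exit status chosen by the process itself.
    return "exited on its own with status %d" % returncode


def run_and_describe(argv):
    """Run argv (a list of strings) and describe how it ended."""
    return describe_returncode(subprocess.call(argv))


if __name__ == "__main__":
    # Stand-in commands: one exits with 255, one kills itself with SIGTERM.
    print(run_and_describe(["sh", "-c", "exit 255"]))
    print(run_and_describe(["sh", "-c", "kill -TERM $$"]))
```

Running the failing glusterfs command line from the traceback through run_and_describe (as a list of arguments) would show which of the two cases applies.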
In the slave log I have:

[2011-07-07 19:54:49.571549] I [fuse-bridge.c:3218:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-z2Q2Hg
[2011-07-07 19:54:49.572459] W [glusterfsd.c:712:cleanup_and_exit] (-->/lib/libc.so.6(clone+0x6d) [0x7f2c8998b02d] (-->/lib/libpthread.so.0(+0x68ba) [0x7f2c89c238ba] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x7f2c8a8f51b5]))) 0-: received signum (15), shutting down
[2011-07-07 19:54:51.280207] W [write-behind.c:3029:init] 0-test-volume-write-behind: disabling write-behind for first 0 bytes
[2011-07-07 19:54:51.291669] I [client.c:1935:notify] 0-test-volume-client-0: parent translators are ready, attempting connect on transport
[2011-07-07 19:54:51.292329] I [client.c:1935:notify] 0-test-volume-client-1: parent translators are ready, attempting connect on transport
[2011-07-07 19:55:38.582926] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test-volume-client-0: changing port to 24009 (from 0)
[2011-07-07 19:55:38.583456] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test-volume-client-1: changing port to 24009 (from 0)

Bye,
Carl Chenet