On 2011-05-12, Cedric Lagneau <cedric.lagneau at openwide.fr> wrote:
> My initial problem on the testing platform is not solved: glusterd geo-replication command stop working after about one day.
>
> On Master:
> #cat ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.log
> [2011-05-12 10:50:53.451495] I [monitor(monitor):19:set_state] Monitor: new state: starting...
> [2011-05-12 10:50:53.465759] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
> [2011-05-12 10:50:53.466232] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> [2011-05-12 10:50:53.596132] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test2 -> ssh://slave.mydomain.com:/data/test2
> [2011-05-12 10:50:53.641566] D [repce:131:push] RepceClient: call 1879:140148091115264:1305190253.64 __repce_version__() ...
> [2011-05-12 10:50:53.751271] E [syncdutils:131:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
>     tf(*aa)
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
>     rid, exc, res = recv(self.inf)
>   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
>     return pickle.load(inf)
> EOFError
> [2011-05-12 10:50:53.759484] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
> [2011-05-12 10:51:53.535005] I [monitor(monitor):19:set_state] Monitor: new state: faulty
>
> There is not test2-gluster.log.
>
> On Slave:
> no log (in debug mode) and no process /usr/bin/python /usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py
>
>
> tcpdump on SLAVE show some ssh traffic with Master server when i start geo-replication.
>
> glusterd strace on master with a starting geo-replication with status faulty:

It would be more interesting to strace the execution of the remote gsyncd.
That can be accomplished by smuggling strace into the remote-gsyncd command:

  # gluster volume geo-replication test2 slave.mydomain.com::/data/test2 config remote-gsyncd \
      "strace -f -s512 -o /tmp/gsyncd-test2.slog `gluster volume geo-replication test2 slave.mydomain.com::/data/test2 config remote-gsyncd`"
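
If you would rather do this in two steps, so that the original remote-gsyncd value is kept around and can be put back afterwards, something along these lines should work. This is only a rough sketch: the function names, the MASTER/SLAVE variables and the /tmp/remote-gsyncd.orig save file are made up for illustration, and the only gluster invocations used are the "config remote-gsyncd" get and set forms shown above.

```shell
#!/bin/sh
# Sketch only: MASTER, SLAVE, ORIG_FILE and the function names are
# illustrative assumptions, not part of gluster itself.
MASTER=test2
SLAVE=slave.mydomain.com::/data/test2
ORIG_FILE=/tmp/remote-gsyncd.orig

# Build the wrapped command string: strace and its options first,
# then the original remote-gsyncd command, unchanged.
wrap_with_strace() {
    # $1: current remote-gsyncd command, $2: strace output file (on the slave)
    printf '%s\n' "strace -f -s512 -o $2 $1"
}

# Save the current setting on the master, then replace it with the
# strace-wrapped version.
enable_remote_strace() {
    orig=$(gluster volume geo-replication "$MASTER" "$SLAVE" config remote-gsyncd) || return 1
    printf '%s\n' "$orig" > "$ORIG_FILE"
    gluster volume geo-replication "$MASTER" "$SLAVE" config remote-gsyncd \
        "$(wrap_with_strace "$orig" /tmp/gsyncd-test2.slog)"
}

# Put the saved setting back once the trace has been collected.
restore_remote_gsyncd() {
    gluster volume geo-replication "$MASTER" "$SLAVE" config remote-gsyncd \
        "$(cat "$ORIG_FILE")"
}
```

Run enable_remote_strace on the master, start geo-replication again, and look for /tmp/gsyncd-test2.slog on the slave, since that is where the wrapped gsyncd is executed. Call restore_remote_gsyncd when you are done, otherwise every later session keeps running under strace.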