----- Original Message -----
> On 2011-05-12, Cedric Lagneau <cedric.lagneau at openwide.fr> wrote:
> > My initial problem on the testing platform is not solved: glusterd
> > geo-replication stops working after about one day.
> >
> > On the master:
> > # cat ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.log
> > [2011-05-12 10:50:53.451495] I [monitor(monitor):19:set_state] Monitor: new state: starting...
> > [2011-05-12 10:50:53.465759] I [monitor(monitor):42:monitor] Monitor:
> > ------------------------------------------------------------
> > [2011-05-12 10:50:53.466232] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> > [2011-05-12 10:50:53.596132] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test2 -> ssh://slave.mydomain.com:/data/test2
> > [2011-05-12 10:50:53.641566] D [repce:131:push] RepceClient: call 1879:140148091115264:1305190253.64 __repce_version__() ...
> > [2011-05-12 10:50:53.751271] E [syncdutils:131:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
> >     tf(*aa)
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
> >     rid, exc, res = recv(self.inf)
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
> >     return pickle.load(inf)
> > EOFError
> > [2011-05-12 10:50:53.759484] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
> > [2011-05-12 10:51:53.535005] I [monitor(monitor):19:set_state] Monitor: new state: faulty
> >
> > There is no test2-gluster.log.
> >
> > On the slave:
> > no log (even in debug mode), and no /usr/bin/python /usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py process.
> >
> > tcpdump on the slave shows some ssh traffic with the master server when I start geo-replication.
> >
> > strace of glusterd on the master while starting a geo-replication session that goes faulty:
>
> It would be more interesting to strace the execution of the remote gsyncd.
> That can be accomplished by smuggling strace into the remote-gsyncd command:
>
> # gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config remote-gsyncd \
>     "strace -f -s512 -o /tmp/gsyncd-test2.slog `gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config remote-gsyncd`"
>
> From that we can read out why the remote gsyncd invocation/initialization fails.
>
> Csaba
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

Thanks for the tip about smuggling strace in through the glusterd config.

My problem turned out to be related to ssh and the "-i secret.pem" parameter (it looks like bad permissions on the key file). I removed that parameter as a test and it works: on the faulty geo-replication volume, stop the session, remove "-i secret.pem" from the ssh_command setting, start it again, and replication runs.

Thanks again for your help.

--
Cédric Lagneau
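
PS: for anyone who hits the same thing later, the cleaner fix is probably to keep "-i secret.pem" and repair the key file permissions instead of dropping the option. A rough sketch, assuming the tunable is exposed on the CLI as ssh-command (matching the ssh_command entry in the gsyncd config file) and that the key sits at the usual /etc/glusterd/geo-replication/secret.pem; adjust both to your install:

# show how gsyncd will invoke ssh, including the -i option
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config ssh-command

# a private key readable by group/others is ignored by ssh, which would
# explain why the remote gsyncd never starts and pickle.load() sees EOF
chmod 600 /etc/glusterd/geo-replication/secret.pem

# verify key-based login by hand before restarting the session
ssh -oPasswordAuthentication=no -i /etc/glusterd/geo-replication/secret.pem root@slave.mydomain.com true

# restart so the monitor picks up the change
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 stop
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 start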