----- Original Message -----
> On 2011-05-12, Cedric Lagneau <cedric.lagneau at openwide.fr> wrote:
> > My initial problem on the testing platform is not solved: glusterd
> > geo-replication stops working after about one day.
> >
> > On the master:
> > # cat ssh%3A%2F%2Froot%40slave.mydomain.com%3Afile%3A%2F%2F%2Fdata%2Ftest2.log
> > [2011-05-12 10:50:53.451495] I [monitor(monitor):19:set_state] Monitor: new state: starting...
> > [2011-05-12 10:50:53.465759] I [monitor(monitor):42:monitor] Monitor:
> > ------------------------------------------------------------
> > [2011-05-12 10:50:53.466232] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
> > [2011-05-12 10:50:53.596132] I [gsyncd:287:main_i] <top>: syncing: gluster://localhost:test2 -> ssh://slave.mydomain.com:/data/test2
> > [2011-05-12 10:50:53.641566] D [repce:131:push] RepceClient: call 1879:140148091115264:1305190253.64 __repce_version__() ...
> > [2011-05-12 10:50:53.751271] E [syncdutils:131:log_raise_exception] <top>: FAIL:
> > Traceback (most recent call last):
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twrap
> >     tf(*aa)
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 118, in listen
> >     rid, exc, res = recv(self.inf)
> >   File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/repce.py", line 42, in recv
> >     return pickle.load(inf)
> > EOFError
> > [2011-05-12 10:50:53.759484] D [monitor(monitor):57:monitor] Monitor: worker got connected in 0 sec, waiting 59 more to make sure it's fine
> > [2011-05-12 10:51:53.535005] I [monitor(monitor):19:set_state] Monitor: new state: faulty
> >
> > There is no test2-gluster.log.
> >
> > On the slave:
> > no log (even in debug mode), and no /usr/bin/python /usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py process.
> >
> > tcpdump on the slave shows some ssh traffic with the master server when I start geo-replication.
> >
> > strace of glusterd on the master while starting a geo-replication session that goes faulty:
>
> It would be more interesting to strace the execution of the remote gsyncd.
> That can be accomplished by smuggling strace into the remote-gsyncd command:
>
> # gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config remote-gsyncd \
>     "strace -f -s512 -o /tmp/gsyncd-test2.slog `gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config remote-gsyncd`"
>
> From that we can read out why the remote gsyncd invocation/initialization fails.
>
> Csaba
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

Thanks for the tip about smuggling strace in through the glusterd config.

My problem turned out to be related to ssh and the "-i secret.pem" parameter (it looks like bad permissions on the key file). I removed that parameter as a test and it works: on the faulty geo-replication volume, stop the session, remove "-i secret.pem" from the ssh_command setting, start it again, and replication runs.

Thanks again for your help.

--
Cédric Lagneau
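
PS: for anyone who hits the same thing later, the cleaner fix is probably to keep "-i secret.pem" and repair the key file permissions instead of dropping the option. A rough sketch, assuming the tunable is exposed on the CLI as ssh-command (matching the ssh_command entry in the gsyncd config file) and that the key sits at the usual /etc/glusterd/geo-replication/secret.pem; adjust both to your install:

# show how gsyncd will invoke ssh, including the -i option
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 config ssh-command

# a private key readable by group/others is ignored by ssh, which would
# explain why the remote gsyncd never starts and pickle.load() sees EOF
chmod 600 /etc/glusterd/geo-replication/secret.pem

# verify key-based login by hand before restarting the session
ssh -oPasswordAuthentication=no -i /etc/glusterd/geo-replication/secret.pem root@slave.mydomain.com true

# restart so the monitor picks up the change
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 stop
gluster volume geo-replication test2 slave.mydomain.com:/data/test2 start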