On 05/17/11 13:04, anthony garnier wrote:
> Hi,
> I've put the Client log in Debug mod :
> # gluster volume geo-replication /soft/venus config log-level DEBUG
> geo-replication config updated successfully
>
> # gluster volume geo-replication /soft/venus config log-file
> /usr/local/var/log/glusterfs/geo-replication-slaves/${session_owner}:file%3A%2F%2F%2Fsoft%2Fvenus.log
>
> # gluster volume geo-replication athena /soft/venus config session-owner
> 28cbd261-3a3e-4a5a-b300-ea468483c944
>
> # gluster volume geo-replication athena /soft/venus start
> Starting geo-replication session between athena & /soft/venus has been
> successful
>
> # gluster volume geo-replication athena /soft/venus status
> MASTER               SLAVE                STATUS
> --------------------------------------------------------------------------------
> athena               /soft/venus          starting...
>
> and then :
>
> # gluster volume geo-replication athena /soft/venus status
> MASTER               SLAVE                STATUS
> --------------------------------------------------------------------------------
> athena               /soft/venus          faulty

Is this an edited output? In all likelihood, I'd expect to see the full
slave URL, i.e. file:///soft/venus, in the status output.

> For client :
> cat
> /usr/local/var/log/glusterfs/geo-replication-slaves/28cbd261-3a3e-4a5a-b300-ea468483c944:file%3A%2F%2F%2Fsoft%2Fvenus.log
>
>
> [2011-05-17 09:20:40.519731] I [gsyncd(slave):287:main_i] <top>:
> syncing: file:///soft/venus
> [2011-05-17 09:20:40.520587] I [resource(slave):200:service_loop] FILE:
> slave listening
> [2011-05-17 09:20:40.532951] I [repce(slave):61:service_loop]
> RepceServer: terminating on reaching EOF.
> [2011-05-17 09:21:50.528803] I [gsyncd(slave):287:main_i] <top>:
> syncing: file:///soft/venus
> [2011-05-17 09:21:50.529666] I [resource(slave):200:service_loop] FILE:
> slave listening
> [2011-05-17 09:21:50.542349] I [repce(slave):61:service_loop]
> RepceServer: terminating on reaching EOF.
>
>
>
> For server :
> # cat
> /usr/local/var/log/glusterfs/geo-replication/athena/file%3A%2F%2F%2Fsoft%2Fvenus.log
>
> [2011-05-17 09:30:04.431369] I [monitor(monitor):42:monitor] Monitor:
> ------------------------------------------------------------
> [2011-05-17 09:30:04.431669] I [monitor(monitor):43:monitor] Monitor:
> starting gsyncd worker
> [2011-05-17 09:30:04.486852] I [gsyncd:287:main_i] <top>: syncing:
> gluster://localhost:athena -> file:///soft/venus
[...]
> raise RuntimeError("command failed: " + " ".join(argv))
> RuntimeError: command failed: /usr/local/sbin/glusterfs --xlator-option
> *-dht.assert-no-child-down=true -l
> /usr/local/var/log/glusterfs/geo-replication/athena/file%3A%2F%2F%2Fsoft%2Fvenus.gluster.log
> -s localhost --volfile-id athena --client-pid=-1
> /tmp/gsyncd-aux-mount-TEqjwY
> [2011-05-17 09:30:04.647973] D [monitor(monitor):57:monitor] Monitor:
> worker got connected in 0 sec, waiting 59 more to make sure it's fine

This is interesting in the sense that the error you get now is not the
same as in your first post. Better said, the _symptoms_ are different;
the error as such might be the same. I can imagine that there is a race
between exceptional events, and it's accidental which one interrupts
the event flow.

So it seems that the auxiliary glusterfs instance used by the master
gsyncd fails. (Sidenote: if you prefer to use client/server terminology
instead of master/slave, that's fine, but master should be called client
and slave should be called server, i.e. the reverse of the way you do :) )

To see what's wrong with that, I again ask for the respective logs:

## setting DEBUG loglevel for master's aux glusterfs
# gluster volume geo-replication athena /soft/venus config \
    gluster-log-level DEBUG

## getting the path of the logfile of aux glusterfs
# gluster volume geo-replication athena /soft/venus config \
    gluster-log-file

So please post the contents of that log file.

Csaba
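
A side note on the log paths quoted above: the file names appear to carry the slave URL in percent-encoded form (file%3A%2F%2F%2Fsoft%2Fvenus for file:///soft/venus), optionally prefixed by the session owner and a colon. The following is only a minimal sketch of how to map between the two when hunting for the right log file. The helper names are hypothetical, and the naming convention is inferred from the paths in this thread, not taken from the gsyncd sources.

```python
from urllib.parse import quote, unquote


def slave_url_from_log_name(log_path):
    """Recover the slave URL from a geo-replication log file name.

    Assumed layout (inferred from this thread only):
        [<session-owner>:]<percent-encoded slave URL>.log
    """
    name = log_path.rsplit("/", 1)[-1]      # keep only the file name
    if name.endswith(".log"):
        name = name[: -len(".log")]         # drop the extension
    owner, sep, rest = name.partition(":")  # literal ':' only separates the owner
    if sep:
        name = rest
    return unquote(name)


def log_stem_for_slave_url(url):
    """Percent-encode a slave URL the way the file names above look."""
    return quote(url, safe="")


if __name__ == "__main__":
    print(slave_url_from_log_name(
        "28cbd261-3a3e-4a5a-b300-ea468483c944:file%3A%2F%2F%2Fsoft%2Fvenus.log"))
    print(log_stem_for_slave_url("file:///soft/venus"))
```

The authoritative log locations are still the ones reported by the config log-file and config gluster-log-file queries shown earlier. The sketch merely helps to read the encoded names.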