I spent a lot of time troubleshooting this setup. The fix for me was making
sure the glusterfs-geo-replication package was installed on the target
system.

http://docs.redhat.com/docs/en-US/Red_Hat_Storage/2/html/User_Guide/chap-User_Guide-Geo_Rep-Preparation-Minimum_Reqs.html
states:

  "Before deploying Geo-replication, you must ensure that both Master and
  Slave are Red Hat Storage instances."

Read strictly literally, that does tell you the geo-replication software has
to be present on the slave, but it would be clearer to say so outright: a
geo-replication target that is not running glusterfs only needs
glusterfs-{core,geo-replication}, not a full Red Hat Storage instance.

-greg
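P.S. In practice the check on the slave looks something like the following.
Treat it as a sketch: rpm/yum are the RHEL-style tools, and the package
names are the two I mentioned above, so adjust for your distribution or
build.

  # On the slave (hptv3130 in this thread): are both pieces installed?
  rpm -q glusterfs-core glusterfs-geo-replication

  # If geo-replication is missing, install it ...
  yum install glusterfs-geo-replication

  # ... then restart the session from the master and watch the status.
  gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs stop
  gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs start
  gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs status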
On Fri, Apr 27, 2012 at 12:26, Scot Kreienkamp <SKreien at la-z-boy.com> wrote:
> I am trying to set up geo-replication between a gluster volume and a
> non-gluster volume, yes. The command I used to start geo-replication is:
>
> gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs start
>
> Scot Kreienkamp
> Senior Systems Engineer
> skreien at la-z-boy.com
>
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Friday, April 27, 2012 12:46 PM
> To: Scot Kreienkamp
> Cc: gluster-users at gluster.org
> Subject: Re: unable to get Geo-replication working
>
> Are you trying to set up geo-replication between a gluster volume and a
> non-gluster volume, or between two gluster volumes?
>
> It looks like there might be some configuration issue here. Please share
> the script of how you configured geo-replication.
>
> On Fri, Apr 27, 2012 at 8:18 AM, Scot Kreienkamp <SKreien at la-z-boy.com>
> wrote:
>
> Sure...
>
> [root at retv3130 RMSNFSMOUNT]# gluster peer status
> Number of Peers: 1
>
> Hostname: retv3131
> Uuid: 450cc731-60be-47be-a42d-d856a03dac01
> State: Peer in Cluster (Connected)
>
> [root at hptv3130 ~]# gluster peer status
> No peers present
>
> [root at retv3130 ~]# gluster volume geo-replication RMSNFSMOUNT
> root at hptv3130:/nfs status
>
> MASTER               SLAVE                   STATUS
> --------------------------------------------------------------------------------
> RMSNFSMOUNT          root at hptv3130:/nfs      faulty
>
> Scot Kreienkamp
> Senior Systems Engineer
> skreien at la-z-boy.com
>
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Friday, April 27, 2012 10:58 AM
> To: Scot Kreienkamp
> Subject: Re: unable to get Geo-replication working
>
> Can you look at the output of "gluster volume geo-replication MASTER SLAVE
> status"? Also, run gluster peer status on both MASTER and SLAVE and paste
> the results here.
>
> On Fri, Apr 27, 2012 at 6:53 AM, Scot Kreienkamp <SKreien at la-z-boy.com>
> wrote:
>
> Hey everyone,
>
> I'm trying to get geo-replication working from a two-brick replicated
> volume to a single directory on a remote host. I can ssh to the
> destination, as either georep-user or root, with no password, using the
> default ssh command given by the config command: ssh
> -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /etc/glusterd/geo-replication/secret.pem. All the glusterfs rpms are
> installed on the remote host. There are no firewalls running on any of the
> hosts and no firewalls between them. The remote_gsyncd command is correct,
> as I can copy and paste it to the command line and run it on both source
> hosts and the destination host. I'm using the current production version
> of glusterfs, 3.2.6, with the rsync 3.0.9 and fuse-2.8.3 rpms installed,
> OpenSSH 5.3, and Python 2.6.6 on RHEL 6.2. The remote directory is set to
> 777, world read-write, so there are no permission errors.
>
> I'm using this command to start replication:
>
> gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs start
>
> Whenever I try to initiate geo-replication the status goes to "starting"
> for about 30 seconds, then goes to "faulty". On the slave I get these
> messages repeating in the geo-replication-slaves log:
>
> [2012-04-27 09:37:59.485424] I [resource(slave):201:service_loop] FILE:
> slave listening
> [2012-04-27 09:38:05.413768] I [repce(slave):60:service_loop] RepceServer:
> terminating on reaching EOF.
> [2012-04-27 09:38:15.35907] I [resource(slave):207:service_loop] FILE:
> connection inactive for 120 seconds, stopping
> [2012-04-27 09:38:15.36382] I [gsyncd(slave):302:main_i] <top>: exiting.
> [2012-04-27 09:38:19.952683] I [gsyncd(slave):290:main_i] <top>: syncing:
> file:///nfs
> [2012-04-27 09:38:19.955024] I [resource(slave):201:service_loop] FILE:
> slave listening
>
> I get these messages in etc-glusterfs-glusterd.vol.log on the slave:
>
> [2012-04-27 09:39:23.667930] W [socket.c:1494:__socket_proto_state_machine]
> 0-socket.management: reading from socket failed. Error (Transport endpoint
> is not connected), peer (127.0.0.1:1021)
> [2012-04-27 09:39:43.736138] I
> [glusterd-handler.c:3226:glusterd_handle_getwd] 0-glusterd: Received getwd
> req
> [2012-04-27 09:39:43.740749] W [socket.c:1494:__socket_proto_state_machine]
> 0-socket.management: reading from socket failed. Error (Transport endpoint
> is not connected), peer (127.0.0.1:1023)
>
> As I understand it from searching the list, that message is benign and can
> be ignored.
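A debugging tip at this point in the thread: the gsyncd log locations and
verbosity are exposed through the geo-replication config interface, so you
do not have to hunt for them by hand. Something like the following should
work on 3.2.x; the option names are the ones documented in the admin guide,
so treat this as a sketch:

  gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs config log-file
  gluster volume geo-replication RMSNFSMOUNT hptv3130:/nfs config log-level DEBUG

Raising the level to DEBUG should make the monitor and worker log the exact
ssh/gsyncd command lines they spawn, which makes a faulty loop like this one
much easier to trace.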
> Here are tails of all the logs on one of the sources:
>
> [root at retv3130 RMSNFSMOUNT]# tail
> ssh%3A%2F%2Fgeorep-user%4010.2.1.60%3Afile%3A%2F%2F%2Fnfs.gluster.log
> +------------------------------------------------------------------------------+
> [2012-04-26 16:16:40.804047] E [socket.c:1685:socket_connect_finish]
> 0-RMSNFSMOUNT-client-1: connection to  failed (Connection refused)
> [2012-04-26 16:16:40.804852] I [rpc-clnt.c:1536:rpc_clnt_reconfig]
> 0-RMSNFSMOUNT-client-0: changing port to 24009 (from 0)
> [2012-04-26 16:16:44.779451] I [rpc-clnt.c:1536:rpc_clnt_reconfig]
> 0-RMSNFSMOUNT-client-1: changing port to 24010 (from 0)
> [2012-04-26 16:16:44.855903] I
> [client-handshake.c:1090:select_server_supported_programs]
> 0-RMSNFSMOUNT-client-0: Using Program GlusterFS 3.2.6, Num (1298437),
> Version (310)
> [2012-04-26 16:16:44.856893] I [client-handshake.c:913:client_setvolume_cbk]
> 0-RMSNFSMOUNT-client-0: Connected to 10.170.1.222:24009, attached to remote
> volume '/nfs'.
> [2012-04-26 16:16:44.856943] I [afr-common.c:3141:afr_notify]
> 0-RMSNFSMOUNT-replicate-0: Subvolume 'RMSNFSMOUNT-client-0' came back up;
> going online.
> [2012-04-26 16:16:44.866734] I [fuse-bridge.c:3339:fuse_graph_setup] 0-fuse:
> switched to graph 0
> [2012-04-26 16:16:44.867391] I [fuse-bridge.c:3241:fuse_thread_proc] 0-fuse:
> unmounting /tmp/gsyncd-aux-mount-8zMs0J
> [2012-04-26 16:16:44.868538] W [glusterfsd.c:727:cleanup_and_exit]
> (-->/lib64/libc.so.6(clone+0x6d) [0x31494e5ccd] (-->/lib64/libpthread.so.0()
> [0x3149c077f1]
> (-->/opt/glusterfs/3.2.6/sbin/glusterfs(glusterfs_sigwaiter+0x17c)
> [0x40477c]))) 0-: received signum (15), shutting down
>
> [root at retv3130 RMSNFSMOUNT]# tail
> ssh%3A%2F%2Fgeorep-user%4010.2.1.60%3Afile%3A%2F%2F%2Fnfs.log
> [2012-04-26 16:16:39.263871] I [gsyncd:290:main_i] <top>: syncing:
> gluster://localhost:RMSNFSMOUNT -> ssh://georep-user at hptv3130:/nfs
> [2012-04-26 16:16:41.332690] E [syncdutils:133:log_raise_exception] <top>:
> FAIL:
> Traceback (most recent call last):
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/syncdutils.py",
> line 154, in twrap
>     tf(*aa)
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/repce.py",
> line 117, in listen
>     rid, exc, res = recv(self.inf)
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/repce.py",
> line 41, in recv
>     return pickle.load(inf)
> EOFError
>
> [root at retv3130 RMSNFSMOUNT]# tail
> ssh%3A%2F%2Froot%4010.2.1.60%3Afile%3A%2F%2F%2Fnfs.gluster.log
> [2012-04-27 09:48:42.892842] I [rpc-clnt.c:1536:rpc_clnt_reconfig]
> 0-RMSNFSMOUNT-client-1: changing port to 24010 (from 0)
> [2012-04-27 09:48:43.120749] I
> [client-handshake.c:1090:select_server_supported_programs]
> 0-RMSNFSMOUNT-client-0: Using Program GlusterFS 3.2.6, Num (1298437),
> Version (310)
> [2012-04-27 09:48:43.121489] I [client-handshake.c:913:client_setvolume_cbk]
> 0-RMSNFSMOUNT-client-0: Connected to 10.170.1.222:24009, attached to remote
> volume '/nfs'.
> [2012-04-27 09:48:43.121515] I [afr-common.c:3141:afr_notify]
> 0-RMSNFSMOUNT-replicate-0: Subvolume 'RMSNFSMOUNT-client-0' came back up;
> going online.
> [2012-04-27 09:48:43.132904] I [fuse-bridge.c:3339:fuse_graph_setup] 0-fuse:
> switched to graph 0
> [2012-04-27 09:48:43.133704] I [fuse-bridge.c:2927:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel
> 7.13
> [2012-04-27 09:48:43.135797] I
> [afr-common.c:1520:afr_set_root_inode_on_first_lookup]
> 0-RMSNFSMOUNT-replicate-0: added root inode
> [2012-04-27 09:48:44.533289] W [fuse-bridge.c:2517:fuse_xattr_cbk]
> 0-glusterfs-fuse: 8:
> GETXATTR(trusted.glusterfs.9de3c1c8-a753-45a1-8042-b6a4872c5c3c.xtime) / =>
> -1 (Transport endpoint is not connected)
> [2012-04-27 09:48:44.544934] I [fuse-bridge.c:3241:fuse_thread_proc] 0-fuse:
> unmounting /tmp/gsyncd-aux-mount-uXCybC
> [2012-04-27 09:48:44.545879] W [glusterfsd.c:727:cleanup_and_exit]
> (-->/lib64/libc.so.6(clone+0x6d) [0x31494e5ccd] (-->/lib64/libpthread.so.0()
> [0x3149c077f1]
> (-->/opt/glusterfs/3.2.6/sbin/glusterfs(glusterfs_sigwaiter+0x17c)
> [0x40477c]))) 0-: received signum (15), shutting down
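One more note: that failed GETXATTR on trusted.glusterfs.<volume-uuid>.xtime
is gsyncd probing the volume's xtime marker through its aux mount, and you
can reproduce the probe by hand against any client mount of the master
volume. The xattr name below is copied from the log above; the mount point
is purely illustrative:

  # Mount the master volume somewhere (path is hypothetical) ...
  mount -t glusterfs retv3130:/RMSNFSMOUNT /mnt/rms

  # ... and ask for the same xattr gsyncd asks for.
  getfattr -n trusted.glusterfs.9de3c1c8-a753-45a1-8042-b6a4872c5c3c.xtime \
      -e hex /mnt/rms

If that also returns "Transport endpoint is not connected", the "Connection
refused" error for RMSNFSMOUNT-client-1 in the first tail suggests the brick
on retv3131 is worth ruling out before blaming geo-replication itself.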
> [root at retv3130 RMSNFSMOUNT]# tail
> ssh%3A%2F%2Froot%4010.2.1.60%3Afile%3A%2F%2F%2Fnfs.log
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/libcxattr.py",
> line 34, in lgetxattr
>     return cls._query_xattr( path, siz, 'lgetxattr', attr)
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/libcxattr.py",
> line 26, in _query_xattr
>     cls.raise_oserr()
>   File
> "/opt/glusterfs/3.2.6/local/libexec/glusterfs/python/syncdaemon/libcxattr.py",
> line 16, in raise_oserr
>     raise OSError(errn, os.strerror(errn))
> OSError: [Errno 107] Transport endpoint is not connected
> [2012-04-27 09:49:14.846837] I [monitor(monitor):59:monitor] Monitor:
> ------------------------------------------------------------
> [2012-04-27 09:49:14.847898] I [monitor(monitor):60:monitor] Monitor:
> starting gsyncd worker
> [2012-04-27 09:49:14.930681] I [gsyncd:290:main_i] <top>: syncing:
> gluster://localhost:RMSNFSMOUNT -> ssh://hptv3130:/nfs
>
> I'm out of ideas. I've satisfied every requirement I can find, and I'm not
> seeing anything in the logs that reads to me as an error I can fix. Can
> anyone help?
>
> Thanks!
>
> Scot Kreienkamp
> skreien at la-z-boy.com

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users