Cannot get geo-replication working with Gluster 3.7


 



Hello all,

I've been struggling to get Gluster geo-replication working for the last couple of days. I keep getting the following errors:


[2015-08-10 17:27:07.855817] E [resource(/gluster/volume1):222:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-Cnh7xL/ee1e6b6c8823302e93454e632bd81fbe.sock root@xxxxxxxxxxxxxxxxxxxxx /nonexistent/gsyncd --session-owner 50600483-7aa3-4fab-a66c-63350af607b0 -N --listen --timeout 120 gluster://localhost:volume1-replicate" returned with 127, saying:
[2015-08-10 17:27:07.856066] E [resource(/gluster/volume1):226:logerr] Popen: ssh> bash: /nonexistent/gsyncd: No such file or directory
[2015-08-10 17:27:07.856441] I [syncdutils(/gluster/volume1):220:finalize] <top>: exiting.
[2015-08-10 17:27:07.858120] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-10 17:27:07.858361] I [syncdutils(agent):220:finalize] <top>: exiting.
[2015-08-10 17:27:07.858211] I [monitor(monitor):274:monitor] Monitor: worker(/gluster/volume1) died before establishing connection
[2015-08-10 17:27:18.181344] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
[2015-08-10 17:27:18.181842] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
[2015-08-10 17:27:18.387790] I [gsyncd(/gluster/volume1):649:main_i] <top>: syncing: gluster://localhost:volume1 -> ssh://root@xxxxxxxxxxxxxxxxxxxxx:gluster://localhost:volume1-replicate
[2015-08-10 17:27:18.389427] D [gsyncd(agent):643:main_i] <top>: rpc_fd: '7,11,10,9'
[2015-08-10 17:27:18.390553] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
[2015-08-10 17:27:18.418788] D [repce(/gluster/volume1):191:push] RepceClient: call 8460:140341431777088:1439242038.42 __repce_version__() ...
[2015-08-10 17:27:18.629983] E [syncdutils(/gluster/volume1):252:log_raise_exception] <top>: connection to peer is broken
[2015-08-10 17:27:18.630651] W [syncdutils(/gluster/volume1):256:log_raise_exception] <top>: !!!!!!!!!!!!!
[2015-08-10 17:27:18.630794] W [syncdutils(/gluster/volume1):257:log_raise_exception] <top>: !!! getting "No such file or directory" errors is most likely due to MISCONFIGURATION, please consult https://access.redhat.com/site/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/chap-User_Guide-Geo_Rep-Preparation-Settingup_Environment.html
[2015-08-10 17:27:18.630929] W [syncdutils(/gluster/volume1):265:log_raise_exception] <top>: !!!!!!!!!!!!!
[2015-08-10 17:27:18.631129] E [resource(/gluster/volume1):222:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-RPuEyN/ee1e6b6c8823302e93454e632bd81fbe.sock root@xxxxxxxxxxxxxxxxxxxxx /nonexistent/gsyncd --session-owner 50600483-7aa3-4fab-a66c-63350af607b0 -N --listen --timeout 120 gluster://localhost:volume1-replicate" returned with 127, saying:
[2015-08-10 17:27:18.631280] E [resource(/gluster/volume1):226:logerr] Popen: ssh> bash: /nonexistent/gsyncd: No such file or directory
[2015-08-10 17:27:18.631567] I [syncdutils(/gluster/volume1):220:finalize] <top>: exiting.
[2015-08-10 17:27:18.633125] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
[2015-08-10 17:27:18.633183] I [monitor(monitor):274:monitor] Monitor: worker(/gluster/volume1) died before establishing connection
[2015-08-10 17:27:18.633392] I [syncdutils(agent):220:finalize] <top>: exiting.

and the status is continuously faulty:

[root@neptune volume1]# gluster volume geo-replication volume01 gluster02::volume01-replicate status

MASTER NODE    MASTER VOL    MASTER BRICK         SLAVE USER    SLAVE                            SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
----------------------------------------------------------------------------------------------------------------------------------------------------
neptune        volume01      /gluster/volume01    root          gluster02::volume01-replicate    N/A           Faulty    N/A             N/A
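The `bash: /nonexistent/gsyncd: No such file or directory` line suggests the slave is executing the literal `/nonexistent/gsyncd` placeholder path the master sends over SSH, instead of the real gsyncd binary. A sketch of what I could check, assuming the EL7 RPMs install gsyncd under /usr/libexec/glusterfs (adjust if your packages differ); the `remote-gsyncd` config key is my reading of the geo-rep config options:

```shell
# On the slave: confirm where the geo-replication RPM actually put gsyncd.
rpm -ql glusterfs-geo-replication | grep gsyncd
ls -l /usr/libexec/glusterfs/gsyncd

# On the master: point the session at that path instead of /nonexistent/gsyncd,
# then restart the session so the workers pick up the new value.
gluster volume geo-replication volume1 gluster02::volume1-replicate config remote-gsyncd /usr/libexec/glusterfs/gsyncd
gluster volume geo-replication volume1 gluster02::volume1-replicate stop
gluster volume geo-replication volume1 gluster02::volume1-replicate start
```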

What I'm trying to accomplish is to mirror a volume from gluster01 (master) to gluster02 (slave). 

Here is a breakdown of the steps I took:

#both nodes
yum -y install glusterfs-server glusterfs-geo-replication
service glusterd start

#gluster01
gluster volume create volume1 gluster01.example.com:/gluster/volume1
gluster volume start volume1

#gluster02
gluster volume create volume1-replicate gluster02.example.com:/gluster/volume1-replicate
gluster volume start volume1-replicate


#geo replicate
gluster system:: execute gsec_create

#gluster01
gluster volume geo-replication volume1 gluster02::volume1-replicate create push-pem
gluster volume geo-replication volume1 gluster02::volume1-replicate start
gluster volume geo-replication volume1 gluster02::volume1-replicate status

#mounting and testing
mkdir /mnt/gluster
mount -t glusterfs gluster01.example.com:/volume1 /mnt/gluster
mount -t glusterfs gluster02.example.com:/volume1-replicate /mnt/gluster

#troubleshooting
gluster volume geo-replication volume1 gluster02::volume1-replicate config log-level DEBUG
service glusterd restart

gluster volume geo-replication volume1 gluster02::volume1-replicate config

There was one step before running:

gluster volume geo-replication volume1 gluster02::volume1-replicate create push-pem

I copied the secret.pub to gluster02 (the slave) and appended it to .ssh/authorized_keys. I can SSH as root from gluster01 to gluster02 without issue.
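One thing worth checking about that manual key copy: as I understand it, `push-pem` normally installs keys on the slave with a `command=` forced-command prefix, which is what maps the master's deliberate `/nonexistent/gsyncd` placeholder onto the real binary. A hand-appended bare key has no such prefix, so SSH would execute the literal nonexistent path. A sketch of the check (the exact `command=` path below is an assumption based on the EL7 layout):

```shell
# On the slave: inspect the geo-rep entries in root's authorized_keys.
grep gsyncd /root/.ssh/authorized_keys

# Each push-pem entry should look roughly like:
#   command="/usr/libexec/glusterfs/gsyncd" ssh-rsa AAAA... root@gluster01
# A bare "ssh-rsa AAAA..." line from a hand-copied secret.pub has no such
# prefix, so the master's /nonexistent/gsyncd argument is run as-is.
```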

I'm currently running:

glusterfs-3.7.3-1.el7.x86_64
glusterfs-cli-3.7.3-1.el7.x86_64
glusterfs-libs-3.7.3-1.el7.x86_64
glusterfs-client-xlators-3.7.3-1.el7.x86_64
glusterfs-fuse-3.7.3-1.el7.x86_64
glusterfs-server-3.7.3-1.el7.x86_64
glusterfs-api-3.7.3-1.el7.x86_64
glusterfs-geo-replication-3.7.3-1.el7.x86_64

on both slave and master servers. Both servers have ntp installed, are in sync, and are fully patched.

I can mount volume1 and volume1-replicate on each host, and I have confirmed that the iptables rules have been flushed.

Not sure exactly what else to check at this point. There appeared to be another user with similar errors, but the mailing list thread says he resolved it on his own.

Any ideas? I'm completely lost as to what the issue could be. Some of the Red Hat docs mention the cause could be FUSE, but FUSE appears to be installed as part of the Gluster packages.

Thanks 

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
