Comments inline Thanks and Regards, Kotresh H R ----- Original Message ----- > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna > Murthy" <avishwan@xxxxxxxxxx> > Sent: Thursday, May 26, 2016 7:04:30 AM > Subject: Re: : geo-replication status partial faulty > > I retried many times and found that when I set the slave volume's bricks or nodes > below 6, the geo-replication volume status is OK. > I am not sure if this is a bug. This should not be so. It should not depend on the number of bricks on the slave volume side. I will try to reproduce and check. > > > The test result is the same whether run from normal or faulty nodes. > > [root@SVR8049HW2285 ~]# bash -x /usr/libexec/glusterfs/gverify.sh filews > root glusterfs02.sh3.ctripcorp.com filews_slave "/tmp/gverify.log" > + BUFFER_SIZE=104857600 > ++ gluster --print-logdir > + slave_log_file=/var/log/glusterfs/geo-replication-slaves/slave.log > + main filews root glusterfs02.sh3.ctripcorp.com filews_slave > /tmp/gverify.log > + log_file=/tmp/gverify.log > + SSH_PORT=22 > + ping_host glusterfs02.sh3.ctripcorp.com 22 > + '[' 0 -ne 0 ']' > + ssh -oNumberOfPasswordPrompts=0 root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 'echo > Testing_Passwordless_SSH' > Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password). > + '[' 255 -ne 0 ']' > + echo 'FORCE_BLOCKER|Passwordless ssh login has not been setup with > glusterfs02.sh3.ctripcorp.com for user root.' > + exit 1 > [root@SVR8049HW2285 ~]# > This means you need to set up passwordless ssh to 'glusterfs02.sh3.ctripcorp.com' from any one master node and then follow the geo-rep creation steps described in the previous mail. With fewer than 6 nodes/bricks in the slave volume, is the geo-rep session running fine for you? > > > Best Regards > 杨雨阳 Yuyang Yang > OPS > Ctrip Infrastructure Service (CIS) > Ctrip Computer Technology (Shanghai) Co., Ltd > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389 > Web: www.Ctrip.com > > > -----Original Message----- > From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > Sent: Wednesday, May 25, 2016 4:58 PM > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx> > Cc: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; Gluster-users@xxxxxxxxxxx; > Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx> > Subject: Re: : geo-replication status partial faulty > > Answers inline > > Thanks and Regards, > Kotresh H R > > ----- Original Message ----- > > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > > Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, > > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy" > > <avishwan@xxxxxxxxxx> > > Sent: Wednesday, May 25, 2016 12:34:12 PM > > Subject: : geo-replication status partial faulty > > > > Hi, > > > > > > Verify below before proceeding further. > > > > 1. There is only one session directory on all master nodes. > > > > ls -l /var/lib/glusterd/geo-replication/ > > > > 2. I can find the "*.status" file on those nodes where the geo-replication > > status shows Active or Passive, but there is no "*.status" file when the > > node status is faulty. > > > > Per your instruction to clean up the ssh keys and do a fresh setup, step > > 3 failed: > > > > 3. Create georep ssh keys again and do create force. 
> > gluster system:: exec gsec_create > > gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create > > push-pem force > > > > [root@SVR8048HW2285 glusterfs]# gluster volume geo-replication filews > > glusterfs02.sh3.ctripcorp.com::filews_slave create push-pem force > > Unable to fetch slave volume details. Please check the slave cluster > > and slave volume. > > geo-replication command failed > > Then please check the slave cluster status whether it is running fine > and glusterd is running on all slave nodes. After fixing slave cluster if > any issues. > Please check whether the below script runs fine. > > bash -x /usr/libexec/glusterfs/gverify.sh <master_vol_name> root > glusterfs02.sh3.ctripcorp.com::filews_slave <slave_vol> > "/tmp/gverify.log" > > > > [root@SVR8048HW2285 glusterfs]# > > [root@SVR8048HW2285 glusterfs]# ssh -i > > /var/lib/glusterd/geo-replication/secret.pem > > root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx > > Last login: Wed May 25 14:33:15 2016 from 10.8.231.11 This is a > > private network server, in monitoring state. > > It is strictly prohibited to unauthorized access and used. > > [root@SVR6520HW2285 ~]# > > > > etc-glusterfs-glusterd.vol.log loged following message > > > > [2016-05-25 06:47:47.698364] E > > [glusterd-geo-rep.c:2012:glusterd_verify_slave] 0-: Not a valid slave > > [2016-05-25 06:47:47.698433] E > > [glusterd-geo-rep.c:2240:glusterd_op_stage_gsync_create] 0-: > > glusterfs02.sh3.ctripcorp.com::filews_slave is not a valid slave volume. > > Error: Unable to fetch slave volume details. Please check the slave > > cluster and slave volume. > > [2016-05-25 06:47:47.698451] E > > [glusterd-syncop.c:1201:gd_stage_op_phase] > > 0-management: Staging of operation 'Volume Geo-replication Create' > > failed on localhost : Unable to fetch slave volume details. Please > > check the slave cluster and slave volume. > > > > > > > > > > > > > > Best Regards > > 杨雨阳 Yuyang Yang > > > > > > -----邮件原件----- > > 发件人: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > > 发送时间: Wednesday, May 25, 2016 2:06 PM > > 收件人: vyyy杨雨阳 <yuyangyang@xxxxxxxxx> > > 抄送: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna Murthy > > <avishwan@xxxxxxxxxx> > > 主题: Re: geo-replication status partial faulty > > > > Hi, > > > > Verify below before proceeding further. > > > > 1. Run the following command in all the master nodes and > > You should find only one directory (session directory) > > and rest all are files. If you find two directories, it > > needs a clean up in all master nodes to have the same > > session directory in all master nodes. > > > > ls -l /var/lib/glusterd/geo-replication/ > > > > 2. Run the following command in all master nodes and you should > > find "*.status" file in all of them. > > > > ls -l /var/lib/glusterd/geo-replication/<session_directory> > > > > > > Follow the below steps to clean up ssh keys and do a fresh setup. > > > > In all the slave nodes, clean up ssh keys prefixed with > > command=...gsyncd and command=tar.. in /root/.ssh/authorized_keys. > > Also cleanup id_rsa.pub if you had copied form secret.pem and setup > > usual passwordless ssh connection using ssh-copy-id > > > > 1. Establish passwordless SSH between one of master node and one of > > slave node. > > (not required to copy secret.pem use the usual ssh-copy-id way) > > Remember to run all geo-rep commands on same master node and use the > > same > > slave node for geo-rep commands. > > > > 2. 
Stop and Delete geo-rep session as follows. > > gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> stop > > gluster vol geo-rep <master-vol> <slave-host1>::<slave-vol> delete > > > > 3. Create georep ssh keys again and do create force. > > gluster system:: exec gsec_create > > gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> create > > push-pem force > > > > 4. Verify keys have been distributed properly. The below command > > should automatically > > run the gsycnd.py without asking password from any master node to any > > slave host. > > > > ssh -i /var/lib/glusterd/geo-replication/secret.pem > > root@<slave-host> > > > > 4. Start geo-rep > > gluster vol geo-rep <master-vol> <slave-host>::<slave-vol1> start > > > > Let me know if you still face issues. > > > > > > Thanks and Regards, > > Kotresh H R > > > > > > > > > > Thanks and Regards, > > Kotresh H R > > > > ----- Original Message ----- > > > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > > > Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, > > > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy" > > > <avishwan@xxxxxxxxxx> > > > Sent: Wednesday, May 25, 2016 7:11:08 AM > > > Subject: 答复: 答复: 答复: 答复: 答复: 答复: geo-replication > > > status partial faulty > > > > > > Commands output as following, Thanks > > > > > > [root@SVR8048HW2285 ~]# gluster volume geo-replication filews > > > glusterfs01.sh3.ctripcorp.com::filews_slave status > > > > > > MASTER NODE MASTER VOL MASTER BRICK SLAVE > > > STATUS CHECKPOINT STATUS > > > CRAWL STATUS > > > ------------------------------------------------------------------------------------------------------------------------------------------------- > > > SVR8048HW2285 filews /export/sdb/filews > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SH02SVR5954 filews /export/sdb/brick1 > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SH02SVR5951 filews /export/sdb/brick1 > > > glusterfs06.sh3.ctripcorp.com::filews_slave Passive N/A > > > N/A > > > SVR8050HW2285 filews /export/sdb/filews > > > glusterfs03.sh3.ctripcorp.com::filews_slave Passive N/A > > > N/A > > > SVR8049HW2285 filews /export/sdb/filews > > > glusterfs05.sh3.ctripcorp.com::filews_slave Active N/A > > > Hybrid Crawl > > > SVR8047HW2285 filews /export/sdb/filews > > > glusterfs01.sh3.ctripcorp.com::filews_slave Active N/A > > > Hybrid Crawl > > > SVR6995HW2285 filews /export/sdb/filews > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SVR6993HW2285 filews /export/sdb/filews > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SH02SVR5953 filews /export/sdb/brick1 > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SH02SVR5952 filews /export/sdb/brick1 > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > N/A > > > SVR6996HW2285 filews /export/sdb/filews > > > glusterfs04.sh3.ctripcorp.com::filews_slave Passive N/A > > > N/A > > > SVR6994HW2285 filews /export/sdb/filews > > > glusterfs02.sh3.ctripcorp.com::filews_slave Passive N/A > > > N/A > > > > > > [root@SVR8048HW2285 ~]# ls -l /var/lib/glusterd/geo-replication/ > > > total 40 > > > -rw------- 1 root root 14140 May 20 16:00 common_secret.pem.pub > > > drwxr-xr-x 2 root root 4096 May 25 09:35 > > > filews_glusterfs01.sh3.ctripcorp.com_filews_slave > > > -rwxr-xr-x 1 root root 1845 May 17 15:04 gsyncd_template.conf > > > -rw------- 1 root root 1675 May 20 
11:03 secret.pem > > > -rw-r--r-- 1 root root 400 May 20 11:03 secret.pem.pub > > > -rw------- 1 root root 1675 May 20 16:00 tar_ssh.pem > > > -rw-r--r-- 1 root root 400 May 20 16:00 tar_ssh.pem.pub > > > [root@SVR8048HW2285 ~]# > > > > > > > > > > > > Best Regards > > > 杨雨阳 Yuyang Yang > > > OPS > > > Ctrip Infrastructure Service (CIS) > > > Ctrip Computer Technology (Shanghai) Co., Ltd > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389 > > > Web: www.Ctrip.com > > > > > > > > > -----邮件原件----- > > > 发件人: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > > > 发送时间: Tuesday, May 24, 2016 6:41 PM > > > 收件人: vyyy杨雨阳 <yuyangyang@xxxxxxxxx> > > > 抄送: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; > > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna Murthy > > > <avishwan@xxxxxxxxxx> > > > 主题: Re: 答复: 答复: 答复: 答复: 答复: geo-replication status > > > partial faulty > > > > > > Ok, it looks like there is a problem with ssh key distribution. > > > > > > Before I suggest to clean those up and do setup again, could you > > > share the output of following commands > > > > > > 1. gluster vol geo-rep <master_vol> <slave_host>::slave status 2. ls > > > -l /var/lib/glusterd/geo-replication/ > > > > > > Is there multiple geo-rep sessions from this master volume or only one? > > > > > > Thanks and Regards, > > > Kotresh H R > > > > > > ----- Original Message ----- > > > > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > > > > Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, > > > > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy" > > > > <avishwan@xxxxxxxxxx> > > > > Sent: Tuesday, May 24, 2016 3:19:55 PM > > > > Subject: 答复: 答复: 答复: 答复: 答复: geo-replication > > > > status partial faulty > > > > > > > > We can establish passwordless ssh directly with command 'ssh' , > > > > but when create push-pem, it shows ' Passwordless ssh login has > > > > not been setup ' > > > > unless copy secret.pem to *id_rsa.pub > > > > > > > > [root@SVR8048HW2285 ~]# ssh -i > > > > /var/lib/glusterd/geo-replication/secret.pem > > > > root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx > > > > Last login: Tue May 24 17:23:53 2016 from 10.8.230.213 This is a > > > > private network server, in monitoring state. > > > > It is strictly prohibited to unauthorized access and used. > > > > [root@SVR6519HW2285 ~]# > > > > > > > > > > > > [root@SVR8048HW2285 filews]# gluster volume geo-replication filews > > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force > > > > Passwordless ssh login has not been setup with > > > > glusterfs01.sh3.ctripcorp.com for user root. > > > > geo-replication command failed > > > > [root@SVR8048HW2285 filews]# > > > > > > > > > > > > > > > > Best Regards > > > > 杨雨阳 Yuyang Yang > > > > > > > > > > > > -----邮件原件----- > > > > 发件人: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > > > > 发送时间: Tuesday, May 24, 2016 3:22 PM > > > > 收件人: vyyy杨雨阳 <yuyangyang@xxxxxxxxx> > > > > 抄送: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; > > > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna Murthy > > > > <avishwan@xxxxxxxxxx> > > > > 主题: Re: 答复: 答复: 答复: 答复: geo-replication status > > > > partial faulty > > > > > > > > Hi > > > > > > > > Could you try following command from corresponding masters to > > > > faulty slave nodes and share the output? > > > > The below command should not ask for password and should run gsync.py. 
> > > > > > > > ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty > > > > hosts> > > > > > > > > To establish passwordless ssh, it is not necessary to copy > > > > secret.pem to *id_rsa.pub. > > > > > > > > If the geo-rep session is already established, passwordless ssh > > > > would already be there. > > > > My suspect is that when I asked you to do 'create force' you did > > > > it using another slave where password less ssh was not setup. This > > > > would create another session directory in > > > > '/var/lib/glusterd/geo-replication' i.e > > > > (<master_vol>_<slave_host>_<slave_vol>) > > > > > > > > Please check and let us know. > > > > > > > > Thanks and Regards, > > > > Kotresh H R > > > > > > > > ----- Original Message ----- > > > > > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > > > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx> > > > > > Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, > > > > > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy" > > > > > <avishwan@xxxxxxxxxx> > > > > > Sent: Friday, May 20, 2016 12:35:58 PM > > > > > Subject: 答复: 答复: 答复: 答复: geo-replication status > > > > > partial faulty > > > > > > > > > > Hello, Kotresh > > > > > > > > > > I 'create force', but still some nodes work ,some nodes faulty. > > > > > > > > > > On faulty nodes > > > > > etc-glusterfs-glusterd.vol.log shown: > > > > > [2016-05-20 06:27:03.260870] I > > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using > > > > > passed config > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf). > > > > > [2016-05-20 06:27:03.404544] E > > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: > > > > > Unable to read gsyncd status file > > > > > [2016-05-20 06:27:03.404583] E > > > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable > > > > > to read the statusfile for /export/sdb/brick1 brick for > > > > > filews(master), > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session > > > > > > > > > > > > > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65. > > > > > 66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log > > > > > shown: > > > > > [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] > > > > > Monitor: > > > > > ------------------------------------------------------------ > > > > > [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] > > > > > Monitor: > > > > > starting gsyncd worker > > > > > [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>: > > > > > rpc_fd: > > > > > '7,11,10,9' > > > > > [2016-05-20 15:04:01.987505] I > > > > > [changelogagent(agent):72:__init__] > > > > > ChangelogAgent: Agent listining... > > > > > [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop] > > > > > RepceServer: > > > > > terminating on reaching EOF. > > > > > [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] > > > > > <top>: > > > > > exiting. > > > > > [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] > > > > > Monitor: > > > > > worker(/export/sdb/brick1) died before establishing connection > > > > > > > > > > Can you help me! 
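To put the checks suggested in this thread in one place, here is a minimal sketch using the volume, host, and session names that appear in the messages above; it is only an illustration, run from a master node, and the names must be adjusted for your own setup.

# The forced-command key pushed for geo-rep must log in without a password
# prompt and should start gsyncd rather than a shell:
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

# Each master node should have exactly one session directory for this session:
ls -l /var/lib/glusterd/geo-replication/

# Active/Passive bricks have a *.status file in the session directory;
# the faulty bricks discussed above do not:
ls -l /var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/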
> > > > > > > > > > > > > > > Best Regards > > > > > 杨雨阳 Yuyang Yang > > > > > > > > > > > > > > > > > > > > -----邮件原件----- > > > > > 发件人: vyyy杨雨阳 > > > > > 发送时间: Thursday, May 19, 2016 7:45 PM > > > > > 收件人: 'Kotresh Hiremath Ravishankar' <khiremat@xxxxxxxxxx> > > > > > 抄送: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; > > > > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna > > > > > Murthy <avishwan@xxxxxxxxxx> > > > > > 主题: 答复: 答复: 答复: 答复: geo-replication status > > > > > partial faulty > > > > > > > > > > Still not work. > > > > > > > > > > I need copy /var/lib/glusterd/geo-replication/secret.* to > > > > > /root/.ssh/id_rsa to make passwordless ssh work. > > > > > > > > > > I generate /var/lib/glusterd/geo-replication/secret.pem file on > > > > > every master nodes. > > > > > > > > > > I am not sure is this right. > > > > > > > > > > > > > > > [root@sh02svr5956 ~]# gluster volume geo-replication filews > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem > > > > > force Passwordless ssh login has not been setup with > > > > > glusterfs01.sh3.ctripcorp.com for user root. > > > > > geo-replication command failed > > > > > > > > > > [root@sh02svr5956 .ssh]# cp > > > > > /var/lib/glusterd/geo-replication/secret.pem > > > > > ./id_rsa > > > > > cp: overwrite `./id_rsa'? y > > > > > [root@sh02svr5956 .ssh]# cp > > > > > /var/lib/glusterd/geo-replication/secret.pem.pub > > > > > ./id_rsa.pub > > > > > cp: overwrite `./id_rsa.pub'? > > > > > > > > > > [root@sh02svr5956 ~]# gluster volume geo-replication filews > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem > > > > > force Creating geo-replication session between filews & > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave has been successful > > > > > [root@sh02svr5956 ~]# > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > 杨雨阳 Yuyang Yang > > > > > OPS > > > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology > > > > > (Shanghai) Co., Ltd > > > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389 > > > > > Web: www.Ctrip.com > > > > > > > > > > > > > > > -----邮件原件----- > > > > > 发件人: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx] > > > > > 发送时间: Thursday, May 19, 2016 5:07 PM > > > > > 收件人: vyyy杨雨阳 <yuyangyang@xxxxxxxxx> > > > > > 抄送: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; > > > > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna > > > > > Murthy <avishwan@xxxxxxxxxx> > > > > > 主题: Re: 答复: 答复: 答复: geo-replication status > > > > > partial faulty > > > > > > > > > > Hi, > > > > > > > > > > Could you just try 'create force' once to fix those status file > > > > > errors? > > > > > > > > > > e.g., 'gluster volume geo-rep <master vol> <slave host>::<slave > > > > > vol> create push-pem force > > > > > > > > > > Thanks and Regards, > > > > > Kotresh H R > > > > > > > > > > ----- Original Message ----- > > > > > > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx> > > > > > > To: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, > > > > > > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna > > > > > > Murthy" > > > > > > <avishwan@xxxxxxxxxx>, "Kotresh Hiremath Ravishankar" > > > > > > <khiremat@xxxxxxxxxx> > > > > > > Sent: Thursday, May 19, 2016 2:15:34 PM > > > > > > Subject: 答复: 答复: 答复: geo-replication status > > > > > > partial faulty > > > > > > > > > > > > I have checked all the nodes both on masters and slaves, the > > > > > > software is the same. 
> > > > > > > > > > > > I am puzzled why half of the masters work and half are faulty. > > > > > > > > > > > > > > > > > > [admin@SVR6996HW2285 ~]$ rpm -qa |grep gluster > > > > > > glusterfs-api-3.6.3-1.el6.x86_64 > > > > > > glusterfs-fuse-3.6.3-1.el6.x86_64 > > > > > > glusterfs-geo-replication-3.6.3-1.el6.x86_64 > > > > > > glusterfs-3.6.3-1.el6.x86_64 > > > > > > glusterfs-cli-3.6.3-1.el6.x86_64 > > > > > > glusterfs-server-3.6.3-1.el6.x86_64 > > > > > > glusterfs-libs-3.6.3-1.el6.x86_64 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > 杨雨阳 Yuyang Yang > > > > > > > > > > > > OPS > > > > > > Ctrip Infrastructure Service (CIS) Ctrip Computer Technology > > > > > > (Shanghai) Co., Ltd > > > > > > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389 > > > > > > Web: www.Ctrip.com > > > > > > > > > > > > > > > > > > > > > > > > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx] > > > > > > Sent: Thursday, May 19, 2016 4:33 PM > > > > > > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>; > > > > > > Gluster-users@xxxxxxxxxxx; Aravinda Vishwanathapura Krishna > > > > > > Murthy <avishwan@xxxxxxxxxx>; Kotresh Hiremath Ravishankar > > > > > > <khiremat@xxxxxxxxxx> > > > > > > Subject: Re: geo-replication status partial > > > > > > faulty > > > > > > > > > > > > Hi, > > > > > > +geo-rep team. > > > > > > > > > > > > Can you share the gluster version you are using? > > > > > > > > > > > > # For example: > > > > > > rpm -qa | grep gluster > > > > > > > > > > > > I hope you have the same gluster version installed everywhere. > > > > > > Please double check and share the same. > > > > > > > > > > > > Thanks, > > > > > > Saravana > > > > > > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote: > > > > > > Hi, Saravana > > > > > > > > > > > > I have changed the log level to DEBUG and then started geo-replication > > > > > > with the log-file option; the file is attached. > > > > > > > > > > > > gluster volume geo-replication filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave start > > > > > > --log-file=geo.log > > > > > > > > > > > > I have checked /root/.ssh/authorized_keys on > > > > > > glusterfs01.sh3.ctripcorp.com; it has the entries from > > > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub, > > > > > > and I have removed the lines not starting with "command=". > > > > > > > > > > > > With ssh -i /var/lib/glusterd/geo-replication/secret.pem > > > > > > root@glusterfs01.sh3.ctripcorp.com I can see gsyncd messages and no > > > > > > ssh error. > > > > > > > > > > > > > > > > > > The attached etc-glusterfs-glusterd.vol.log from a faulty node shows > > > > > > : > > > > > > > > > > > > [2016-05-19 06:39:23.405974] I > > > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using > > > > > > passed config > > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf). 
> > > > > > [2016-05-19 06:39:23.541169] E > > > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: > > > > > > Unable to read gsyncd status file > > > > > > [2016-05-19 06:39:23.541210] E > > > > > > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable > > > > > > to read the statusfile for /export/sdb/filews brick for > > > > > > filews(master), > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session > > > > > > [2016-05-19 06:39:29.472047] I > > > > > > [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: > > > > > > Using passed config > > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf). > > > > > > [2016-05-19 06:39:34.939709] I > > > > > > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using > > > > > > passed config > > > > > > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf). > > > > > > [2016-05-19 06:39:35.058520] E > > > > > > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: > > > > > > Unable to read gsyncd status file > > > > > > > > > > > > > > > > > > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log > > > > > > shows the following: > > > > > > > > > > > > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] > > > > > > Monitor: > > > > > > ------------------------------------------------------------ > > > > > > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] > > > > > > Monitor: > > > > > > starting gsyncd worker > > > > > > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: > > > > > > rpc_fd: > > > > > > '7,11,10,9' > > > > > > [2016-05-19 15:11:37.423882] I > > > > > > [changelogagent(agent):72:__init__] > > > > > > ChangelogAgent: Agent listining... > > > > > > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] > > > > > > Monitor: > > > > > > worker(/export/sdb/filews) died before establishing connection > > > > > > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] > > > > > > RepceServer: > > > > > > terminating on reaching EOF. > > > > > > [2016-05-19 15:11:37.424335] I > > > > > > [syncdutils(agent):214:finalize] > > > > > > <top>: > > > > > > exiting. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > Yuyang Yang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx] > > > > > > Sent: Thursday, May 19, 2016 1:59 PM > > > > > > To: vyyy杨雨阳 > > > > > > <yuyangyang@xxxxxxxxx>; > > > > > > Gluster-users@xxxxxxxxxxx > > > > > > Subject: Re: geo-replication status partial > > > > > > faulty > > > > > > > > > > > > Hi, > > > > > > > > > > > > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com > > > > > > slave node. > > > > > > Can you share the complete logs? 
> > > > > > > > > > > > You can increase the verbosity of debug messages like this: > > > > > > gluster volume geo-replication <master volume> <slave > > > > > > host>::<slave > > > > > > volume> config log-level DEBUG > > > > > > > > > > > > > > > > > > Also, check /root/.ssh/authorized_keys on > > > > > > glusterfs01.sh3.ctripcorp.com. It should have the entries from > > > > > > /var/lib/glusterd/geo-replication/common_secret.pem.pub > > > > > > (present on the master node). > > > > > > > > > > > > Have a look at this one for example: > > > > > > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html > > > > > > > > > > > > Thanks, > > > > > > Saravana > > > > > > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote: > > > > > > Hello, > > > > > > > > > > > > I have tried to configure a geo-replication volume; all the > > > > > > master nodes' configuration is the same. When I start this > > > > > > volume, the status shows partially faulty, as follows: > > > > > > > > > > > > gluster volume geo-replication filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave status > > > > > > > > > > > > MASTER NODE MASTER VOL MASTER BRICK SLAVE > > > > > > STATUS CHECKPOINT STATUS > > > > > > CRAWL STATUS > > > > > > ------------------------------------------------------------------------------------------------------------------------------------------------- > > > > > > SVR8048HW2285 filews /export/sdb/filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SVR8050HW2285 filews /export/sdb/filews > > > > > > glusterfs03.sh3.ctripcorp.com::filews_slave Passive N/A > > > > > > N/A > > > > > > SVR8047HW2285 filews /export/sdb/filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave Active N/A > > > > > > Hybrid Crawl > > > > > > SVR8049HW2285 filews /export/sdb/filews > > > > > > glusterfs05.sh3.ctripcorp.com::filews_slave Active N/A > > > > > > Hybrid Crawl > > > > > > SH02SVR5951 filews /export/sdb/brick1 > > > > > > glusterfs06.sh3.ctripcorp.com::filews_slave Passive N/A > > > > > > N/A > > > > > > SH02SVR5953 filews /export/sdb/brick1 > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SVR6995HW2285 filews /export/sdb/filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SH02SVR5954 filews /export/sdb/brick1 > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SVR6994HW2285 filews /export/sdb/filews > > > > > > glusterfs02.sh3.ctripcorp.com::filews_slave Passive N/A > > > > > > N/A > > > > > > SVR6993HW2285 filews /export/sdb/filews > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SH02SVR5952 filews /export/sdb/brick1 > > > > > > glusterfs01.sh3.ctripcorp.com::filews_slave faulty N/A > > > > > > N/A > > > > > > SVR6996HW2285 filews /export/sdb/filews > > > > > > glusterfs04.sh3.ctripcorp.com::filews_slave Passive N/A > > > > > > N/A > > > > > > > > > > > > On the faulty node, the log file > > > > > > /var/log/glusterfs/geo-replication/filews > > > > > > shows > > > > > > worker(/export/sdb/filews) died before establishing connection > > > > > > > > > > > > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] > > > > > > Monitor: > > > > > > ------------------------------------------------------------ > > > > > > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] > > > > > > Monitor: > > > > > > 
starting gsyncd worker > > > > > > [2016-05-18 16:55:46.517460] I > > > > > > [changelogagent(agent):72:__init__] > > > > > > ChangelogAgent: Agent listining... > > > > > > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] > > > > > > RepceServer: > > > > > > terminating on reaching EOF. > > > > > > [2016-05-18 16:55:46.518279] I > > > > > > [syncdutils(agent):214:finalize] > > > > > > <top>: > > > > > > exiting. > > > > > > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] > > > > > > Monitor: > > > > > > worker(/export/sdb/filews) died before establishing connection > > > > > > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] > > > > > > Monitor: > > > > > > ------------------------------------------------------------ > > > > > > > > > > > > Any advice and suggestions will be greatly appreciated. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards > > > > > > Yuyang Yang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > Gluster-users mailing list > > > > > > > > > > > > Gluster-users@xxxxxxxxxxx<mailto:Gluster-users@xxxxxxxxxxx> > > > > > > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users
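For reference, the recovery sequence recommended in this thread, collected into one sketch. It uses the volume and slave host names that appear above, and the authorized_keys cleanup on the slave nodes is a manual step; treat it as an outline under those assumptions rather than an exact procedure, and run all geo-rep commands from the same master node.

# Stop and delete the existing session:
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave stop
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave delete

# On every slave node, remove the old command=...gsyncd and command=tar...
# lines from /root/.ssh/authorized_keys, then set up ordinary passwordless
# ssh from this master node (do not copy secret.pem to id_rsa):
ssh-copy-id root@glusterfs01.sh3.ctripcorp.com

# Regenerate the geo-rep ssh keys and recreate the session:
gluster system:: exec gsec_create
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force

# Verify the forced-command key works (no password prompt; it should run gsyncd):
ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

# Start the session and check the result:
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave start
gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave status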