Hi,

Could you try the following command from the corresponding masters to the
faulty slave nodes and share the output? The command below should not ask
for a password and should run gsyncd:

ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty hosts>

To establish passwordless ssh it is not necessary to copy secret.pem and
secret.pem.pub over ~/.ssh/id_rsa and id_rsa.pub. If the geo-rep session is
already established, passwordless ssh is already in place. My suspicion is
that when I asked you to run 'create force', you ran it against another
slave host where passwordless ssh was not set up. That would create another
session directory under /var/lib/glusterd/geo-replication, i.e.
<master_vol>_<slave_host>_<slave_vol>.

Please check and let us know.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, Gluster-users@xxxxxxxxxxx,
>     "Aravinda Vishwanathapura Krishna Murthy" <avishwan@xxxxxxxxxx>
> Sent: Friday, May 20, 2016 12:35:58 PM
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
>
> Hello, Kotresh
>
> I ran 'create force', but still some nodes work and some nodes are faulty.
>
> On the faulty nodes, etc-glusterfs-glusterd.vol.log shows:
>
> [2016-05-20 06:27:03.260870] I
> [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config
> template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> [2016-05-20 06:27:03.404544] E
> [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read
> gsyncd status file
> [2016-05-20 06:27:03.404583] E
> [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the
> statusfile for /export/sdb/brick1 brick for filews(master),
> glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
>
> /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> shows:
>
> [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] Monitor:
> ------------------------------------------------------------
> [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] Monitor:
> starting gsyncd worker
> [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
> '7,11,10,9'
> [2016-05-20 15:04:01.987505] I [changelogagent(agent):72:__init__]
> ChangelogAgent: Agent listining...
> [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop] RepceServer:
> terminating on reaching EOF.
> [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] <top>:
> exiting.
> [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] Monitor:
> worker(/export/sdb/brick1) died before establishing connection
>
> Can you help me?
>
>
> Best Regards
> 杨雨阳 Yuyang Yang
>
>
> -----Original Message-----
> From: vyyy杨雨阳
> Sent: Thursday, May 19, 2016 7:45 PM
> To: 'Kotresh Hiremath Ravishankar' <khiremat@xxxxxxxxxx>
> Cc: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
>     Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
>
> It still does not work.
>
> I need to copy /var/lib/glusterd/geo-replication/secret.* to /root/.ssh/
> (as id_rsa and id_rsa.pub) to make passwordless ssh work.
>
> I generated the /var/lib/glusterd/geo-replication/secret.pem file on every
> master node.
>
> I am not sure whether this is right.
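
For reference, on one of the faulty masters the two checks described at the
top of this mail come down to something like the following; the slave
hostname here is simply the one used elsewhere in this thread, so adjust it
to the faulty master/slave pair you are testing:

    # List the geo-rep session directories. There should be one
    # <master_vol>_<slave_host>_<slave_vol> directory per session; a stray
    # directory for a different slave host would indicate that
    # 'create force' was run against a slave without passwordless ssh.
    ls -l /var/lib/glusterd/geo-replication/

    # Test the pem-based login directly. It should not prompt for a
    # password and should run gsyncd on the slave side.
    ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

Copying secret.pem over /root/.ssh/id_rsa, as in the transcript below, should
not be necessary for this to work.
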
>
> [root@sh02svr5956 ~]# gluster volume geo-replication filews
> glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com
> for user root.
> geo-replication command failed
>
> [root@sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem
> ./id_rsa
> cp: overwrite `./id_rsa'? y
> [root@sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem.pub
> ./id_rsa.pub
> cp: overwrite `./id_rsa.pub'?
>
> [root@sh02svr5956 ~]# gluster volume geo-replication filews
> glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Creating geo-replication session between filews &
> glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> [root@sh02svr5956 ~]#
>
>
> Best Regards
> 杨雨阳 Yuyang Yang
> OPS
> Ctrip Infrastructure Service (CIS)
> Ctrip Computer Technology (Shanghai) Co., Ltd
> Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> Web: www.Ctrip.com
>
>
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx]
> Sent: Thursday, May 19, 2016 5:07 PM
> To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>
> Cc: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
>     Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
>
> Hi,
>
> Could you just try 'create force' once to fix those status file errors?
>
> e.g., gluster volume geo-rep <master vol> <slave host>::<slave vol> create
> push-pem force
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx>
> > To: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>,
> > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy"
> > <avishwan@xxxxxxxxxx>, "Kotresh Hiremath Ravishankar"
> > <khiremat@xxxxxxxxxx>
> > Sent: Thursday, May 19, 2016 2:15:34 PM
> > Subject: Re: Re: Re: geo-replication status partial faulty
> >
> > I have checked all the nodes, both masters and slaves; the software
> > is the same.
> >
> > I am puzzled why half of the masters work and half are faulty.
> >
> > [admin@SVR6996HW2285 ~]$ rpm -qa | grep gluster
> > glusterfs-api-3.6.3-1.el6.x86_64
> > glusterfs-fuse-3.6.3-1.el6.x86_64
> > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > glusterfs-3.6.3-1.el6.x86_64
> > glusterfs-cli-3.6.3-1.el6.x86_64
> > glusterfs-server-3.6.3-1.el6.x86_64
> > glusterfs-libs-3.6.3-1.el6.x86_64
> >
> >
> > Best Regards
> > 杨雨阳 Yuyang Yang
> >
> > OPS
> > Ctrip Infrastructure Service (CIS)
> > Ctrip Computer Technology (Shanghai) Co., Ltd
> > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > Web: www.Ctrip.com
> >
> >
> > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx]
> > Sent: Thursday, May 19, 2016 4:33 PM
> > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
> >     Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>;
> >     Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx>
> > Subject: Re: Re: Re: geo-replication status partial faulty
> >
> > Hi,
> > +geo-rep team.
> >
> > Can you share the gluster version you are using?
> >
> > # For example:
> > rpm -qa | grep gluster
> >
> > I hope you have the same gluster version installed everywhere.
> > Please double check and share the same.
> >
> > Thanks,
> > Saravana
> >
> > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > Hi, Saravana
> >
> > I have changed the log level to DEBUG, then started geo-replication with
> > the log-file option; the file is attached.
> >
> > gluster volume geo-replication filews
> > glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log
> >
> > I have checked /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com;
> > it has the entries from
> > /var/lib/glusterd/geo-replication/common_secret.pem.pub, and I have
> > removed the lines that do not start with "command=".
> >
> > With
> > ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com
> > I can see gsyncd messages and no ssh error.
> >
> > Attached etc-glusterfs-glusterd.vol.log from a faulty node; it shows:
> >
> > [2016-05-19 06:39:23.405974] I
> > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:23.541169] E
> > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to
> > read gsyncd status file
> > [2016-05-19 06:39:23.541210] E
> > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read
> > the statusfile for /export/sdb/filews brick for filews(master),
> > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > [2016-05-19 06:39:29.472047] I
> > [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:34.939709] I
> > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:35.058520] E
> > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to
> > read gsyncd status file
> >
> > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > shows the following:
> >
> > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor:
> > starting gsyncd worker
> > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
> > '7,11,10,9'
> > [2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__]
> > ChangelogAgent: Agent listining...
> > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor:
> > worker(/export/sdb/filews) died before establishing connection
> > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer:
> > terminating on reaching EOF.
> > [2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>:
> > exiting.
> >
> >
> > Best Regards
> > Yuyang Yang
> >
> >
> > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx]
> > Sent: Thursday, May 19, 2016 1:59 PM
> > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>; Gluster-users@xxxxxxxxxxx
> > Subject: Re: Re: geo-replication status partial faulty
> >
> > Hi,
> >
> > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node.
> > Can you share the complete logs?
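
On the master side, the two logs already quoted in this thread are the ones
to collect; a minimal sketch for bundling them on a faulty master (paths as
they appear elsewhere in this thread, with filews as the master volume):

    # glusterd log plus the geo-replication monitor/worker logs for filews
    tar czf geo-rep-logs.tar.gz \
        /var/log/glusterfs/etc-glusterfs-glusterd.vol.log \
        /var/log/glusterfs/geo-replication/filews/
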
> >
> > You can increase the verbosity of the debug messages like this:
> >
> > gluster volume geo-replication <master volume> <slave host>::<slave volume>
> > config log-level DEBUG
> >
> > Also, check /root/.ssh/authorized_keys on glusterfs01.sh3.ctripcorp.com.
> > It should have the entries from
> > /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the
> > master node).
> >
> > Have a look at this one for example:
> > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> >
> > Thanks,
> > Saravana
> >
> > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > Hello,
> >
> > I have tried to configure a geo-replication volume; all the master nodes
> > have the same configuration. When I start this volume, the status shows
> > some nodes as faulty, as follows:
> >
> > gluster volume geo-replication filews
> > glusterfs01.sh3.ctripcorp.com::filews_slave status
> >
> > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > -------------------------------------------------------------------------------------------------------------------------------------------------
> > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> >
> > On a faulty node, the log under /var/log/glusterfs/geo-replication/filews
> > shows "worker(/export/sdb/filews) died before establishing connection":
> >
> > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor:
> > starting gsyncd worker
> > [2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__]
> > ChangelogAgent: Agent listining...
> > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer:
> > terminating on reaching EOF.
> > [2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>:
> > exiting.
> > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor:
> > worker(/export/sdb/filews) died before establishing connection
> > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> >
> > Any advice and suggestions will be greatly appreciated.
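
One pattern worth noting in the status output above: every brick reported as
faulty has glusterfs01.sh3.ctripcorp.com as its slave, while all but one of
the non-faulty bricks point at other slave hosts. A quick way to pull out
just those rows (nothing more than a grep over the same status command):

    gluster volume geo-replication filews \
        glusterfs01.sh3.ctripcorp.com::filews_slave status | grep -i faulty

That pattern is what points the suspicion at the passwordless-ssh and
authorized_keys setup towards glusterfs01, as discussed earlier in this
thread.
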
> >
> >
> > Best Regards
> > Yuyang Yang
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://www.gluster.org/mailman/listinfo/gluster-users
>
>

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users