Re: Re: Re: Re: Re: geo-replication status partial faulty

Hi

Could you try the following command from the corresponding masters to the
faulty slave nodes and share the output?
The command below should not ask for a password and should run gsyncd.py.

ssh -i /var/lib/glusterd/geo-replication/secret.pem root@<faulty hosts>
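
For example, with the slave host used in this thread:

ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com

If it runs gsyncd without prompting for a password, passwordless ssh is
working for that host.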

To establish passwordless ssh, it is not necessary to copy secret.pem to id_rsa/id_rsa.pub.

If the geo-rep session is already established, passwordless ssh would already be in place.
My suspicion is that when I asked you to do 'create force', you did it using another slave
host where passwordless ssh was not set up. That would create another session
directory in '/var/lib/glusterd/geo-replication', i.e. <master_vol>_<slave_host>_<slave_vol>.
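
A quick way to spot a stray session directory (a minimal sketch, assuming the
standard layout described above):

# On each master node, list the geo-rep session directories; there should
# be exactly one <master_vol>_<slave_host>_<slave_vol> entry per session.
ls /var/lib/glusterd/geo-replication/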

Please check and let us know.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>, Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna
> Murthy" <avishwan@xxxxxxxxxx>
> Sent: Friday, May 20, 2016 12:35:58 PM
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
> 
> Hello, Kotresh
> 
> I ran 'create force', but still some nodes work and some nodes are faulty.
> 
> On the faulty nodes,
> etc-glusterfs-glusterd.vol.log shows:
> [2016-05-20 06:27:03.260870] I
> [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed config
> template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> [2016-05-20 06:27:03.404544] E
> [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to read
> gsyncd status file
> [2016-05-20 06:27:03.404583] E
> [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read the
> statusfile for /export/sdb/brick1 brick for  filews(master),
> glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> 
> 
> /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> shows:
> [2016-05-20 15:04:01.858340] I [monitor(monitor):215:monitor] Monitor:
> ------------------------------------------------------------
> [2016-05-20 15:04:01.858688] I [monitor(monitor):216:monitor] Monitor:
> starting gsyncd worker
> [2016-05-20 15:04:01.986754] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
> '7,11,10,9'
> [2016-05-20 15:04:01.987505] I [changelogagent(agent):72:__init__]
> ChangelogAgent: Agent listining...
> [2016-05-20 15:04:01.988079] I [repce(agent):92:service_loop] RepceServer:
> terminating on reaching EOF.
> [2016-05-20 15:04:01.988238] I [syncdutils(agent):214:finalize] <top>:
> exiting.
> [2016-05-20 15:04:01.988250] I [monitor(monitor):267:monitor] Monitor:
> worker(/export/sdb/brick1) died before establishing connection
> 
> Can you help me?
> 
> 
> Best Regards
> 杨雨阳 Yuyang Yang
> 
> 
> 
> -----Original Message-----
> From: vyyy杨雨阳
> Sent: Thursday, May 19, 2016 7:45 PM
> To: 'Kotresh Hiremath Ravishankar' <khiremat@xxxxxxxxxx>
> Cc: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
> Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
> 
> It still does not work.
> 
> I needed to copy /var/lib/glusterd/geo-replication/secret.* to /root/.ssh/id_rsa
> to make passwordless ssh work.
> 
> I generated the /var/lib/glusterd/geo-replication/secret.pem file on every
> master node.
> 
> I am not sure if this is right.
> 
> 
> [root@sh02svr5956 ~]# gluster volume geo-replication filews
> glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Passwordless ssh login has not been setup with glusterfs01.sh3.ctripcorp.com
> for user root.
> geo-replication command failed
> 
> [root@sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem
> ./id_rsa
> cp: overwrite `./id_rsa'? y
> [root@sh02svr5956 .ssh]# cp /var/lib/glusterd/geo-replication/secret.pem.pub
> ./id_rsa.pub
> cp: overwrite `./id_rsa.pub'?
> 
> [root@sh02svr5956 ~]# gluster volume geo-replication filews
> glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force
> Creating geo-replication session between filews &
> glusterfs01.sh3.ctripcorp.com::filews_slave has been successful
> [root@sh02svr5956 ~]#
> 
> 
> 
> 
> Best Regards
> 杨雨阳 Yuyang Yang
> OPS
> Ctrip Infrastructure Service (CIS)
> Ctrip Computer Technology (Shanghai) Co., Ltd
> Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> Web: www.Ctrip.com
> 
> 
> -----Original Message-----
> From: Kotresh Hiremath Ravishankar [mailto:khiremat@xxxxxxxxxx]
> Sent: Thursday, May 19, 2016 5:07 PM
> To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>
> Cc: Saravanakumar Arumugam <sarumuga@xxxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
> Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>
> Subject: Re: Re: Re: Re: geo-replication status partial faulty
> 
> Hi,
> 
> Could you just try 'create force' once to fix those status file errors?
> 
> e.g., 'gluster volume geo-rep <master vol> <slave host>::<slave vol> create
> push-pem force'
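> 
> With the volume and slave names used elsewhere in this thread, that would be:
> 
> gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave create push-pem force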
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
> > From: "vyyy杨雨阳" <yuyangyang@xxxxxxxxx>
> > To: "Saravanakumar Arumugam" <sarumuga@xxxxxxxxxx>,
> > Gluster-users@xxxxxxxxxxx, "Aravinda Vishwanathapura Krishna Murthy"
> > <avishwan@xxxxxxxxxx>, "Kotresh Hiremath Ravishankar"
> > <khiremat@xxxxxxxxxx>
> > Sent: Thursday, May 19, 2016 2:15:34 PM
> > Subject: Re: Re: Re: geo-replication status partial faulty
> > 
> > I have checked all the nodes, both masters and slaves; the software
> > is the same.
> > 
> > I am puzzled why half of the masters work and half are faulty.
> > 
> > 
> > [admin@SVR6996HW2285 ~]$ rpm -qa |grep gluster
> > glusterfs-api-3.6.3-1.el6.x86_64
> > glusterfs-fuse-3.6.3-1.el6.x86_64
> > glusterfs-geo-replication-3.6.3-1.el6.x86_64
> > glusterfs-3.6.3-1.el6.x86_64
> > glusterfs-cli-3.6.3-1.el6.x86_64
> > glusterfs-server-3.6.3-1.el6.x86_64
> > glusterfs-libs-3.6.3-1.el6.x86_64
> > 
> > 
> > 
> > 
> > Best Regards
> > 杨雨阳 Yuyang Yang
> > 
> > OPS
> > Ctrip Infrastructure Service (CIS)
> > Ctrip Computer Technology (Shanghai) Co., Ltd
> > Phone: + 86 21 34064880-15554 | Fax: + 86 21 52514588-13389
> > Web: www.Ctrip.com
> > 
> > 
> > 
> > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx]
> > Sent: Thursday, May 19, 2016 4:33 PM
> > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>; Gluster-users@xxxxxxxxxxx;
> > Aravinda Vishwanathapura Krishna Murthy <avishwan@xxxxxxxxxx>; Kotresh
> > Hiremath Ravishankar <khiremat@xxxxxxxxxx>
> > Subject: Re: Re: Re: geo-replication status partial faulty
> > 
> > Hi,
> > +geo-rep team.
> > 
> > Can you check which gluster version you are using?
> > 
> > # For example:
> > rpm -qa | grep gluster
> > 
> > I hope you have the same gluster version installed everywhere.
> > Please double-check and share the output.
> > 
> > Thanks,
> > Saravana
> > On 05/19/2016 01:37 PM, vyyy杨雨阳 wrote:
> > Hi, Saravana
> > 
> > I have changed the log level to DEBUG, then started geo-replication with
> > the log-file option; the file is attached.
> > 
> > gluster volume geo-replication filews
> > glusterfs01.sh3.ctripcorp.com::filews_slave start --log-file=geo.log
> > 
> > I have checked /root/.ssh/authorized_keys on
> > glusterfs01.sh3.ctripcorp.com. It has the entries from
> > /var/lib/glusterd/geo-replication/common_secret.pem.pub,
> > and I have removed the lines not starting with "command=".
> > 
> > With ssh -i /var/lib/glusterd/geo-replication/secret.pem root@glusterfs01.sh3.ctripcorp.com
> > I can see gsyncd messages and no ssh error.
> > 
> > 
> > Attached is etc-glusterfs-glusterd.vol.log from a faulty node; it shows:
> > 
> > [2016-05-19 06:39:23.405974] I
> > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:23.541169] E
> > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to
> > read gsyncd status file
> > [2016-05-19 06:39:23.541210] E
> > [glusterd-geo-rep.c:3603:glusterd_read_status_file] 0-: Unable to read
> > the statusfile for /export/sdb/filews brick for  filews(master),
> > glusterfs01.sh3.ctripcorp.com::filews_slave(slave) session
> > [2016-05-19 06:39:29.472047] I
> > [glusterd-geo-rep.c:1835:glusterd_get_statefile_name] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:34.939709] I
> > [glusterd-geo-rep.c:3516:glusterd_read_status_file] 0-: Using passed
> > config
> > template(/var/lib/glusterd/geo-replication/filews_glusterfs01.sh3.ctripcorp.com_filews_slave/gsyncd.conf).
> > [2016-05-19 06:39:35.058520] E
> > [glusterd-geo-rep.c:3200:glusterd_gsync_read_frm_status] 0-: Unable to
> > read gsyncd status file
> > 
> > 
> > /var/log/glusterfs/geo-replication/filews/ssh%3A%2F%2Froot%4010.15.65.66%3Agluster%3A%2F%2F127.0.0.1%3Afilews_slave.log
> > shows the following:
> > 
> > [2016-05-19 15:11:37.307755] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> > [2016-05-19 15:11:37.308059] I [monitor(monitor):216:monitor] Monitor:
> > starting gsyncd worker
> > [2016-05-19 15:11:37.423320] D [gsyncd(agent):627:main_i] <top>: rpc_fd:
> > '7,11,10,9'
> > [2016-05-19 15:11:37.423882] I [changelogagent(agent):72:__init__]
> > ChangelogAgent: Agent listining...
> > [2016-05-19 15:11:37.423906] I [monitor(monitor):267:monitor] Monitor:
> > worker(/export/sdb/filews) died before establishing connection
> > [2016-05-19 15:11:37.424151] I [repce(agent):92:service_loop] RepceServer:
> > terminating on reaching EOF.
> > [2016-05-19 15:11:37.424335] I [syncdutils(agent):214:finalize] <top>:
> > exiting.
> > 
> > 
> > 
> > 
> > 
> > 
> > Best Regards
> > Yuyang Yang
> > 
> > 
> > 
> > 
> > 
> > From: Saravanakumar Arumugam [mailto:sarumuga@xxxxxxxxxx]
> > Sent: Thursday, May 19, 2016 1:59 PM
> > To: vyyy杨雨阳 <yuyangyang@xxxxxxxxx>; Gluster-users@xxxxxxxxxxx
> > Subject: Re: Re: geo-replication status partial faulty
> > 
> > Hi,
> > 
> > There seems to be some issue on the glusterfs01.sh3.ctripcorp.com slave node.
> > Can you share the complete logs?
> > 
> > You can increase verbosity of debug messages like this:
> > gluster volume geo-replication <master volume> <slave host>::<slave
> > volume> config log-level DEBUG
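> > 
> > For the volume in this thread, that would be, for example:
> > 
> > gluster volume geo-replication filews glusterfs01.sh3.ctripcorp.com::filews_slave config log-level DEBUG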
> > 
> > 
> > Also, check /root/.ssh/authorized_keys on
> > glusterfs01.sh3.ctripcorp.com. It should have the entries from
> > /var/lib/glusterd/geo-replication/common_secret.pem.pub (present on the
> > master node).
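> > 
> > A minimal sketch of that check (assuming the default paths; adjust if
> > yours differ):
> > 
> > # Each geo-rep entry in authorized_keys is prefixed with a forced
> > # command; the count should match the keys pushed from the master.
> > grep -c "^command=" /root/.ssh/authorized_keys
> > wc -l < /var/lib/glusterd/geo-replication/common_secret.pem.pub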
> > 
> > Have a look at this one for example:
> > https://www.gluster.org/pipermail/gluster-users/2015-August/023174.html
> > 
> > Thanks,
> > Saravana
> > On 05/19/2016 07:53 AM, vyyy杨雨阳 wrote:
> > Hello,
> > 
> > I have tried to configure a geo-replication volume; all the master nodes'
> > configuration is the same. When I start this volume, the status shows some
> > nodes faulty, as follows:
> > 
> > gluster volume geo-replication filews
> > glusterfs01.sh3.ctripcorp.com::filews_slave status
> > 
> > MASTER NODE      MASTER VOL    MASTER BRICK          SLAVE                                          STATUS     CHECKPOINT STATUS    CRAWL STATUS
> > ------------------------------------------------------------------------------------------------------------------------------------------------
> > SVR8048HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR8050HW2285    filews        /export/sdb/filews    glusterfs03.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR8047HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SVR8049HW2285    filews        /export/sdb/filews    glusterfs05.sh3.ctripcorp.com::filews_slave    Active     N/A                  Hybrid Crawl
> > SH02SVR5951      filews        /export/sdb/brick1    glusterfs06.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SH02SVR5953      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6995HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5954      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6994HW2285    filews        /export/sdb/filews    glusterfs02.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > SVR6993HW2285    filews        /export/sdb/filews    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SH02SVR5952      filews        /export/sdb/brick1    glusterfs01.sh3.ctripcorp.com::filews_slave    faulty     N/A                  N/A
> > SVR6996HW2285    filews        /export/sdb/filews    glusterfs04.sh3.ctripcorp.com::filews_slave    Passive    N/A                  N/A
> > 
> > On the faulty nodes, the log file under /var/log/glusterfs/geo-replication/filews
> > shows "worker(/export/sdb/filews) died before establishing connection":
> > 
> > [2016-05-18 16:55:46.402622] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> > [2016-05-18 16:55:46.402930] I [monitor(monitor):216:monitor] Monitor:
> > starting gsyncd worker
> > [2016-05-18 16:55:46.517460] I [changelogagent(agent):72:__init__]
> > ChangelogAgent: Agent listining...
> > [2016-05-18 16:55:46.518066] I [repce(agent):92:service_loop] RepceServer:
> > terminating on reaching EOF.
> > [2016-05-18 16:55:46.518279] I [syncdutils(agent):214:finalize] <top>:
> > exiting.
> > [2016-05-18 16:55:46.518194] I [monitor(monitor):267:monitor] Monitor:
> > worker(/export/sdb/filews) died before establishing connection
> > [2016-05-18 16:55:56.697036] I [monitor(monitor):215:monitor] Monitor:
> > ------------------------------------------------------------
> > 
> > Any advice and suggestions will be greatly appreciated.
> > 
> > 
> > 
> > 
> > 
> > Best Regards
> >        Yuyang Yang
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> 
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



