Re: Strange corosync fail ...

On 02/23/2014 12:59 PM, cluster lab wrote:
> 
> 
> 
> On Sat, Feb 22, 2014 at 2:28 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx
> <mailto:fdinitto@xxxxxxxxxx>> wrote:
> 
>     On 02/22/2014 11:10 AM, cluster lab wrote:
>     > hi,
>     >
>     > In the middle of cluster activity I received these messages (the
>     > cluster has 3 nodes with SAN ... GFS2 filesystem):
> 
>     OS? Version of the packages? cluster.conf?
> 
> 
> OS: SL (Scientific Linux 6),
> 
> Packages:
> kernel-2.6.32-71.29.1.el6.x86_64
> rgmanager-3.0.12.1-12.el6.x86_64
> cman-3.0.12-23.el6.x86_64
> corosynclib-1.2.3-21.el6.x86_64
> corosync-1.2.3-21.el6.x86_64
> 
> Cluster.conf:
> 
> <?xml version="1.0"?>
> <cluster config_version="224" name="USBackCluster">
>         <fence_daemon clean_start="0" post_fail_delay="10" post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="USBack-prox1" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="ilo">
>                                         <device name="USBack-prox1-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="USBack-prox2" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="ilo">
>                                         <device name="USBack-prox2-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="USBack-prox3" nodeid="3" votes="1">
>                         <fence>
>                                 <method name="ilo">
>                                         <device name="USBack-prox3-ilo"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman/>
>         <fencedevices>
>                 ... fence config ...
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="VMS-Area" nofailback="0" ordered="0" restricted="0">
>                                 <failoverdomainnode name="USBack-prox1" priority="1"/>
>                                 <failoverdomainnode name="USBack-prox2" priority="1"/>
>                                 <failoverdomainnode name="USBack-prox3" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>     ....
> 
>  
> 
> 
>     >
>     > log messages on USBack-prox2:
>     >
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [TOTEM ] A processor
>     > joined or left the membership and a new membership was formed.
>     > Feb 21 13:06:41 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 DOWN
>     > Feb 21 13:06:41 USBack-prox2 kernel: dlm: closing connection to node 1
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received
>     > left_list: 1
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received
>     > left_list: 1
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] chosen downlist
>     > from node r(0) ip(--.--.--.22)
>     > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [MAIN ] Completed service
>     > synchronization, ready to provide service.
>     > Feb 21 13:06:41 USBack-prox2 kernel: GFS2:
>     > fsid=USBackCluster:VMStorage1.0: jid=1: Trying to acquire journal
>     > lock...
>     > Feb 21 13:06:41 USBack-prox2 kernel: GFS2:
>     > fsid=USBackCluster:VMStorage2.0: jid=1: Trying to acquire journal
>     > lock...
>     > Feb 21 13:06:51 USBack-prox2 fenced[3957]: fencing node USBack-prox1
>     > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 dev 0.0
>     > agent fence_ipmilan result: error from agent
>     > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 failed
>     > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
>     > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
> 
>     ^^^ good hint here. something is off.
> 
> 
> ? 

It means that something on that network is trying to connect to the
cluster node without being a cluster node itself.

Fabio
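
To illustrate the point above: the only legitimate peers are the nodes
listed in cluster.conf, so anything else that connects is a "non cluster
node". The following is a minimal sketch, not a diagnostic tool. It reads a
trimmed copy of the <clusternodes> section quoted in this thread and flags
peers that are not configured. It compares node names only because the IP
addresses are masked in the logs; the kernel's own check is made against
node addresses, as far as I know. The helper names and the peer
"backup-host" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <clusternodes> section quoted in this thread
# (fence and rm sections left out).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="224" name="USBackCluster">
  <clusternodes>
    <clusternode name="USBack-prox1" nodeid="1" votes="1"/>
    <clusternode name="USBack-prox2" nodeid="2" votes="1"/>
    <clusternode name="USBack-prox3" nodeid="3" votes="1"/>
  </clusternodes>
</cluster>
"""


def cluster_members(conf_text):
    """Return {nodeid: name} for every <clusternode> in a cluster.conf string."""
    root = ET.fromstring(conf_text)
    return {int(n.get("nodeid")): n.get("name") for n in root.iter("clusternode")}


def non_members(peers, members):
    """Return the peers (by name) that are not configured cluster nodes."""
    known = set(members.values())
    return sorted(p for p in peers if p not in known)


members = cluster_members(CLUSTER_CONF)
# "backup-host" is a made-up peer used only to show a mismatch.
strangers = non_members(["USBack-prox1", "backup-host"], members)
print(members)
print(strangers)
```

In practice the list of peers would have to come from whatever shows who is
connecting on that network; the sketch does not try to collect it.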

>  
> 
> 
>     Fabio
> 
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [TOTEM ] A processor
>     > joined or left the membership and a new membership was formed.
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
>     > Feb 21 13:06:55 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 UP
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
>     > left_list: 2
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
>     > left_list: 0
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received
>     > left_list: 0
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] chosen downlist
>     > from node r(0) ip(--.--.--.21)
>     > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [MAIN ] Completed service
>     > synchronization, ready to provide service.
>     > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
>     > error 12 handle 3a95f87400000000 protocol
>     > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
>     > error 12 handle 1e7ff52100000001 start
>     > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
>     > error 12 handle 22221a7000000002 start
>     > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
>     > error 12 handle 419ac24100000003 start
>     > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined
>     > error 12 handle 3804823e00000004 start
>     >
>     >
>     > -------------------------------------------------
>     > Then GFS2 generates error logs (Activities blocked).
>     >
>     > Logs of cisco switch (Time is UTC):
>     >
>     > Feb 21 09:37:02.375: %LINEPROTO-5-UPDOWN: Line protocol on Interface
>     > GigabitEthernet0/11, changed state to down
>     > Feb 21 09:37:02.459: %LINEPROTO-5-UPDOWN: Line protocol on Interface
>     > GigabitEthernet0/4, changed state to down
>     > Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
>     > changed state to down
>     > Feb 21 09:37:03.541: %LINK-3-UPDOWN: Interface GigabitEthernet0/4,
>     > changed state to down
>     > Feb 21 09:37:07.283: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
>     > changed state to up
>     > Feb 21 09:37:07.350: %LINK-3-UPDOWN: Interface GigabitEthernet0/4,
>     > changed state to up
>     > Feb 21 09:37:08.289: %LINEPROTO-5-UPDOWN: Line protocol on Interface
>     > GigabitEthernet0/11, changed state to up
>     > Feb 21 09:37:09.472: %LINEPROTO-5-UPDOWN: Line protocol on Interface
>     > GigabitEthernet0/4, changed state to up
>     > Feb 21 09:40:20.045: %LINEPROTO-5-UPDOWN: Line protocol on Interface
>     > GigabitEthernet0/11, changed state to down
>     > Feb 21 09:40:21.043: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
>     > changed state to down
>     > Feb 21 09:40:23.401: %LINK-3-UPDOWN: Interface GigabitEthernet0/11,
>     > changed state to up
>     > _______________________________________________
>     > discuss mailing list
>     > discuss@xxxxxxxxxxxx <mailto:discuss@xxxxxxxxxxxx>
>     > http://lists.corosync.org/mailman/listinfo/discuss
>     >
> 
> 
