On 02/23/2014 12:59 PM, cluster lab wrote:
>
> On Sat, Feb 22, 2014 at 2:28 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>
> On 02/22/2014 11:10 AM, cluster lab wrote:
> > Hi,
> >
> > In the middle of cluster activity I received these messages (the cluster
> > has 3 nodes with SAN ... GFS2 filesystem):
>
> OS? version of the packages? cluster.conf
>
>
> OS: SL (Scientific Linux 6),
>
> Packages:
> kernel-2.6.32-71.29.1.el6.x86_64
> rgmanager-3.0.12.1-12.el6.x86_64
> cman-3.0.12-23.el6.x86_64
> corosynclib-1.2.3-21.el6.x86_64
> corosync-1.2.3-21.el6.x86_64
>
> Cluster.conf:
>
> <?xml version="1.0"?>
> <cluster config_version="224" name="USBackCluster">
>   <fence_daemon clean_start="0" post_fail_delay="10" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="USBack-prox1" nodeid="1" votes="1">
>       <fence>
>         <method name="ilo">
>           <device name="USBack-prox1-ilo"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="USBack-prox2" nodeid="2" votes="1">
>       <fence>
>         <method name="ilo">
>           <device name="USBack-prox2-ilo"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="USBack-prox3" nodeid="3" votes="1">
>       <fence>
>         <method name="ilo">
>           <device name="USBack-prox3-ilo"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman/>
>   <fencedevices>
>     ... fence config ...
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="VMS-Area" nofailback="0" ordered="0" restricted="0">
>         <failoverdomainnode name="USBack-prox1" priority="1"/>
>         <failoverdomainnode name="USBack-prox2" priority="1"/>
>         <failoverdomainnode name="USBack-prox3" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>     ....
>
> >
> > Log messages on USBack-prox2:
> >
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Feb 21 13:06:41 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 DOWN
> > Feb 21 13:06:41 USBack-prox2 kernel: dlm: closing connection to node 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 1
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [CPG ] chosen downlist from node r(0) ip(--.--.--.22)
> > Feb 21 13:06:41 USBack-prox2 corosync[3911]: [MAIN ] Completed service synchronization, ready to provide service.
> > Feb 21 13:06:41 USBack-prox2 kernel: GFS2: fsid=USBackCluster:VMStorage1.0: jid=1: Trying to acquire journal lock...
> > Feb 21 13:06:41 USBack-prox2 kernel: GFS2: fsid=USBackCluster:VMStorage2.0: jid=1: Trying to acquire journal lock...
> > Feb 21 13:06:51 USBack-prox2 fenced[3957]: fencing node USBack-prox1
> > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 dev 0.0 agent fence_ipmilan result: error from agent
> > Feb 21 13:06:52 USBack-prox2 fenced[3957]: fence USBack-prox1 failed
> > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
> > Feb 21 13:06:54 USBack-prox2 kernel: dlm: connect from non cluster node
>
> ^^^ good hint here. something is off.
>
>
> ?

It means that there is something in that network that tries to connect
to the cluster node, without being a cluster node.

Fabio

>
> Fabio
>
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3
> > Feb 21 13:06:55 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 UP
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 2
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 0
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] downlist received left_list: 0
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [CPG ] chosen downlist from node r(0) ip(--.--.--.21)
> > Feb 21 13:06:55 USBack-prox2 corosync[3911]: [MAIN ] Completed service synchronization, ready to provide service.
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 3a95f87400000000 protocol
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 1e7ff52100000001 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 22221a7000000002 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 419ac24100000003 start
> > Feb 21 13:06:55 USBack-prox2 gfs_controld[4029]: cpg_mcast_joined error 12 handle 3804823e00000004 start
> >
> > -------------------------------------------------
> > Then GFS2 generates error logs (Activities blocked).
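
For anyone wanting to line these membership changes up against other logs, here is a minimal Python sketch that pulls the `[QUORUM] Members[...]` lines out of a syslog excerpt like the one above and reports each change in membership. The function name `membership_changes` and the regular expression are illustrative only and are not part of corosync; the sketch also assumes each log entry sits on a single unwrapped line with the quote markers already stripped.

```python
import re

# Matches corosync quorum lines of the form seen in the thread, e.g.
#   Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3
# This pattern is an assumption based on those lines, not an official format.
QUORUM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"corosync\[\d+\]:\s+\[QUORUM\]\s+Members\[\d+\]:\s+(?P<ids>[\d ]+)$"
)


def membership_changes(lines):
    """Return (timestamp, sorted node ids) for each change in quorum membership.

    Consecutive lines reporting the same member set are collapsed into one entry.
    """
    changes = []
    previous = None
    for line in lines:
        match = QUORUM_RE.match(line.strip())
        if match is None:
            continue  # not a quorum membership line
        members = frozenset(int(node) for node in match.group("ids").split())
        if members != previous:
            changes.append((match.group("ts"), sorted(members)))
            previous = members
    return changes


if __name__ == "__main__":
    sample = [
        "Feb 21 13:06:41 USBack-prox2 corosync[3911]: [QUORUM] Members[2]: 2 3",
        "Feb 21 13:06:41 USBack-prox2 rgmanager[4130]: State change: USBack-prox1 DOWN",
        "Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3",
        "Feb 21 13:06:55 USBack-prox2 corosync[3911]: [QUORUM] Members[3]: 1 2 3",
    ]
    for timestamp, members in membership_changes(sample):
        print(timestamp, members)
```

Fed the quorum lines quoted above, this reports members 2 and 3 at 13:06:41 and members 1, 2 and 3 at 13:06:55, which makes the 14-second window in which node 1 was out of the membership easy to see.
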
> >
> > Logs of Cisco switch (Time is UTC):
> >
> > Feb 21 09:37:02.375: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to down
> > Feb 21 09:37:02.459: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/4, changed state to down
> > Feb 21 09:37:03.382: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down
> > Feb 21 09:37:03.541: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to down
> > Feb 21 09:37:07.283: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up
> > Feb 21 09:37:07.350: %LINK-3-UPDOWN: Interface GigabitEthernet0/4, changed state to up
> > Feb 21 09:37:08.289: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to up
> > Feb 21 09:37:09.472: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/4, changed state to up
> > Feb 21 09:40:20.045: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/11, changed state to down
> > Feb 21 09:40:21.043: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to down
> > Feb 21 09:40:23.401: %LINK-3-UPDOWN: Interface GigabitEthernet0/11, changed state to up
> > _______________________________________________
> > discuss mailing list
> > discuss@xxxxxxxxxxxx
> > http://lists.corosync.org/mailman/listinfo/discuss