Re: [Problem] Corosync cannot reconstitute a cluster.

Hi Honza,

I carried out the tests that you suggested.
If there is a problem with the way I ran them, please point it out.

(Test1)
- Block communication on the local node via iptables (drop all
non-loopback UDP traffic, e.g. "iptables -A INPUT ! -i lo -p udp -j DROP &&
iptables -A OUTPUT ! -o lo -p udp -j DROP"), then remove these rules.
Does corosync re-create the membership correctly?
(A scripted version of this cycle is sketched just below.)
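
For reference, the block/unblock cycle of this test can be scripted
roughly as below. This is only a sketch: the 60-second sleeps are an
arbitrary value of mine, not part of your instructions.

-------------------------------------------------------
#!/bin/sh
# Test1 sketch: isolate the local node over UDP, wait, then restore.

echo "membership before isolation:"
corosync-cmapctl | grep joined

# drop all non-loopback UDP traffic (corosync totem runs over UDP)
iptables -A INPUT  ! -i lo -p udp -j DROP
iptables -A OUTPUT ! -o lo -p udp -j DROP

sleep 60
echo "membership while isolated:"
corosync-cmapctl | grep joined

# flush the rules to restore traffic
iptables -F

sleep 60
echo "membership after restoring traffic:"
corosync-cmapctl | grep joined
-------------------------------------------------------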

(Result1 - OK: corosync re-created the membership correctly)
 * node1
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6a ~]# iptables -A INPUT ! -i lo -p udp -j DROP && iptables -A OUTPUT ! -o lo -p udp -j DROP
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6a ~]# iptables -F
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

 * node2
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6b ~]# iptables -A INPUT ! -i lo -p udp -j DROP && iptables -A OUTPUT ! -o lo -p udp -j DROP
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6b ~]# iptables -F
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

 * node3 
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6c ~]# iptables -A INPUT ! -i lo -p udp -j DROP && iptables -A OUTPUT ! -o lo -p udp -j DROP
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6c ~]# iptables -F
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

(Test2)
- Unplug the cables (make sure the network is NOT configured via
NetworkManager; NetworkManager does an ifdown, and corosync does not
work correctly with ifdown). Then plug the cables back in. Is the
membership reconstructed correctly?
(A note on the interface configuration follows just below.)
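
For completeness, NetworkManager is disabled on all three nodes (the
chkconfig output is included in the results below), and the ring
interfaces are configured with static ifcfg files. A sketch of such a
file, taking eth1 on node1 as an example (an illustration, not a copy
of the actual file), would be:

-------------------------------------------------------
# /etc/sysconfig/network-scripts/ifcfg-eth1
# keep NetworkManager away from this interface
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.101.9
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no
-------------------------------------------------------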


(Result2 - OK: corosync re-created the membership correctly)
 * node1
[root@bl460g6a ~]# chkconfig --list NetworkManager
NetworkManager  0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6a ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

 * node2
[root@bl460g6b ~]# chkconfig --list NetworkManager
NetworkManager  0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6b ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

 * node3
[root@bl460g6c ~]# chkconfig --list NetworkManager
NetworkManager  0:off   1:off   2:off   3:off   4:off   5:off   6:off
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined -----> Isolated: only the local node remains.
[root@bl460g6c ~]# corosync-cmapctl  | grep joined
runtime.totem.pg.mrp.srp.members.157657280.status (str) = joined -----> Membership re-formed with all three nodes.
runtime.totem.pg.mrp.srp.members.174434496.status (str) = joined
runtime.totem.pg.mrp.srp.members.191211712.status (str) = joined

> If result of both of test cases is correct membership then problem is in
> switch. If so, you can try ether corosync UDPU mode (it's slightly
> slower, but as long as GFS is not used, it's acceptable, especially for
> 3 nodes environment) or you can try change switch configuration.

Really? Is the problem in the switch?

I think the phenomenon depends on the way the corosync network is cut,
and that it is not a problem with the switch itself.

The network cut that I reported is as follows.
 * An X marks where a link is cut.

        -------------------------------
       |              SW1              |
        -------------------------------
           |            |            |
           X            |            |
           |            |            |
       ---------    ---------    ---------
      |  node1  |  |  node2  |  |  node3  |
       ---------    ---------    ---------
           |            |            |
           |            X            |
           |            |            |
        -------------------------------
       |              SW2              |
        -------------------------------

 * Through SW1, node3 can still communicate with node2.
 * Through SW2, node3 can still communicate with node1.

When corosync control messages travel over both of these paths in such
a failure, doesn't that cause a problem?
Couldn't that be the reason the cluster cannot be re-formed?
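
By the way, if we do try UDPU mode as you suggested, I assume the
corosync.conf would look roughly like the sketch below (transport
switched to udpu plus an explicit nodelist). The addresses are examples
based on our ring0 network (192.168.101.0/24) and the second ring is
omitted for brevity; please correct me if this is not what you meant.

-------------------------------------------------------
totem {
    version: 2
    transport: udpu

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.101.0
        mcastport: 5405
    }
}

nodelist {
    node {
        ring0_addr: 192.168.101.9
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.101.10
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.101.11
        nodeid: 3
    }
}
-------------------------------------------------------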


Best Regards,
Hideo Yamauchi.


--- On Thu, 2013/6/13, renayama19661014@xxxxxxxxx <renayama19661014@xxxxxxxxx> wrote:

> Hi Honza,
> 
> Thank you for comment.
> I try the test that you suggested and report a result.
> 
> Many Thanks!
> Hideo Yamauchi.
> 
> --- On Wed, 2013/6/12, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
> 
> > Hideo,
> > can you please try to test following things:
> > 
> > - Block communication on local nodes via iptables (so drop all UDP
> > traffic, something like "iptables -A INPUT ! -i lo -p udp -j DROP &&
> > iptables -A OUTPUT ! -o lo -p udp -j DROP") - and then remove this
> > rules, does corosync create membership correctly?
> > - Unplug cables (please make sure to NOT configure network via
> > networkmanager. Networkmanager does ifdown and corosync doesn't work
> > correctly with ifdown). Then plug cables again. Is membership
> > reconstructed correctly?
> > 
> > If result of both of test cases is correct membership then problem is in
> > switch. If so, you can try ether corosync UDPU mode (it's slightly
> > slower, but as long as GFS is not used, it's acceptable, especially for
> > 3 nodes environment) or you can try change switch configuration.
> > 
> > Regards,
> >   Honza
> > 
> > renayama19661014@xxxxxxxxx napsal(a):
> > > Hi Honza,
> > > 
> > > Thank you for comments.
> > > 
> > >> can you please tell me exact reproducer for physical hw? (because brctl
> > >> delif is I believe not valid in hw at all).
> > > 
> > > It is the next environment that I reported a problem in the second in physical  environment.
> > > 
> > > -------------------------
> > > Enclosure               : BladeSystem c7000 Enclosure
> > > node1, node2, node3 : HP ProLiant BL460c G6(CPU:Xeon E5540,Mem:16G) --- Blade
> > >                                  NIC:Flex-10 Embedded Ethernet x 1(2Port)
> > >                                  NIC:NC325m Quad Port 1Gb NIC for c-Class BladeSystem(4Port)
> > > SW                        : GbE2c Ethernet Blade Switch x 6
> > > -------------------------
> > > 
> > > In addition, I carried out the cutting of the interface via a switch.
> > >  * In the second report, I did not execute the brctl command.
> > > 
> > > Is more detailed HW information necessary?
> > > If there is necessary information, I send it.
> > > 
> > > Best Regards,
> > > Hideo Yamauchi.
> > > 
> > > 
> > > --- On Wed, 2013/6/12, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
> > > 
> > >> Hideo,
> > >> can you please tell me exact reproducer for physical hw? (because brctl
> > >> delif is I believe not valid in hw at all).
> > >>
> > >> Thanks,
> > >>   Honza
> > >>
> > >> renayama19661014@xxxxxxxxx napsal(a):
> > >>> Hi Fabio,
> > >>>
> > >>> Thank you for comment.
> > >>>
> > >>>> I'll let Honza look at it, I don't have enough physical hardware to
> > >>>> reproduce.
> > >>>
> > >>> All right.
> > >>>
> > >>> Many Thanks!
> > >>> Hideo Yamauchi.
> > >>>
> > >>>
> > >>> --- On Tue, 2013/6/11, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
> > >>>
> > >>>> Hi Yamauchi-san,
> > >>>>
> > >>>> I'll let Honza look at it, I don't have enough physical hardware to
> > >>>> reproduce.
> > >>>>
> > >>>> Fabio
> > >>>>
> > >>>> On 06/11/2013 01:15 AM, renayama19661014@xxxxxxxxx wrote:
> > >>>>> Hi Fabio,
> > >>>>>
> > >>>>> Thank you for comments.
> > >>>>>
> > >>>>> We confirmed this problem in the physical environment.
> > >>>>> The communication of corosync lets eth1,eth2 go through.
> > >>>>>
> > >>>>> -------------------------------------------------------
> > >>>>> [root@bl460g6a ~]# ip addr show
> > >>>>> (snip)
> > >>>>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
> > >>>>>       link/ether f4:ce:46:b3:fe:3c brd ff:ff:ff:ff:ff:ff
> > >>>>>       inet 192.168.101.9/24 brd 192.168.101.255 scope global eth1
> > >>>>>       inet6 fe80::f6ce:46ff:feb3:fe3c/64 scope link 
> > >>>>>          valid_lft forever preferred_lft forever
> > >>>>> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
> > >>>>>       link/ether 18:a9:05:78:6c:f0 brd ff:ff:ff:ff:ff:ff
> > >>>>>       inet 192.168.102.9/24 brd 192.168.102.255 scope global eth2
> > >>>>>       inet6 fe80::1aa9:5ff:fe78:6cf0/64 scope link 
> > >>>>>          valid_lft forever preferred_lft forever
> > >>>>> (snip)
> > >>>>> 8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
> > >>>>>       link/ether 52:54:00:7f:f3:0a brd ff:ff:ff:ff:ff:ff
> > >>>>>       inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
> > >>>>> 9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500
> > >>>>>       link/ether 52:54:00:7f:f3:0a brd ff:ff:ff:ff:ff:ff
> > >>>>> -----------------------------------------------
> > >>>>>
> > >>>>> I think that it is not a virtual environmental problem.
> > >>>>>
> > >>>>> I attach the log that I confirmed just to make sure in three Blade.(RHEL6.4)
> > >>>>> * I performed the interception of the communication with a network switch.
> > >>>>>
> > >>>>> The phenomenon is similar, and, as for one node, a loop does an OPERATIONAL state, and two other nodes do not change in an OPERATIONAL state.
> > >>>>>
> > >>>>> After all is the problem same as the bug that you taught?
> > >>>>>> Check this thread as reference:
> > >>>>>> http://lists.linuxfoundation.org/pipermail/openais/2013-April/016792.html
> > >>>>>
> > >>>>>
> > >>>>> Best Regards,
> > >>>>> Hideo Yamauchi.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --- On Fri, 2013/5/31, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
> > >>>>>
> > >>>>>> On 5/31/2013 7:12 AM, renayama19661014@xxxxxxxxx wrote:
> > >>>>>>> Hi All,
> > >>>>>>>
> > >>>>>>> We discovered the problem of the network of the corosync communication.
> > >>>>>>>
> > >>>>>>> We composed a cluster of three nodes on KVM in corosync.
> > >>>>>>>
> > >>>>>>> Step 1) Start corosync service in all nodes. 
> > >>>>>>>
> > >>>>>>> Step 2) Confirm that a cluster is comprised of all nodes definitely and became the OPERATIONAL state.
> > >>>>>>>
> > >>>>>>> Step 3) Cut off the network of node1(rh64-coro1) and node2(rh64-coro2) from a host of KVM.
> > >>>>>>>
> > >>>>>>>           [root@kvm-host ~]# brctl delif virbr3 vnet5;brctl delif virbr2 vnet1
> > >>>>>>>
> > >>>>>>> Step 4) Because a problem occurred, we stop all nodes.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> The problem occurs at the time of step 3.
> > >>>>>>>
> > >>>>>>> One node(rh64-coro1) continues moving a state after becoming the OPERATIONAL state.
> > >>>>>>>
> > >>>>>>> Two nodes(rh64-coro2 and rh64-coro3) continue changing in a state.
> > >>>>>>> It seems to never change in an OPERATIONAL state while the first node operates.
> > >>>>>>>
> > >>>>>>> This means that two nodes(rh64-coro2 and rh64-coro3) cannot complete cluster constitution.
> > >>>>>>> When this network trouble happens, by the setting that corosync combined with Pacemaker, corosync cannot notify Pacemaker of the constitution change of the cluster.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Question 1) Are there any parameters to solve this problem in corosync.conf?
> > >>>>>>>     * We bundle up an interface(Bonding) and think that it can be settled by appointing "rrp_mode:none", but do not want to appoint "rrp_mode:none".
> > >>>>>>>
> > >>>>>>> Question 2) Is this a bug? Or is it specifications of the communication of corosync?
> > >>>>>>
> > >>>>>> We already checked this specific test, and it appears to be a bug in
> > >>>>>> the kernel bridge code when handling multicast traffic (groups are not
> > >>>>>> joined correctly and traffic is not forwarded).
> > >>>>>>
> > >>>>>> Check this thread as reference:
> > >>>>>> http://lists.linuxfoundation.org/pipermail/openais/2013-April/016792.html
> > >>>>>>
> > >>>>>> Thanks
> > >>>>>> Fabio
> > >>>>>>
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> discuss mailing list
> > >>>>>> discuss@xxxxxxxxxxxx
> > >>>>>> http://lists.corosync.org/mailman/listinfo/discuss
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>> _______________________________________________
> > >>> discuss mailing list
> > >>> discuss@xxxxxxxxxxxx
> > >>> http://lists.corosync.org/mailman/listinfo/discuss
> > >>
> > >>
> > 
> > 
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
> 

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss




