Hi, guys
We have a problem with corosync 1.4.x (1.4.6 and 1.4.7). This is the scenario:
We have a 3-node cluster in UDPU mode: node-1 (10.108.2.3), node-2 (10.108.2.4), node-3 (10.108.2.5).
We shut down the interface used for cluster communication on node-1 by issuing the `ifdown eth2` command there.
runtime.totem.pg.mrp.srp.members.67267594.ip=r(0) ip(10.108.2.4)
runtime.totem.pg.mrp.srp.members.67267594.join_count=1
runtime.totem.pg.mrp.srp.members.67267594.status=joined
runtime.totem.pg.mrp.srp.members.84044810.ip=r(0) ip(10.108.2.5)
runtime.totem.pg.mrp.srp.members.84044810.join_count=2
runtime.totem.pg.mrp.srp.members.84044810.status=joined
runtime.totem.pg.mrp.srp.members.50490378.ip=r(0) ip(10.108.2.3)
runtime.totem.pg.mrp.srp.members.50490378.join_count=1
runtime.totem.pg.mrp.srp.members.50490378.status=left
runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure=1
runtime.totem.pg.mrp.srp.members.50490378.ip=r(0) ip(10.108.2.3)
runtime.totem.pg.mrp.srp.members.50490378.join_count=1
runtime.totem.pg.mrp.srp.members.50490378.status=joined
runtime.totem.pg.mrp.srp.members.67267594.ip=r(0) ip(10.108.2.4)
runtime.totem.pg.mrp.srp.members.67267594.join_count=1
runtime.totem.pg.mrp.srp.members.67267594.status=joined
runtime.totem.pg.mrp.srp.members.84044810.ip=r(0) ip(10.108.2.5)
runtime.totem.pg.mrp.srp.members.84044810.join_count=1
runtime.totem.pg.mrp.srp.members.84044810.status=joined
runtime.blackbox.dump_flight_data=no
runtime.blackbox.dump_state=no
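
The two member lists above disagree about 10.108.2.3: it is `left` in the first list and `joined` in the second. To compare such dumps mechanically, here is a minimal Python sketch. It assumes the dumps are available as plain text in the `key=value` form shown above; the function names `parse_members` and `disagreements` are hypothetical helpers, not part of corosync.

```python
import re

# Matches the per-member keys exactly as they appear in the dumps above.
MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(ip|join_count|status)=(.*)$"
)
# Extracts the address from a value such as "r(0) ip(10.108.2.4)".
IP_RE = re.compile(r"ip\(([^)]+)\)")


def parse_members(dump):
    """Return {ip_address: status} for every member found in the dump text."""
    by_id = {}
    for line in dump.splitlines():
        match = MEMBER_RE.match(line.strip())
        if not match:
            continue  # skip unrelated keys such as runtime.blackbox.*
        node_id, field, value = match.groups()
        by_id.setdefault(node_id, {})[field] = value

    statuses = {}
    for fields in by_id.values():
        ip_match = IP_RE.search(fields.get("ip", ""))
        if ip_match and "status" in fields:
            statuses[ip_match.group(1)] = fields["status"]
    return statuses


def disagreements(view_a, view_b):
    """Return {ip: (status_in_a, status_in_b)} where the two views differ."""
    common = sorted(view_a.keys() & view_b.keys())
    return {ip: (view_a[ip], view_b[ip]) for ip in common if view_a[ip] != view_b[ip]}


# Abbreviated copies of the two lists above (ip and status keys only).
FIRST_VIEW = """\
runtime.totem.pg.mrp.srp.members.67267594.ip=r(0) ip(10.108.2.4)
runtime.totem.pg.mrp.srp.members.67267594.status=joined
runtime.totem.pg.mrp.srp.members.50490378.ip=r(0) ip(10.108.2.3)
runtime.totem.pg.mrp.srp.members.50490378.status=left
"""
SECOND_VIEW = """\
runtime.totem.pg.mrp.srp.members.50490378.ip=r(0) ip(10.108.2.3)
runtime.totem.pg.mrp.srp.members.50490378.status=joined
runtime.totem.pg.mrp.srp.members.67267594.ip=r(0) ip(10.108.2.4)
runtime.totem.pg.mrp.srp.members.67267594.status=joined
"""

if __name__ == "__main__":
    print(disagreements(parse_members(FIRST_VIEW), parse_members(SECOND_VIEW)))
```

Any address reported by `disagreements` is a member whose status differs between the two dumps, which is exactly the inconsistency described here.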
In the node-1 logs I see the following:
2014-08-13T15:46:18.848234+01:00 warning: [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
2014-08-13T15:46:18.866365+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.866799+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.866799+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.866799+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.866799+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.866799+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.935539+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.935932+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.935932+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.935932+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
2014-08-13T15:46:18.935932+01:00 debug: [TOTEM ] sendmsg(mcast) failed (non-critical): Invalid argument (22)
I am pretty sure that this is a bug, as corosync should detect the ring failure and mark dead nodes as dead on both sides.
Can you help me fix it, or give me a clue about where to look for the fix?
--
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
45bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com
www.mirantis.ru
vkuklin@xxxxxxxxxxxx
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss