On 17/06/14 15:27, Schaefer, Micah wrote:
I am running Red Hat 6.4 with the HA/load balancing packages from the
install DVD.
-bash-4.1$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.4 (Santiago)
-bash-4.1$ corosync -v
Corosync Cluster Engine, version '1.4.1'
Copyright (c) 2006-2009 Red Hat, Inc.
Thanks. 6.5 has better pause detection in it but I don't think that's
the issue here actually. It looks to me like some messages are getting
through but not others. So I'm back to seriously wondering if multicast
traffic is being forwarded correctly and reliably. Having a mix of
virtual and physical systems can cause these sorts of issues with real
and software switches being mixed. Though I haven't seen anything quite
as odd as this to be honest.
Can you try either UDPU (preferred) or broadcast transport please and
see if that helps or changes the symptoms at all? Broadcast could be
problematic itself with the real/virtual mix so UDPU will be a more
reliable option.
Annoyingly, you'll need to take down the whole cluster to do this, and add
<cman transport="udpu"/>
to /etc/cluster/cluster.conf on all nodes.
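
For reference, a minimal sketch of where that line would sit in the file. The cluster name, node name and version number below are placeholders and are not taken from this thread; only the cman line is the actual change. It also assumes the usual practice of incrementing config_version whenever cluster.conf is edited.

```xml
<?xml version="1.0"?>
<!-- Hypothetical skeleton: name, config_version and node entries are placeholders. -->
<cluster name="example" config_version="2">
  <!-- The suggested change: use UDP unicast instead of multicast. -->
  <cman transport="udpu"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
    <!-- ... remaining nodes, fencing and services unchanged ... -->
  </clusternodes>
</cluster>
```

The same file would need to be present on every node before the cluster is started again.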
Chrissie
On 6/17/14, 8:41 AM, "Christine Caulfield" <ccaulfie@xxxxxxxxxx> wrote:
On 12/06/14 20:06, Digimer wrote:
Hrm, I'm not really sure that I am able to interpret this without making
guesses. I'm cc'ing one of the devs (who I hope will poke the right
person if he's not able to help at the moment). Let's see what he has to
say.
I am curious now, too. :)
On 12/06/14 03:02 PM, Schaefer, Micah wrote:
Node4 was fenced again, I was able to get some debug logs (below), a new
message:
"Jun 12 14:01:56 corosync [TOTEM ] The token was lost in the OPERATIONAL
state."
Rest of corosync logs
http://pastebin.com/iYFbkbhb
Jun 12 14:44:49 corosync [TOTEM ] entering OPERATIONAL state.
Jun 12 14:44:49 corosync [TOTEM ] A processor joined or left the
membership and a new membership was formed.
Jun 12 14:44:49 corosync [TOTEM ] waiting_trans_ack changed to 0
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] entering GATHER state from 12.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 32947 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33016 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33086 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:49 corosync [TOTEM ] Process pause detected for 33155 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33224 ms,
flushing membership messages.
Jun 12 14:44:50 corosync [TOTEM ] Process pause detected for 33225 ms,
flushing membership messages.
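
To see how often each pause value repeats without reading the log by eye, something like the following may help. It is only a sketch: the function name is made up for illustration, and the log path in the usage note is the usual RHEL 6 location, assumed rather than confirmed in this thread.

```shell
# Hypothetical helper: read corosync log text on stdin and print a count
# for each distinct "Process pause detected" duration (in ms), sorted by duration.
summarize_pauses() {
    grep -o 'Process pause detected for [0-9]* ms' \
        | awk '{print $5}' \
        | sort -n \
        | uniq -c
}
```

Usage, assuming the default log location: summarize_pauses < /var/log/cluster/corosync.log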
I'm concerned that the pause messages are repeating like that; it looks
like it might be a bug that has already been fixed. What version of
corosync do you have?
Chrissie
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster