Corosync 1.4.6 locks up

Hi Honza

I decided on 1.4.6, as I already had the .deb packages I built for the
IPv6 tests.

Everything works well except when I kill the server that is running as
DC and is also running the active HA services.

The setup is two physical servers running Proxmox, each hosting one KVM
virtual machine.

Each VM runs corosync 1.4.6, and a virtual server from Hetzner runs the
same, giving me a 3-node corosync/pacemaker cluster. The Hetzner VS never
runs any real service; it is only there for quorum. The two KVMs run IP
changeover, DRBD and MySQL.

I have done many tests with controlled handover (i.e. 'reboot'), and
everything works very well.

Killing a node (i.e. 'stop' in Proxmox or in the Hetzner control panel)
also works most of the time.

I have a long token time (20 seconds) to allow for short network outages
so that we don't reconfigure for every little glitch.
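To put the 20-second token in perspective, here is a rough
back-of-the-envelope sketch. It assumes totem { token: 20000 } and that
consensus is left unset, in which case corosync.conf(5) says it is
derived as 1.2 * token. Treating "token + consensus" as the time before
the survivors drop a dead node is my own approximation, not a figure
taken from the corosync documentation.

```shell
#!/bin/sh
# Rough estimate of how long the surviving nodes wait before a killed
# node is removed from the membership.
# Assumption: corosync.conf contains
#   totem {
#       token: 20000
#   }
# and "consensus" is not set, so corosync uses 1.2 * token.
token_ms=20000
consensus_ms=$((token_ms * 12 / 10))   # default consensus: 1.2 * token
detect_ms=$((token_ms + consensus_ms)) # approximation: token loss, then consensus timeout
echo "token=${token_ms}ms consensus=${consensus_ms}ms detection~${detect_ms}ms"
```

So with these assumptions a dead node should disappear after roughly
44 seconds, which is the window I expect to wait before reconfiguration.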

It is not fully reproducible, but twice now I have killed the server that
was both DC and running the active services, and hit a problem. Most of
the time it works.

The problem is that the remaining two nodes do not see the killed node
go offline: crm status shows every node online, with the DC (partition
with quorum) still on the dead node. Nothing works any more; e.g.
crm resource cleanup xyz just hangs.

The corosync log, however, shows the old DC disappearing and a new DC
being negotiated correctly (as far as I can tell), but this does not
appear to have any effect.

The final symptom is that corosync refuses to shut down during the
reboot process; only a hard reboot works.

This is very similar to problems I saw on live systems before we
stopped using WAN links. I would love to get a fix for this, as it
completely kills HA: once we are in this state nothing works until the
ops team hard-reboots all nodes.

Question: what exactly do you need from me the next time this happens?

All the best

Allan


On 21/08/13 14:20, Allan Latham wrote:
> Hi Honza
> 
> I'd like to compile the latest (and best) version that could work as a
> drop-in replacement for what is in Debian Wheezy:
> 
> root@h89 /root # corosync -v
> Corosync Cluster Engine, version '1.4.2'
> Copyright (c) 2006-2009 Red Hat, Inc.
> 
> root@h89 /root # dpkg -l |grep corosync
> ii  corosync    1.4.2-3    amd64    Standards-based cluster framework (daemon and modules)
> 
> Which version do you recommend?
> 
> Or is there a compatible .deb somewhere?
> 
> In particular the bug with pointopoint and udpu is biting me!
> 
> All the best
> 
> Allan
> 
> 


_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss



