Less than smooth upgrade experience from RHEL5.2->5.3

Hi,

I'll just describe my upgrade process today. The cluster is back in a
quorate, operational state, but I don't fully understand what happened,
and any input on what to do differently next time would be welcome.

This is a two-node cluster running qdisk (so 3 votes in total). The
resources are mysql and haproxy with SAN-backed storage. Both nodes mount
/var/www with GFS on a multipathed SAN device.
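
For reference, here is a minimal sketch of the vote arithmetic I would expect for this setup. It assumes a simple majority rule (quorum is half the expected votes, rounded down, plus one) and one vote each for node A, node B and the quorum disk, which matches the 3 total votes above. The function and constant names are made up for illustration and are not part of any cluster tool.

```python
# Sketch of majority-based quorum arithmetic for a 2-node + qdisk cluster.
# Assumption: quorum = expected_votes // 2 + 1 (simple majority).
# Assumption: each node and the quorum disk contribute one vote.

EXPECTED_VOTES = 3  # node A + node B + qdisk
NODE_VOTE = 1
QDISK_VOTE = 1


def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int = EXPECTED_VOTES) -> bool:
    """True if the votes currently visible reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    # One surviving node that still sees the quorum disk: 2 of 3 votes.
    print("node + qdisk quorate:", is_quorate(NODE_VOTE + QDISK_VOTE))
    # One surviving node that has lost the quorum disk: 1 of 3 votes.
    print("node alone quorate:  ", is_quorate(NODE_VOTE))
```

Under these assumptions a single node keeps quorum only while it still holds the quorum disk vote; with its own vote alone it falls below the threshold of 2.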

1. Migrated all services to node B
2. Upgraded node A with yum
3. Rebooted node A
4. Node A rejoins the cluster and takes ownership of the resource for
which it has failoverdomain priority
5. I notice /var/www is not mounted on node A
6. The error message is descriptive enough, so I remove mount options
until I can mount /var/www (removing noatime and noquota from fstab)
7. Migrated remaining service to node A
8. Upgraded node B with yum
9. Rebooted node B
10. When node B shuts down, node A instantly claims quorum lost and
dissolves the cluster
11. Upon rebooting, node B hangs as the cluster is inquorate
12. Eventually rebooting both nodes re-establishes quorum and cluster
services come up


The messages on node A from the point where cluster quorum was dissolved
say:

Jan 27 14:57:57 nodeb qdiskd[3806]: <info> Node 1 shutdown
Jan 27 14:58:03 nodeb clurgmgrd[4465]: <emerg> #1: Quorum Dissolved
Jan 27 14:58:03 nodeb kernel: dlm: closing connection to node 1
Jan 27 14:58:03 nodeb openais[3755]: [CMAN ] lost contact with quorum device
Jan 27 14:58:03 nodeb openais[3755]: [CMAN ] quorum lost, blocking activity
Jan 27 14:58:03 nodeb ccsd[3681]: Cluster is not quorate.  Refusing connection.
Jan 27 14:58:03 nodeb ccsd[3681]: Error while processing connect: Connection refused
Jan 27 14:58:03 nodeb ccsd[3681]: Invalid descriptor specified (-111).
Jan 27 14:58:03 nodeb ccsd[3681]: Someone may be attempting something evil.
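
To make the timing in that excerpt explicit, here is a small sketch that measures the gap between qdiskd noting the shutdown and the quorum being dissolved. It only looks at three of the lines quoted above and assumes the usual syslog layout where the third whitespace-separated field is the time of day. The helper names are hypothetical.

```python
from datetime import datetime

# Three lines copied from the log excerpt above.
LOG_EXCERPT = """\
Jan 27 14:57:57 nodeb qdiskd[3806]: <info> Node 1 shutdown
Jan 27 14:58:03 nodeb clurgmgrd[4465]: <emerg> #1: Quorum Dissolved
Jan 27 14:58:03 nodeb openais[3755]: [CMAN ] lost contact with quorum device
"""


def time_of_day(line: str) -> datetime:
    """Parse the HH:MM:SS field of a syslog line (third field).

    All lines here are from the same day, so the time alone is enough.
    """
    return datetime.strptime(line.split()[2], "%H:%M:%S")


def first_line_containing(lines, needle: str) -> str:
    """Return the first line that contains the given text."""
    for line in lines:
        if needle in line:
            return line
    raise ValueError(f"no line contains {needle!r}")


lines = LOG_EXCERPT.splitlines()
shutdown = time_of_day(first_line_containing(lines, "Node 1 shutdown"))
dissolved = time_of_day(first_line_containing(lines, "Quorum Dissolved"))
gap_seconds = (dissolved - shutdown).total_seconds()

if __name__ == "__main__":
    print(f"seconds from shutdown notice to quorum loss: {gap_seconds:.0f}")
```

In this excerpt the quorum is dissolved within seconds of the shutdown notice, in the same second that contact with the quorum device is reported lost.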


I am still scratching my head over why rebooting node B caused quorum
to be dissolved.

Regards
-- 
Denis Braekhus
Team Lead Managed Services
Redpill Linpro AS - Changing the game

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
