Barry Brimer wrote:
> On Mon, 4 Dec 2006, Robert Peterson wrote:
>> Barry Brimer wrote:
>>> This is a repeat of the post I made a few minutes ago. I thought
>>> adding a subject would be helpful.
>>>
>>> I have a 2-node cluster for a shared GFS filesystem. One of the
>>> nodes fenced the other, and the node that got fenced is no longer
>>> able to communicate with the cluster.
>>>
>>> While booting the problem node, I receive the following error message:
>>> Setting up Logical Volume Management: Locking inactive: ignoring
>>> clustered volume group vg00
>>>
>>> I have compared the /etc/lvm/lvm.conf files on both nodes. They are
>>> identical. The disk (/dev/sda1) is listed in the output of "fdisk -l".
>>>
>>> There are no iptables firewalls active (although
>>> /etc/sysconfig/iptables exists, iptables is chkconfig'd off). I have
>>> written a simple iptables logging rule
>>> (iptables -I INPUT -s <problem node> -j LOG) on the working node to
>>> verify that packets are reaching the working node, but no messages
>>> are being logged in /var/log/messages on the working node that
>>> acknowledge any cluster activity from the problem node.
>>>
>>> Both machines have the same RH packages installed and are mostly up
>>> to date; they are missing the same packages, none of which involve
>>> the kernel, RHCS, or GFS.
>>>
>>> When I boot the problem node, it successfully starts ccsd, but cman
>>> fails after a while and so does fenced. I have given the clvmd
>>> process an hour, and it still will not start.
>>>
>>> Running "vgchange -ay" on the problem node returns:
>>>
>>> # vgchange -ay
>>> connect() failed on local socket: Connection refused
>>> Locking type 2 initialisation failed.
>>>
>>> I have the contents of /var/log/messages on the working node and the
>>> problem node at the time of the fence, if that would be helpful.
>>>
>>> Any help is greatly appreciated.
>>>
>>> Thanks,
>>> Barry
>>>
>> Hi Barry,
>>
>> Well, vgchange and other LVM functions won't work on the clustered
>> volume unless clvmd is running, and clvmd won't run properly until
>> the node is talking happily through the cluster infrastructure. So as
>> I see it, your problem is that cman is not starting properly.
>> Unfortunately, you haven't told us enough about the system to
>> determine why. There can be many reasons.
>
> Agreed. Although it did not seem relevant at the time of the post,
> there were network outages around the time of the failure. What
> happens now is that on the problem node, ccsd starts, but when cman
> starts it sends membership requests that are never acknowledged by the
> working node. Again, I see packets from the problem node arriving on
> UDP 6809 logged in /var/log/messages on the working node, but watching
> /var/log/messages there, cman never acknowledges them.
>
> The problem node had this in its /var/log/messages at the time of the
> problem:
>
> Dec 1 14:29:38 server1 kernel: CMAN: Being told to leave the cluster
> by node 1
> Dec 1 14:29:38 server1 kernel: CMAN: we are leaving the cluster.

If you're running the cman from RHEL4 Update 3, there's a bug in it that
you might be hitting. You'll need to upgrade all the nodes in the
cluster to get rid of it. I can't tell for sure whether that's the
problem you're having without seeing more kernel messages, though.

-- 
patrick

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
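
As the replies above note, the "Locking type 2 initialisation failed"
message from vgchange is a symptom of clvmd not running rather than a
separate LVM misconfiguration: locking type 2 tells LVM to use the
external cluster locking library, which needs a working clvmd, which in
turn needs cman membership. A quick way to see where that chain breaks
(a sketch only; the paths and service names assume a stock RHEL4 Red Hat
Cluster Suite install like the one in this thread):

    # Check whether the node has actually joined the cluster
    cman_tool status
    cman_tool nodes

    # Confirm LVM is set up for cluster locking on both nodes
    # (on RHEL4 this is typically locking_type = 2 with
    # liblvm2clusterlock.so as the locking library)
    grep -E 'locking_type|locking_library' /etc/lvm/lvm.conf

    # clvmd can only start once cman and fenced are up
    service clvmd status

If cman_tool status reports that the node is not a cluster member, the
problem is in cluster membership, as it was here, not in LVM itself.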