Two-node cluster (the nodes are named virtfed and virtfedbis) running F11 x86_64, up to date as of today, and without qdisk:
cman-3.0.2-1.fc11.x86_64
openais-1.0.1-1.fc11.x86_64
corosync-1.0.0-1.fc11.x86_64
and kernel 2.6.30.8-64.fc11.x86_64
I was in a situation where both nodes were up: virtfedbis had just restarted and was starting a service.
Inside one of its resources there is a loop that tests the availability of a file (a rough sketch of that loop is below, after the log), so the node was still starting this service, but the cluster infrastructure was up, as shown by these messages on virtfed:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.101)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.102)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] This node is within the primary component and will provide service.
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] Members[1]:
Oct 5 11:44:39 virtfed corosync[4684]: [QUORUM] 1
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.101)
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:44:39 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:44:39 virtfed corosync[4684]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 5 11:44:39 virtfed kernel: dlm: closing connection to node 2
Oct 5 11:44:39 virtfed corosync[4684]: [MAIN ] Completed service synchronization, ready to provide service.
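For reference, the loop inside that resource script looks essentially like this (the flag file name and the timings are invented here, only the structure matters):

# wait for a flag file before completing the start, for a limited number of attempts
i=0
while [ ! -e /drbd/ready.flag ] && [ $i -lt 10 ]; do
    sleep 30
    i=$((i+1))
done
[ -e /drbd/ready.flag ] || exit 1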
So now they are in this condition, as reported on virtfedbis:
[root@virtfedbis ~]# clustat
Cluster Status for kvm @ Mon Oct 5 11:49:27 2009
Member Status: Quorate
 Member Name                  ID   Status
 ------ ----                  ---- ------
 kvm1                            1 Online, rgmanager
 kvm2                            2 Online, Local, rgmanager

 Service Name                 Owner (Last)                 State
 ------- ----                 ----- ------                 -----
 service:DRBDNODE1            kvm1                         started
 service:DRBDNODE2            kvm2                         starting
I realized that I had forgotten something, so that after 10 attempts the DRBDNODE2 service would not come up, and I decided to put
virtfedbis into single user mode by running on it:
shutdown 0
I would expect virtfedbis to leave the cluster cleanly; instead it is fenced and rebooted (via the fence_ilo agent).
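By "leave cleanly" I mean more or less what the init scripts do on a normal reboot, i.e. stopping the cluster stack before the network interface disappears, roughly:

service rgmanager stop    # stop/relocate the services and leave rgmanager
service cman stop         # cman_tool leave; stops fenced, dlm_controld, corosync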
On virtfed these are the messages:
Oct 5 11:49:49 virtfed corosync[4684]: [TOTEM ] A processor failed, forming new configuration.
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.101)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.102)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] This node is within the primary component and will provide service.
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] Members[1]:
Oct 5 11:49:54 virtfed corosync[4684]: [QUORUM] 1
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] CLM CONFIGURATION CHANGE
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] New Configuration:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] #011r(0) ip(192.168.16.101)
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Left:
Oct 5 11:49:54 virtfed corosync[4684]: [CLM ] Members Joined:
Oct 5 11:49:54 virtfed corosync[4684]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 5 11:49:54 virtfed corosync[4684]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 5 11:49:54 virtfed kernel: dlm: closing connection to node 2
Oct 5 11:49:54 virtfed fenced[4742]: fencing node kvm2
Oct 5 11:49:54 virtfed rgmanager[5496]: State change: kvm2 DOWN
Oct 5 11:50:26 virtfed fenced[4742]: fence kvm2 success
What I find on virtfedbis after the restart, in the /var/log/cluster directory, is this:
corosync.log
Oct 05 11:49:49 corosync [TOTEM ] A processor failed, forming new configuration.
Oct 05 11:49:49 corosync [TOTEM ] The network interface is down.
Oct 05 11:49:54 corosync [CLM ] CLM CONFIGURATION CHANGE
Oct 05 11:49:54 corosync [CLM ] New Configuration:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(127.0.0.1)
Oct 05 11:49:54 corosync [CLM ] Members Left:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(192.168.16.102)
Oct 05 11:49:54 corosync [CLM ] Members Joined:
Oct 05 11:49:54 corosync [QUORUM] This node is within the primary component and will provide service.
Oct 05 11:49:54 corosync [QUORUM] Members[1]:
Oct 05 11:49:54 corosync [QUORUM] 1
Oct 05 11:49:54 corosync [CLM ] CLM CONFIGURATION CHANGE
Oct 05 11:49:54 corosync [CLM ] New Configuration:
Oct 05 11:49:54 corosync [CLM ] r(0) ip(127.0.0.1)
Oct 05 11:49:54 corosync [CLM ] Members Left:
Oct 05 11:49:54 corosync [CLM ] Members Joined:
Oct 05 11:49:54 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 05 11:49:54 corosync [CMAN ] Killing node kvm2 because it has rejoined the cluster with existing state
I think there is something wrong in this behaviour...
This is a test cluster, so I have no qdisk.
Could the cause be inherent in my config, which has:
<cman expected_votes="1" two_node="1"/>
<fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
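For context, apart from the service/resource definitions, my /etc/cluster/cluster.conf is basically like the sketch below (only a sketch: the fence device names are placeholders, and the iLO addresses and credentials are omitted):

<?xml version="1.0"?>
<cluster name="kvm" config_version="...">
  <cman expected_votes="1" two_node="1"/>
  <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="20"/>
  <clusternodes>
    <clusternode name="kvm1" nodeid="1">
      <fence>
        <method name="1">
          <device name="ilo_kvm1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="kvm2" nodeid="2">
      <fence>
        <method name="1">
          <device name="ilo_kvm2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="ilo_kvm1" agent="fence_ilo" ipaddr="..." login="..." passwd="..."/>
    <fencedevice name="ilo_kvm2" agent="fence_ilo" ipaddr="..." login="..." passwd="..."/>
  </fencedevices>
  <rm>
    <!-- DRBDNODE1 and DRBDNODE2 service definitions here -->
  </rm>
</cluster>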
In general, if I do a shutdown -r now on one of the two nodes, I don't have this kind of problem.
Thanks for any insight,
Gianluca