After some more extensive testing, the problem is still not solved. I fence one guest node from the luci interface (or with xm destroy from a physical node; the result is the same). What I see in the log of another node is:

Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] The token was lost in the OPERATIONAL state.
Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Dec 14 09:59:28 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 2.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 0.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Creating commit token because I am the rep.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Saving state aru 71 high seq received 71
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Storing new sequence id for ring 10ec
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering COMMIT state.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] entering RECOVERY state.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] position [0] member 192.168.15.152:
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] previous ring seq 4328 rep 192.168.15.151
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] aru 71 high delivered 71 received flag 1
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Did not need to originate any messages in recovery.
Dec 14 09:59:32 c5g-thor openais[1741]: [TOTEM] Sending initial ORF token
Dec 14 09:59:32 c5g-thor clurgmgrd[2386]: <emerg> #1: Quorum Dissolved
Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 2
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] CLM CONFIGURATION CHANGE
Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 3
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] New Configuration:
Dec 14 09:59:32 c5g-thor kernel: dlm: closing connection to node 4
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] r(0) ip(192.168.15.152)
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] Members Left:
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] r(0) ip(192.168.15.151)
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] r(0) ip(192.168.15.153)
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] r(0) ip(192.168.15.154)
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] Members Joined:
Dec 14 09:59:32 c5g-thor openais[1741]: [CMAN ] quorum lost, blocking activity
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] CLM CONFIGURATION CHANGE
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] New Configuration:
Dec 14 09:59:32 c5g-thor openais[1741]: [CLM ] r(0) ip(192.168.15.152)
Dec 14 09:59:33 c5g-thor openais[1741]: [CLM ] Members Left:
Dec 14 09:59:33 c5g-thor openais[1741]: [CLM ] Members Joined:
Dec 14 09:59:33 c5g-thor openais[1741]: [SYNC ] This node is within the primary component and will provide service.
Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering OPERATIONAL state.
Dec 14 09:59:33 c5g-thor openais[1741]: [CLM ] got nodejoin message 192.168.15.152
Dec 14 09:59:33 c5g-thor openais[1741]: [CPG ] got joinlist message from node 1
Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
Dec 14 09:59:33 c5g-thor openais[1741]: [TOTEM] entering GATHER state from 11.
Dec 14 09:59:33 c5g-thor ccsd[1704]: Cluster is not quorate. Refusing connection.
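
For reference, this is roughly the test sequence from the command line (only a sketch: I show just the commands, c5g-backup stands for whichever of the four guest domains gets fenced, and the output is omitted):

    # on a physical node: manually "fence" one guest
    # (same effect as fencing it from the luci interface)
    xm destroy c5g-backup

    # on one of the surviving guests: check membership and quorum state
    cman_tool status
    cman_tool nodes
    clustat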
The cluster.conf file looks like:

<?xml version="1.0"?>
<cluster alias="PESV" config_version="25" name="PESV">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="c5g-thor.prisma" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device domain="c5g-thor" name="c5g-thor-f"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="c5g-backup.prisma" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device domain="c5g-backup" name="c5g-backup-f"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="c5g-memo.prisma" nodeid="3" votes="1">
            <fence>
                <method name="1">
                    <device domain="c5g-memo" name="c5g-memo-f"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="c5g-steiner.prisma" nodeid="4" votes="1">
            <fence>
                <method name="1">
                    <device domain="c5g-steiner" name="c5g-steiner-f"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice agent="fence_xvm" name="c5g-backup-f"/>
        <fencedevice agent="fence_xvm" name="c5g-thor-f"/>
        <fencedevice agent="fence_xvm" name="c5g-memo-f"/>
        <fencedevice agent="fence_xvm" name="c5g-steiner-f"/>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
    <totem token="30000"/>
    <cman/>
</cluster>

> On Wed, 2007-12-12 at 19:23 +0100, Paolo Marini wrote:
>> I reiterate the request for help, hoping someone has undergone (and
>> hopefully solved) the same issues.
>>
>> I am building up a cluster of Xen guests whose root file systems reside
>> on files on a GFS filesystem (on iSCSI, actually).
>>
>> Each cluster node mounts a GFS file system residing on an iSCSI device.
>>
>> For performance reasons, both the iSCSI device and the physical nodes
>> (which are also part of a cluster) use two Gigabit Ethernet interfaces
>> with bonding and LACP. For the physical machines, I had to insert a
>> sleep 30 in the /etc/init.d/iscsi script before the iSCSI login, in
>> order to wait for the bond interface to come up; otherwise the iSCSI
>> devices are not seen and no GFS mount is possible.
>>
>> Then, going to the cluster of Xen guests, they work fine; I am able to
>> migrate each one to a different physical node without problems on the
>> guest.
>>
>> When I reboot or fence one of the guests, the guest cluster breaks,
>> i.e. the quorum is dissolved and I have to fence ALL the nodes and
>> reboot them in order for the cluster to restart.
>
> How many guests - and what are you using for fencing?
>
>> Does it have to do with the xen bridge going up and down for a time
>> longer than the heartbeat timeout?
>
> Not sure - it shouldn't be that big of a deal. If you think that's the
> problem, try adding:
>
> <totem token="30000"/>
>
> to the VM cluster's cluster.conf
>
> -- Lon
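
P.S. Regarding the "sleep 30" workaround in /etc/init.d/iscsi mentioned above: instead of a fixed delay before the iSCSI login, a small polling loop does the same job a bit less bluntly. This is only a sketch, assuming the bonded interface is called bond0 and keeping the same 30-second limit:

    # Wait up to ~30 seconds for the bond to report its link as up before
    # the iscsi login runs (bond0 and the timeout are assumptions).
    i=0
    while [ $i -lt 30 ]; do
        grep -q "MII Status: up" /proc/net/bonding/bond0 2>/dev/null && break
        sleep 1
        i=$((i+1))
    done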