Ahh, I forgot the cluster.xml, and I'm using the 2.6.18-8.1.14.el5xen kernel.

<?xml version="1.0"?>
<cluconfig version="3.0">
  <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no" multicast_ipaddress="" thread="yes" tko_count="20"/>
  <cluquorumd loglevel="5" pinginterval="" tiebreaker_ip=""/>
  <clurmtabd loglevel="5" pollinterval="4"/>
  <clusvcmgrd loglevel="5"/>
  <clulockd loglevel="5"/>
  <cluster config_viewnumber="3" key="975b29840bb8835ce57b0fff3354fabc" name="Cluster"/>
  <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1" rawshadow="/dev/raw/raw2" type="raw"/>
  <members>
    <member id="0" name="cl1" watchdog="yes">
    </member>
    <member id="1" name="cl2" watchdog="yes"/>
  </members>
  <services>
    <service checkinterval="20" failoverdomain="None" id="0" maxfalsestarts="0" maxrestarts="0" name="mysql" userscript="/etc/init.d/mysql1">
      <service_ipaddresses>
        <service_ipaddress broadcast="172.30.5.255" id="0" ipaddress="172.30.5.113" monitor_link="0" netmask="255.255.255.0"/>
      </service_ipaddresses>
      <device id="0" name="/dev/hda5" sharename="">
        <mount forceunmount="yes" fstype="ext3" mountpoint="/var/lib/mysql1" options="sync,rw,nosuid"/>
      </device>
    </service>
    <service checkinterval="0" failoverdomain="None" id="1" maxfalsestarts="0" maxrestarts="0" name="nfs" userscript="None">
      <service_ipaddresses>
        <service_ipaddress broadcast="172.30.5.255" id="0" ipaddress="172.30.5.114" monitor_link="0" netmask="255.255.255.0"/>
      </service_ipaddresses>
    </service>
  </services>
  <failoverdomains/>
</cluconfig>

Thanks
Nuno Fernandes

On Friday 25 January 2008 12:00:06 Nuno Fernandes wrote:
> Hi,
>
> I'm in the process of migrating a cluster of two nodes to two virtual
> machines.
>
> The real servers have clumanager-1.0.28-1 (RHEL3/CentOS3).
> I've migrated all the filesystems and started the process of
> reconfiguring the cluster.
>
> The real servers clustat:
>
> Cluster Status Monitor (Cluster)                                  11:50:14
>
> Cluster alias: Not Configured
>
> ========================= M e m b e r   S t a t u s ==========================
>
>   Member         Status     Node Id    Power Switch
>   -------------- ---------- ---------- ------------
>   cl1            Up         0          Good
>   cl2            Up         1          Good
>
> ========================= H e a r t b e a t   S t a t u s ====================
>
>   Name                           Type       Status
>   ------------------------------ ---------- ------------
>   cl1 <--> cl2                   network    ONLINE
>   cln1 <--> cln2                 network    ONLINE
>
> ========================= S e r v i c e   S t a t u s ========================
>
>                                          Last             Monitor  Restart
>   Service        Status   Owner          Transition       Interval Count
>   -------------- -------- -------------- ---------------- -------- -------
>   mysql1         started  cl2            00:16:28 Oct 23  10       1
>   nfs            started  cl2            23:20:58 Oct 08  10       0
>
>
> Everything is about the same in the virtual cluster, except that the
> virtual machines don't have any power switch and there is only one network.
> They both use network and quorum to check if the other node is ok.
>
> The problem is in the virtual cluster. I've upgraded to clumanager-1.2.34-3
> in the virtual cluster to check if it was a bug in the previous version. The
> nodes can't see each other through the network. Each thinks the other is
> Inactive. When I start clumanager on cl1 I get:
>
> Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster Manager...
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl1'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl2'!
> Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
> Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
> Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
> Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting Service Manager
> Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping service mysql ...
> Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
> Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped service nfs ...
> Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service mysql
> Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service nfs
> Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting service mysql ...
> Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting service nfs ...
> Jan 25 11:52:37 cl1 kernel: kjournald starting. Commit interval 5 seconds
> Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
> Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is installed
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running user script '/etc/init.d/mysql1 start'
> Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started service mysql ...
> Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started service nfs ...
>
> Everything seems ok... Then I start cl2's clumanager:
>
> cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
> Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster Manager...
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl1'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl2'!
> Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
> Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
> Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!
> Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting Service Manager
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping service mysql ...
> Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
> Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped service mysql ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping service nfs ...
> Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped service nfs ...
>
> Now we have a problem...
> "cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!"
>
> Clustat from cl1 reports:
>
> Cluster Status - Cluster                                   11:54:16
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
>   Member             Status
>   ------------------ ----------
>   cl1                Active     <-- You are here
>   cl2                Inactive
>
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   mysql          started  cl1              11:52:37 Jan 25 20  0
>   nfs            started  cl1              11:52:37 Jan 25 0   0
>
> Clustat from cl2 reports:
> Cluster Status - Cluster                                   11:56:30
> Cluster Quorum Incarnation #1
> Shared State: Shared Raw Device Driver v1.2
>
>   Member             Status
>   ------------------ ----------
>   cl1                Inactive
>   cl2                Active     <-- You are here
>
>   Service        Status   Owner (Last)     Last Transition Chk Restarts
>   -------------- -------- ---------------- --------------- --- --------
>   mysql          started  cl1              11:52:37 Jan 25 20  0
>   nfs            started  cl1              11:52:37 Jan 25 0   0
>
> I have network connectivity working:
>
> [root@cl1 root]# ping -c2 -s30000 cl2
> PING cl2 (172.30.5.112) 30000(30028) bytes of data.
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
> 30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
>
> [root@cl2 root]# ping -c2 -s30000 cl1
> PING cl1 (172.30.5.111) 30000(30028) bytes of data.
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
> 30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
>
> Quorum seems ok, but network doesn't.
>
> [root@cl1 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
>         ss_magic = 0x39119fcd
>         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
>         ss_updateHost = cl1.datacenter.imoportal.pt
> }
>
> [root@cl2 root]# shutil -p /cluster/header
> /cluster/header is 144 bytes long
> SharedStateHeader {
>         ss_magic = 0x39119fcd
>         ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
>         ss_updateHost = cl1.datacenter.imoportal.pt
> }
>
>
> Any ideas?
> Thanks
> Nuno Fernandes
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
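
A side note on the `shutil -p /cluster/header` output quoted above: `ss_timestamp` looks like a Unix epoch value printed in hex, and both nodes print the same header. The Python sketch below decodes that value and compares the two headers, which is one way to double-check that both nodes are reading the same shared state. It is not part of clumanager; `parse_header` and `decode_timestamp` are hypothetical helper names, and interpreting the value as seconds since the epoch in UTC is an assumption.

```python
import re
from datetime import datetime, timezone

# Header text as printed by `shutil -p /cluster/header` on cl1 (copied from the post).
HEADER_CL1 = """\
SharedStateHeader {
    ss_magic = 0x39119fcd
    ss_timestamp = 0x000000004798e63b (19:25:47 Jan 24 2008)
    ss_updateHost = cl1.datacenter.imoportal.pt
}
"""

# cl2 printed an identical header in the post.
HEADER_CL2 = HEADER_CL1


def parse_header(text):
    """Collect 'name = value' pairs; only the first token after '=' is kept."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def decode_timestamp(hex_value):
    """Interpret a hex string as Unix epoch seconds (assumed to be UTC)."""
    return datetime.fromtimestamp(int(hex_value, 16), tz=timezone.utc)


cl1 = parse_header(HEADER_CL1)
cl2 = parse_header(HEADER_CL2)

print("last update:", decode_timestamp(cl1["ss_timestamp"]))
print("updated by: ", cl1["ss_updateHost"])
print("headers match:", cl1 == cl2)
```

The decoded value is 2008-01-24 19:25:47 UTC, which agrees with the human-readable time that `shutil` prints next to the hex value. Matching headers on both nodes only say that the shared raw device is visible from both sides; they say nothing about the network heartbeat.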