Just for the record... solved by adding a tiebreaker IP.

Thanks,
Nuno Fernandes

On Friday 25 January 2008 12:18:07 Nuno Fernandes wrote:
> Ahh.. forgot cluster.xml, and I'm using the 2.6.18-8.1.14.el5xen kernel.
>
> <?xml version="1.0"?>
> <cluconfig version="3.0">
>   <clumembd broadcast="yes" interval="750000" loglevel="5" multicast="no"
>     multicast_ipaddress="" thread="yes" tko_count="20"/>
>   <cluquorumd loglevel="5" pinginterval="" tiebreaker_ip=""/>
>   <clurmtabd loglevel="5" pollinterval="4"/>
>   <clusvcmgrd loglevel="5"/>
>   <clulockd loglevel="5"/>
>   <cluster config_viewnumber="3" key="975b29840bb8835ce57b0fff3354fabc"
>     name="Cluster"/>
>   <sharedstate driver="libsharedraw.so" rawprimary="/dev/raw/raw1"
>     rawshadow="/dev/raw/raw2" type="raw"/>
>   <members>
>     <member id="0" name="cl1" watchdog="yes">
>     </member>
>     <member id="1" name="cl2" watchdog="yes"/>
>   </members>
>   <services>
>     <service checkinterval="20" failoverdomain="None" id="0"
>       maxfalsestarts="0" maxrestarts="0" name="mysql"
>       userscript="/etc/init.d/mysql1">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="172.30.5.255" id="0"
>           ipaddress="172.30.5.113" monitor_link="0" netmask="255.255.255.0"/>
>       </service_ipaddresses>
>       <device id="0" name="/dev/hda5" sharename="">
>         <mount forceunmount="yes" fstype="ext3"
>           mountpoint="/var/lib/mysql1" options="sync,rw,nosuid"/>
>       </device>
>     </service>
>     <service checkinterval="0" failoverdomain="None" id="1"
>       maxfalsestarts="0" maxrestarts="0" name="nfs" userscript="None">
>       <service_ipaddresses>
>         <service_ipaddress broadcast="172.30.5.255" id="0"
>           ipaddress="172.30.5.114" monitor_link="0" netmask="255.255.255.0"/>
>       </service_ipaddresses>
>     </service>
>   </services>
>   <failoverdomains/>
> </cluconfig>
>
> Thanks,
> Nuno Fernandes
>
> On Friday 25 January 2008 12:00:06 Nuno Fernandes wrote:
> > Hi,
> >
> > I'm in the process of migrating a two-node cluster to two virtual
> > machines.
> >
> > The real servers run clumanager-1.0.28-1 (RHEL3/CentOS3).
> > I've migrated all the filesystems and started the process of
> > reconfiguring the cluster.
> >
> > The real servers' clustat:
> >
> > Cluster Status Monitor (Cluster)                              11:50:14
> >
> > Cluster alias: Not Configured
> >
> > ========================= M e m b e r   S t a t u s ==========================
> >
> >   Member         Status     Node Id    Power Switch
> >   -------------- ---------- ---------- ------------
> >   cl1            Up         0          Good
> >   cl2            Up         1          Good
> >
> > ========================= H e a r t b e a t   S t a t u s ====================
> >
> >   Name                           Type       Status
> >   ------------------------------ ---------- ------------
> >   cl1  <--> cl2                  network    ONLINE
> >   cln1 <--> cln2                 network    ONLINE
> >
> > ========================= S e r v i c e   S t a t u s ========================
> >
> >                                           Last             Monitor  Restart
> >   Service        Status   Owner           Transition       Interval Count
> >   -------------- -------- --------------- ---------------- -------- -------
> >   mysql1         started  cl2             00:16:28 Oct 23  10       1
> >   nfs            started  cl2             23:20:58 Oct 08  10       0
> >
> > Everything is about the same in the virtual cluster, except that the
> > nodes have no power switch and there is only one network. They both use
> > the network and the quorum disk to check whether the other node is OK.
> >
> > The problem is in the virtual cluster. I've upgraded it to
> > clumanager-1.2.34-3 to check whether it was a bug in the previous
> > version. The nodes can't see each other through the network; each one
> > thinks the other is inactive. As I start cl1's clumanager I get:
> >
> > Jan 25 11:52:22 cl1 clumanager: [15039]: <notice> Starting Red Hat Cluster Manager...
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl1'!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: No drivers configured for host 'cl2'!
> > Jan 25 11:52:22 cl1 cluquorumd[15053]: <warning> STONITH: Data integrity may be compromised!
> > Jan 25 11:52:22 cl1 clumanager: cluquorumd startup succeeded
> > Jan 25 11:52:33 cl1 clumembd[15056]: <notice> Member cl1 UP
> > Jan 25 11:52:34 cl1 cluquorumd[15054]: <notice> Quorum Formed; Starting Service Manager
> > Jan 25 11:52:34 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopping service mysql ...
> > Jan 25 11:52:35 cl1 clusvcmgrd: [15067]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15067]: <notice> service notice: Stopped service mysql ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopping service nfs ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15244]: <notice> service notice: Stopped service nfs ...
> > Jan 25 11:52:37 cl1 clusvcmgrd[15381]: <notice> Starting stopped service mysql
> > Jan 25 11:52:37 cl1 clusvcmgrd[15395]: <notice> Starting stopped service nfs
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15382]: <notice> service notice: Starting service mysql ...
> > Jan 25 11:52:37 cl1 clusvcmgrd: [15420]: <notice> service notice: Starting service nfs ...
> > Jan 25 11:52:37 cl1 kernel: kjournald starting.  Commit interval 5 seconds
> > Jan 25 11:52:37 cl1 kernel: EXT3 FS on hda5, internal journal
> > Jan 25 11:52:37 cl1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
> > Jan 25 11:52:37 cl1 /sbin/hotplug: no runnable /etc/hotplug/block.agent is installed
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Running user script '/etc/init.d/mysql1 start'
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15382]: <notice> service notice: Started service mysql ...
> > Jan 25 11:52:38 cl1 clusvcmgrd: [15420]: <notice> service notice: Started service nfs ...
> >
> > Everything seems OK...
> > Then I start cl2's clumanager:
> >
> > cl2 -bash: (1836) [root.root] |.| /etc/init.d/clumanager start
> >
> > Jan 25 11:54:56 cl2 clumanager: [7651]: <notice> Starting Red Hat Cluster Manager...
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl1'!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: No drivers configured for host 'cl2'!
> > Jan 25 11:54:56 cl2 cluquorumd[7665]: <warning> STONITH: Data integrity may be compromised!
> > Jan 25 11:54:56 cl2 clumanager: cluquorumd startup succeeded
> > Jan 25 11:55:07 cl2 clumembd[7670]: <notice> Member cl2 UP
> > Jan 25 11:55:08 cl2 cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!
> > Jan 25 11:55:08 cl2 cluquorumd[7666]: <notice> Quorum Formed; Starting Service Manager
> > Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopping service mysql ...
> > Jan 25 11:55:08 cl2 clusvcmgrd: [7679]: <notice> service notice: Running user script '/etc/init.d/mysql1 stop'
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7679]: <notice> service notice: Stopped service mysql ...
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopping service nfs ...
> > Jan 25 11:55:10 cl2 clusvcmgrd: [7856]: <notice> service notice: Stopped service nfs ...
> >
> > Now we have a problem:
> >
> >   cluquorumd[7666]: <warning> Membership reports #0 as down, but disk reports as up: State uncertain!
> > Clustat from cl1 reports:
> >
> > Cluster Status - Cluster                                      11:54:16
> > Cluster Quorum Incarnation #1
> > Shared State: Shared Raw Device Driver v1.2
> >
> >   Member             Status
> >   ------------------ ----------
> >   cl1                Active     <-- You are here
> >   cl2                Inactive
> >
> >   Service        Status   Owner (Last)     Last Transition  Chk Restarts
> >   -------------- -------- ---------------- ---------------- --- --------
> >   mysql          started  cl1              11:52:37 Jan 25  20  0
> >   nfs            started  cl1              11:52:37 Jan 25  0   0
> >
> > Clustat from cl2 reports:
> >
> > Cluster Status - Cluster                                      11:56:30
> > Cluster Quorum Incarnation #1
> > Shared State: Shared Raw Device Driver v1.2
> >
> >   Member             Status
> >   ------------------ ----------
> >   cl1                Inactive
> >   cl2                Active     <-- You are here
> >
> >   Service        Status   Owner (Last)     Last Transition  Chk Restarts
> >   -------------- -------- ---------------- ---------------- --- --------
> >   mysql          started  cl1              11:52:37 Jan 25  20  0
> >   nfs            started  cl1              11:52:37 Jan 25  0   0
> >
> > I have network connectivity working:
> >
> > [root@cl1 root]# ping -c2 -s30000 cl2
> > PING cl2 (172.30.5.112) 30000(30028) bytes of data.
> > 30008 bytes from cl2 (172.30.5.112): icmp_seq=0 ttl=64 time=1.08 ms
> > 30008 bytes from cl2 (172.30.5.112): icmp_seq=1 ttl=64 time=1.09 ms
> >
> > [root@cl2 root]# ping -c2 -s30000 cl1
> > PING cl1 (172.30.5.111) 30000(30028) bytes of data.
> > 30008 bytes from cl1 (172.30.5.111): icmp_seq=0 ttl=64 time=1.09 ms
> > 30008 bytes from cl1 (172.30.5.111): icmp_seq=1 ttl=64 time=0.998 ms
> >
> > Quorum seems OK, but the network doesn't.
> > [root@cl1 root]# shutil -p /cluster/header
> > /cluster/header is 144 bytes long
> > SharedStateHeader {
> >         ss_magic      = 0x39119fcd
> >         ss_timestamp  = 0x000000004798e63b (19:25:47 Jan 24 2008)
> >         ss_updateHost = cl1.datacenter.imoportal.pt
> > }
> >
> > [root@cl2 root]# shutil -p /cluster/header
> > /cluster/header is 144 bytes long
> > SharedStateHeader {
> >         ss_magic      = 0x39119fcd
> >         ss_timestamp  = 0x000000004798e63b (19:25:47 Jan 24 2008)
> >         ss_updateHost = cl1.datacenter.imoportal.pt
> > }
> >
> > Any ideas? Thanks
> > Nuno Fernandes

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
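For anyone landing on this thread later: the "tiebreaker ip" fix refers to filling in the empty tiebreaker_ip attribute on the cluquorumd line of the cluster.xml quoted above. A sketch of what that fragment would presumably look like — the address shown is a placeholder, not from the thread; the usual choice is an always-reachable IP on the cluster network, such as the default gateway of the 172.30.5.0/24 subnet used here:

```xml
<!-- cluster.xml fragment (sketch, not the poster's actual config).
     172.30.5.1 is a placeholder: pick an address outside the cluster
     that is always up, e.g. the default gateway, so cluquorumd can
     ping it to break the tie. -->
<cluquorumd loglevel="5" pinginterval="" tiebreaker_ip="172.30.5.1"/>
```

With only two members and no power switches, each node can otherwise form its own quorum when membership disagrees (the "Membership reports #0 as down, but disk reports as up" state above); the tiebreaker gives cluquorumd a third vote to decide which side keeps quorum.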