On Tue, 2006-11-14 at 20:06 -0500, jason@xxxxxxxxxxxxxx wrote:
>
> and when I reboot both servers of the 2-node cluster, they come up fine..
> [jason@tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
>   Member Name                              State      ID
>   ------ ----                              -----      --
>   tf1                                      Online     0x0000000000000001
>   tf2                                      Online     0x0000000000000002
>
>   Service Name         Owner (Last)                   State
>   ------- ----         ----- ------                   -----
>   Apache Service       tf1                            started
> [jason@tf2 ~]$
>
> when I reboot (shutdown -r now) tf1, tf2 never takes over
>
> [jason@tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
>   Member Name                              State      ID
>   ------ ----                              -----      --
>   tf2                                      Online     0x0000000000000002
>
>   Service Name         Owner (Last)                   State
>   ------- ----         ----- ------                   -----
>   Apache Service       ((null)       )                failed
> [jason@tf2 ~]$
>
> here are the logs from tf2:
>
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> Logged in SG "usrm::manager"
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:48:21 tf2 clurgmgrd[5345]: <info> State change: Local UP
> Nov 14 19:48:22 tf2 clurgmgrd[5345]: <info> State change: tf1 UP
> Nov 14 19:48:25 tf2 snmpd[5195]: Got trap from peer on fd 13
> Nov 14 19:48:44 tf2 kernel: process `omaws32' is using obsolete setsockopt SO_BSDCOMPAT
> Nov 14 19:48:58 tf2 Server Administrator: Storage Service EventID: 2164 See readme.txt for a list of validated controller driver versions.
> Nov 14 19:49:00 tf2 snmpd[5195]: Got trap from peer on fd 13
> Nov 14 19:50:31 tf2 sshd(pam_unix)[6920]: session opened for user jason by (uid=0)
> Nov 14 19:51:03 tf2 sshd(pam_unix)[6951]: session opened for user jason by (uid=0)
>
> Nov 14 19:51:39 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:51:39 tf2 clurgmgrd[5345]: <info> State change: tf1 DOWN
> Nov 14 19:52:19 tf2 ntpd[4896]: synchronized to 193.162.159.97, stratum 2
> Nov 14 19:52:19 tf2 ntpd[4896]: kernel time sync disabled 0041
> Nov 14 19:52:28 tf2 kernel: e100: eth2: e100_watchdog: link down
> Nov 14 19:52:34 tf2 kernel: CMAN: removing node tf1 from the cluster : Missed too many heartbeats
> Nov 14 19:52:58 tf2 kernel: e100: eth2: e100_watchdog: link up, 100Mbps, full-duplex
> Nov 14 19:55:14 tf2 kernel: CMAN: node tf1 rejoining
> Nov 14 19:55:45 tf2 clurgmgrd[5345]: <info> Magma Event: Membership Change
> Nov 14 19:55:45 tf2 clurgmgrd[5345]: <info> State change: tf1 UP
>
>
> then when tf1 comes back up, my apache service doesn't come up correctly..
>
> [jason@tf2 ~]$ clustat
> Member Status: Quorate, Group Member
>
>   Member Name                              State      ID
>   ------ ----                              -----      --
>   tf1                                      Online     0x0000000000000001
>   tf2                                      Online     0x0000000000000002
>
>   Service Name         Owner (Last)                   State
>   ------- ----         ----- ------                   -----
>   Apache Service       (tf1          )                failed
> [jason@tf2 ~]$
>
>
> and I see this in the logs on tf1 as it's booting up.
> Nov 14 19:55:44 tf1 rhnsd[5445]: Red Hat Network Services Daemon starting up.
> Nov 14 19:55:44 tf1 rhnsd: rhnsd startup succeeded
> Nov 14 19:55:44 tf1 cups-config-daemon: cups-config-daemon startup succeeded
> Nov 14 19:55:44 tf1 haldaemon: haldaemon startup succeeded
> Nov 14 19:55:44 tf1 clurgmgrd[5488]: <info> Loading Service Data
> Nov 14 19:55:44 tf1 rgmanager: clurgmgrd startup succeeded
> Nov 14 19:55:44 tf1 fstab-sync[5764]: removed all generated mount points
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Initializing Services
> Nov 14 19:55:45 tf1 fstab-sync[6152]: added mount point /media/cdrom for /dev/hda
> Nov 14 19:55:45 tf1 httpd: httpd shutdown failed
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <notice> stop on script "cluster_apache" returned 1 (generic error)
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Services Initialized
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Logged in SG "usrm::manager"
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> Magma Event: Membership Change
> Nov 14 19:55:45 tf1 clurgmgrd[5488]: <info> State change: Local UP
> Nov 14 19:55:46 tf1 fstab-sync[6465]: added mount point /media/floppy for /dev/fd0
> Nov 14 19:55:46 tf1 clurgmgrd[5488]: <info> State change: tf2 UP
>
> any suggestions?
>

http://sources.redhat.com/cluster/faq.html#rgm_wontrestart

The init script is probably returning 1 for stop-after-stop (that is,
stop when the service is already stopped), when it should return 0. Your
tf1 log shows exactly that: "httpd shutdown failed", followed by the stop
on script "cluster_apache" returning 1.

This is a bug in the initscripts package. Here is a patch to
/etc/init.d/functions that makes httpd behave normally:

https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=111998

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
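
For illustration only, here is a minimal sketch of the stop-when-stopped behaviour described above, written as a small wrapper rather than as the bugzilla patch itself. The function names (is_running, stop_service) and the pidfile path in the usage comment are assumptions made for this example; they are not taken from initscripts or rgmanager.

```shell
#!/bin/sh
# Hypothetical helpers: make "stop" succeed when the service is already stopped.

# is_running PIDFILE
# Returns 0 only if PIDFILE exists, is non-empty, and names a process we can signal.
is_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1
    pid=$(cat "$pidfile") || return 1
    kill -0 "$pid" 2>/dev/null
}

# stop_service PIDFILE STOP_COMMAND [ARGS...]
# If nothing is running, report success without calling the stop command.
# Otherwise run the stop command and pass its exit status through unchanged.
stop_service() {
    pidfile=$1
    shift
    if ! is_running "$pidfile"; then
        return 0    # stop-when-stopped counts as success
    fi
    "$@"
}

# Example use inside a wrapper script's "stop" branch (paths are assumptions):
#   stop_service /var/run/httpd.pid /etc/init.d/httpd stop
```

A wrapper like this only hides the symptom for one service; fixing /etc/init.d/functions as suggested above addresses every init script that relies on it.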