I think the mailing list doesn't like attachments, so here's a link to the
panic screenshot that was supposed to go along with this post:

http://monsterjam.org/crash/panic.jpg

I tried stopping the services on the first box of my 2-node cluster:

    service rgmanager stop
    service gfs stop
    service clvmd stop
    service fenced stop
    service cman stop
    service ccsd stop

Everything came down fine. Then I started them back up:

    service ccsd start

This seemed to hang for about 2 minutes, then I got a panic, as shown in the
graphic linked above.

This is on 2.6.9-34.ELsmp, Red Hat Enterprise Linux AS release 4 (Nahant
Update 4), running

    ccs-1.0.3-0
    cman-kernel-hugemem-2.6.9-43.8
    cman-kernel-2.6.9-43.8
    cman-1.0.4-0
    cman-kernel-smp-2.6.9-43.8
    cman-kernheaders-2.6.9-43.8

built from sources. Here's my cluster.conf:

<?xml version="1.0"?>
<cluster config_version="22" name="progressive">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="tf1" votes="1">
      <fence>
        <method name="1">
          <device name="apc_power_switch" option=" off" port="1" switch="1"/>
          <device name="apc_power_switch" option=" off" port="2" switch="1"/>
          <device name="apc_power_switch" option=" on" port="1" switch="1"/>
          <device name="apc_power_switch" option=" on" port="2" switch="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="tf2" votes="1">
      <fence>
        <method name="1">
          <device name="apc_power_switch" option=" off" port="3" switch="1"/>
          <device name="apc_power_switch" option=" off" port="4" switch="1"/>
          <device name="apc_power_switch" option=" on" port="3" switch="1"/>
          <device name="apc_power_switch" option=" on" port="4" switch="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.1.8" login="xxx" name="apc_power_switch" passwd="xxx"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="httpd" ordered="1" restricted="1">
        <failoverdomainnode name="tf1" priority="1"/>
        <failoverdomainnode name="tf2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <script file="/etc/init.d/httpd" name="cluster_apache"/>
      <fs device="/dev/mapper/diskarray-lv1" fstype="ext3" mountpoint="/mnt/gfs/htdocs" name="apache_content"/>
      <ip address="192.168.1.7" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="httpd" name="Apache Service">
      <ip ref="192.168.1.7"/>
      <script ref="cluster_apache"/>
      <fs ref="apache_content"/>
    </service>
  </rm>
</cluster>

Oh, and shortly after the first box came back up, the second one got rebooted
automagically (power fenced from the first one, I'm guessing) for good
measure.

Any help appreciated.

Jason

On Tue, Oct 17, 2006 at 09:37:15PM -0400, jason@xxxxxxxxxxxxxx wrote:
> So I've had a test cluster running for quite a while now. Both nodes of the
> 2-node cluster are up, but the virtual address seems to have disappeared.
> It's not pingable, and neither server has it configured anymore. The only
> application I had using the virtual address was apache (just for testing
> it). What logs/information should I be looking at to see what happened and
> why?
>
> regards,
> Jason
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster