Hello,

I'm not sure whether this is useful or not, but have you ever tried ``ping some_where'' from a domU while the cluster is broken? (I assume you are using Xen, since your kernel is 2.6.18-194.3.1.el5xen.) If you get no response at all, you should check iptables (for example, by disabling it temporarily).
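Concretely, something along these lines might help. The interface name eth0 below is only a guess; use whichever interface actually carries the -priv addresses:

  [server1 ~]# ping -c 3 server2-priv    # can the nodes still reach each other over the private network?
  [server1 ~]# iptables -L -n            # any rules that could drop cluster traffic?
  [server1 ~]# service iptables stop     # temporarily disable the firewall while testing
  [server1 ~]# tcpdump -i eth0 udp port 5404 or udp port 5405   # is cman/openais traffic still arriving?

cman/openais talks on UDP ports 5404 and 5405 and on the multicast address shown by ``cman_tool status'', so if you keep iptables enabled, that traffic must be allowed between the -priv interfaces on all sixteen nodes.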
--
Hiroyuki Sato

2011/5/31 Srija <swap_project@xxxxxxxxx>:
> Thanks for your quick reply.
>
> I talked to the network people, but they are saying everything is good at their end. Is there any way, at the server end, to check for a switch restart or lost multicast traffic?
>
> I think you have already checked the cluster.conf file. Apart from the quorum disk, do you think that the cluster configuration is sufficient for handling the sixteen-node cluster?
>
> thanks again.
> regards
>
> --- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev@xxxxxxxxx> wrote:
>
>> From: Kaloyan Kovachev <kkovachev@xxxxxxxxx>
>> Subject: Re: Cluster environment issue
>> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
>> Date: Monday, May 30, 2011, 4:05 PM
>> Hi,
>> when your cluster gets broken, the most likely reason is a network
>> problem (switch restart or multicast traffic lost for a while) on the
>> interface where the serverX-priv IPs are configured. Having a quorum
>> disk may help by giving a quorum vote to one of the servers, so it can
>> fence the others, but the best thing to do is to fix your network and
>> preferably add a redundant link for the cluster communication, to avoid
>> the breakage in the first place.
>>
>> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project@xxxxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > I am very new to the Red Hat cluster and need some help and suggestions
>> > for the cluster configuration.
>> > We have a sixteen-node cluster of
>> >
>> > OS     : Linux Server release 5.5 (Tikanga)
>> > kernel : 2.6.18-194.3.1.el5xen
>> >
>> > The problem is that sometimes the cluster gets broken. So far the only
>> > solution is to reboot all sixteen nodes; otherwise the nodes do not rejoin.
>> >
>> > We are using clvm and no quorum disk; the quorum settings are the defaults.
>> >
>> > When the cluster gets broken, clustat shows everything offline except the
>> > node the clustat command was executed on. If we execute the vgs or lvs
>> > commands, they hang.
>> >
>> > Here is the current clustat report
>> > ----------------------------------
>> >
>> > [server1]# clustat
>> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
>> > Member Status: Quorate
>> >
>> >  Member Name            ID   Status
>> >  ------ ----             ---- ------
>> >  server1                   1  Online
>> >  server2                   2  Online, Local
>> >  server3                   3  Online
>> >  server4                   4  Online
>> >  server5                   5  Online
>> >  server6                   6  Online
>> >  server7                   7  Online
>> >  server8                   8  Online
>> >  server9                   9  Online
>> >  server10                 10  Online
>> >  server11                 11  Online
>> >  server12                 12  Online
>> >  server13                 13  Online
>> >  server14                 14  Online
>> >  server15                 15  Online
>> >  server16                 16  Online
>> >
>> > Here is the cman_tool status output from one server
>> > ----------------------------------------------------
>> >
>> > [server1 ~]# cman_tool status
>> > Version: 6.2.0
>> > Config Version: 23
>> > Cluster Name: newcluster
>> > Cluster Id: 53322
>> > Cluster Member: Yes
>> > Cluster Generation: 11432
>> > Membership state: Cluster-Member
>> > Nodes: 16
>> > Expected votes: 16
>> > Total votes: 16
>> > Quorum: 9
>> > Active subsystems: 8
>> > Flags: Dirty
>> > Ports Bound: 0 11
>> > Node name: server1
>> > Node ID: 1
>> > Multicast addresses: xxx.xxx.xxx.xx
>> > Node addresses: 192.168.xxx.xx
>> >
>> > Here is the cluster.conf file
>> > ------------------------------
>> >
>> > <?xml version="1.0"?>
>> > <cluster alias="newcluster" config_version="23" name="newcluster">
>> >   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
>> >
>> >   <clusternodes>
>> >     <clusternode name="server1-priv" nodeid="1" votes="1">
>> >       <fence>
>> >         <method name="1">
>> >           <device name="ilo-server1r"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server2-priv" nodeid="3" votes="1">
>> >       <fence>
>> >         <method name="1">
>> >           <device name="ilo-server2r"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     <clusternode name="server3-priv" nodeid="2" votes="1">
>> >       <fence>
>> >         <method name="1">
>> >           <device name="ilo-server3r"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >
>> >     [ ... snip ... ]
>> >
>> >     <clusternode name="server16-priv" nodeid="16" votes="1">
>> >       <fence>
>> >         <method name="1">
>> >           <device name="ilo-server16r"/>
>> >         </method>
>> >       </fence>
>> >     </clusternode>
>> >   </clusternodes>
>> >
>> >   <cman/>
>> >
>> >   <dlm plock_ownership="1" plock_rate_limit="0"/>
>> >   <gfs_controld plock_rate_limit="0"/>
>> >
>> >   <fencedevices>
>> >     <fencedevice agent="fence_ilo" hostname="server1r" login="Admin" name="ilo-server1r" passwd="xxxxx"/>
>> >     ..........
>> >     <fencedevice agent="fence_ilo" hostname="server16r" login="Admin" name="ilo-server16r" passwd="xxxxx"/>
>> >   </fencedevices>
>> >
>> >   <rm>
>> >     <failoverdomains/>
>> >     <resources/>
>> >   </rm>
>> > </cluster>
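As Kaloyan mentioned, the one thing the configuration above lacks is a quorum disk. If you decide to add one, it is a <quorumd> stanza at the same level as <clusternodes>. This is only a rough sketch; the label, the votes, the interval/tko timings and the ping heuristic target are placeholders you would have to tune for your storage and network (see the qdisk(5) man page):

  <!-- placeholder values: tune label, votes, interval/tko and the heuristic for your environment -->
  <quorumd interval="2" tko="10" votes="15" label="newcluster_qdisk">
    <heuristic program="ping -c1 -w1 192.168.xxx.1" score="1" interval="2"/>
  </quorumd>

With the quorum disk carrying 15 votes (nodes minus one, the usual last-man-standing setup), <cman/> would also need its expected votes raised, e.g. <cman expected_votes="31"/> (16 node votes plus 15 from the disk), and the partition itself is created on shared storage that all sixteen nodes can see, with something like ``mkqdisk -c /dev/<your_shared_lun> -l newcluster_qdisk''. Keep in mind this only changes how the cluster reacts when membership is lost; it does not fix the underlying network problem.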
>> >
>> > Here is the lvm.conf file
>> > --------------------------
>> >
>> > devices {
>> >     dir = "/dev"
>> >     scan = [ "/dev" ]
>> >     preferred_names = [ ]
>> >     filter = [ "r/scsi.*/", "r/pci.*/", "r/sd.*/", "a/.*/" ]
>> >     cache_dir = "/etc/lvm/cache"
>> >     cache_file_prefix = ""
>> >     write_cache_state = 1
>> >     sysfs_scan = 1
>> >     md_component_detection = 1
>> >     md_chunk_alignment = 1
>> >     data_alignment_detection = 1
>> >     data_alignment = 0
>> >     data_alignment_offset_detection = 1
>> >     ignore_suspended_devices = 0
>> > }
>> >
>> > log {
>> >     verbose = 0
>> >     syslog = 1
>> >     overwrite = 0
>> >     level = 0
>> >     indent = 1
>> >     command_names = 0
>> >     prefix = "  "
>> > }
>> >
>> > backup {
>> >     backup = 1
>> >     backup_dir = "/etc/lvm/backup"
>> >     archive = 1
>> >     archive_dir = "/etc/lvm/archive"
>> >     retain_min = 10
>> >     retain_days = 30
>> > }
>> >
>> > shell {
>> >     history_size = 100
>> > }
>> >
>> > global {
>> >     library_dir = "/usr/lib64"
>> >     umask = 077
>> >     test = 0
>> >     units = "h"
>> >     si_unit_consistency = 0
>> >     activation = 1
>> >     proc = "/proc"
>> >     locking_type = 3
>> >     wait_for_locks = 1
>> >     fallback_to_clustered_locking = 1
>> >     fallback_to_local_locking = 1
>> >     locking_dir = "/var/lock/lvm"
>> >     prioritise_write_locks = 1
>> > }
>> >
>> > activation {
>> >     udev_sync = 1
>> >     missing_stripe_filler = "error"
>> >     reserved_stack = 256
>> >     reserved_memory = 8192
>> >     process_priority = -18
>> >     mirror_region_size = 512
>> >     readahead = "auto"
>> >     mirror_log_fault_policy = "allocate"
>> >     mirror_image_fault_policy = "remove"
>> > }
>> >
>> > dmeventd {
>> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
>> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
>> > }
>> >
>> > If you need more information, I can provide ...
>> >
>> > Thanks for your help
>> > Priya

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster