Thanks for your quick reply. I talked to the network people, but they say everything is fine at their end. Is there any way, at the server end, to figure out whether it was a switch restart or lost multicast traffic? I think you have already checked the cluster.conf file. Apart from the quorum disk, do you think the cluster configuration is sufficient for handling a sixteen-node cluster? (I have put a rough sketch of the quorumd stanza I have in mind at the bottom of this mail, below the quoted thread.)
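In the meantime, is something along these lines a reasonable way to watch for it from the server side? It is only a rough sketch of what I have in mind: eth1 is just a placeholder for whatever the serverX-priv interface really is, and the multicast address is the one reported by cman_tool status.

    # run as root on one or two of the nodes; eth1 is a placeholder for the private interface
    IFACE=eth1
    MCAST=$(cman_tool status | awk '/Multicast addresses/ {print $3}')

    # 1) confirm the node is subscribed to the cluster multicast group on that interface
    ip maddr show dev "$IFACE"

    # 2) check that the cluster multicast traffic is actually arriving
    tcpdump -n -c 20 -i "$IFACE" host "$MCAST"

    # 3) look for NIC link flaps (a switch restart usually shows up as link down/up here)
    grep -i "$IFACE" /var/log/messages | grep -iE 'link is (up|down)'

    # 4) look for totem/openais membership changes around the time the cluster broke
    grep -iE 'totem|openais|fenced' /var/log/messages | tail -n 100

Would that be enough to tell a switch restart apart from lost multicast traffic, or is there a better way to check this on the servers?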
Thanks again.
Regards

--- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev@xxxxxxxxx> wrote:

> From: Kaloyan Kovachev <kkovachev@xxxxxxxxx>
> Subject: Re: Cluster environment issue
> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> Date: Monday, May 30, 2011, 4:05 PM
>
> Hi,
> when your cluster gets broken, the most likely reason is a network
> problem (a switch restart, or multicast traffic being lost for a while)
> on the interface where the serverX-priv IPs are configured. Having a
> quorum disk may help by giving a quorum vote to one of the servers, so
> it can fence the others, but the best thing to do is to fix your network
> and preferably add a redundant link for the cluster communication, to
> avoid the breakage in the first place.
>
> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project@xxxxxxxxx> wrote:
> > Hi,
> >
> > I am very new to the Red Hat cluster and need some help and suggestions
> > for the cluster configuration. We have a sixteen-node cluster running:
> >
> >     OS     : Linux Server release 5.5 (Tikanga)
> >     kernel : 2.6.18-194.3.1.el5xen
> >
> > The problem is that sometimes the cluster gets broken. So far the only
> > solution has been to reboot all sixteen nodes; otherwise the nodes do
> > not rejoin.
> >
> > We are using clvm and no quorum disk; the quorum settings are the
> > defaults.
> >
> > When the cluster is broken, clustat shows everything offline except the
> > node where the clustat command was executed, and commands such as vgs
> > and lvs hang.
> >
> > Here is the clustat report at present:
> > --------------------------------------
> >
> > [server1]# clustat
> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
> > Member Status: Quorate
> >
> >  Member Name        ID   Status
> >  ------ ----        ---- ------
> >  server1             1   Online
> >  server2             2   Online, Local
> >  server3             3   Online
> >  server4             4   Online
> >  server5             5   Online
> >  server6             6   Online
> >  server7             7   Online
> >  server8             8   Online
> >  server9             9   Online
> >  server10           10   Online
> >  server11           11   Online
> >  server12           12   Online
> >  server13           13   Online
> >  server14           14   Online
> >  server15           15   Online
> >  server16           16   Online
> >
> > Here is the cman_tool status output from one server:
> > -----------------------------------------------------
> >
> > [server1 ~]# cman_tool status
> > Version: 6.2.0
> > Config Version: 23
> > Cluster Name: newcluster
> > Cluster Id: 53322
> > Cluster Member: Yes
> > Cluster Generation: 11432
> > Membership state: Cluster-Member
> > Nodes: 16
> > Expected votes: 16
> > Total votes: 16
> > Quorum: 9
> > Active subsystems: 8
> > Flags: Dirty
> > Ports Bound: 0 11
> > Node name: server1
> > Node ID: 1
> > Multicast addresses: xxx.xxx.xxx.xx
> > Node addresses: 192.168.xxx.xx
> >
> > Here is the cluster.conf file:
> > ------------------------------
> >
> > <?xml version="1.0"?>
> > <cluster alias="newcluster" config_version="23" name="newcluster">
> >   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
> >
> >   <clusternodes>
> >
> >     <clusternode name="server1-priv" nodeid="1" votes="1">
> >       <fence><method name="1">
> >         <device name="ilo-server1r"/></method>
> >       </fence>
> >     </clusternode>
> >
> >     <clusternode name="server2-priv" nodeid="3" votes="1">
> >       <fence><method name="1">
> >         <device name="ilo-server2r"/></method>
> >       </fence>
> >     </clusternode>
> >
> >     <clusternode name="server3-priv" nodeid="2" votes="1">
> >       <fence><method name="1">
> >         <device name="ilo-server3r"/></method>
> >       </fence>
> >     </clusternode>
> >
> >     [ ... snip ... ]
> >
> >     <clusternode name="server16-priv" nodeid="16" votes="1">
> >       <fence><method name="1">
> >         <device name="ilo-server16r"/></method>
> >       </fence>
> >     </clusternode>
> >
> >   </clusternodes>
> >
> >   <cman/>
> >
> >   <dlm plock_ownership="1" plock_rate_limit="0"/>
> >   <gfs_controld plock_rate_limit="0"/>
> >
> >   <fencedevices>
> >     <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
> >                  name="ilo-server1r" passwd="xxxxx"/>
> >     ..........
> >     <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
> >                  name="ilo-server16r" passwd="xxxxx"/>
> >   </fencedevices>
> >
> >   <rm>
> >     <failoverdomains/>
> >     <resources/>
> >   </rm>
> > </cluster>
> >
> > Here is the lvm.conf file:
> > --------------------------
> >
> > devices {
> >     dir = "/dev"
> >     scan = [ "/dev" ]
> >     preferred_names = [ ]
> >     filter = [ "r/scsi.*/","r/pci.*/","r/sd.*/","a/.*/" ]
> >     cache_dir = "/etc/lvm/cache"
> >     cache_file_prefix = ""
> >     write_cache_state = 1
> >     sysfs_scan = 1
> >     md_component_detection = 1
> >     md_chunk_alignment = 1
> >     data_alignment_detection = 1
> >     data_alignment = 0
> >     data_alignment_offset_detection = 1
> >     ignore_suspended_devices = 0
> > }
> >
> > log {
> >     verbose = 0
> >     syslog = 1
> >     overwrite = 0
> >     level = 0
> >     indent = 1
> >     command_names = 0
> >     prefix = "  "
> > }
> >
> > backup {
> >     backup = 1
> >     backup_dir = "/etc/lvm/backup"
> >     archive = 1
> >     archive_dir = "/etc/lvm/archive"
> >     retain_min = 10
> >     retain_days = 30
> > }
> >
> > shell {
> >     history_size = 100
> > }
> >
> > global {
> >     library_dir = "/usr/lib64"
> >     umask = 077
> >     test = 0
> >     units = "h"
> >     si_unit_consistency = 0
> >     activation = 1
> >     proc = "/proc"
> >     locking_type = 3
> >     wait_for_locks = 1
> >     fallback_to_clustered_locking = 1
> >     fallback_to_local_locking = 1
> >     locking_dir = "/var/lock/lvm"
> >     prioritise_write_locks = 1
> > }
> >
> > activation {
> >     udev_sync = 1
> >     missing_stripe_filler = "error"
> >     reserved_stack = 256
> >     reserved_memory = 8192
> >     process_priority = -18
> >     mirror_region_size = 512
> >     readahead = "auto"
> >     mirror_log_fault_policy = "allocate"
> >     mirror_image_fault_policy = "remove"
> > }
> >
> > dmeventd {
> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
> > }
> >
> > If you need more information, I can provide it.
> >
> > Thanks for your help
> > Priya
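PS: Regarding the quorum disk, this is roughly the stanza I was thinking of adding inside the <cluster> section of cluster.conf. It is only a sketch to make the question concrete; the label, the ping target in the heuristic, the timings and the vote count are all placeholders, not something we run today:

    <!-- sketch only: label, heuristic target, timings and votes are placeholders -->
    <quorumd interval="1" tko="10" votes="15" label="newcluster_qdisk">
        <heuristic program="ping -c1 -w1 192.168.xxx.1" score="1" interval="2" tko="3"/>
    </quorumd>

My understanding is that the quorum disk is usually given one vote less than the number of nodes (15 here), and that expected_votes on the <cman/> line would then have to be raised to match, but please correct me if that is not right for a sixteen-node cluster.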
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster