Hi,

When your cluster gets broken like this, the most likely cause is a network problem (a switch restart, or multicast traffic being lost for a while) on the interface where the serverX-priv IPs are configured. Having a quorum disk may help, by giving a quorum vote to one side of the split so that it can fence the others, but the best thing to do is to fix your network, and preferably to add a redundant link for the cluster communication, so that the breakage does not happen in the first place.
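In case it is useful, here is roughly what the quorum disk setup looks like on RHEL 5. This is only a sketch: the device path and label are placeholders, and the interval/tko values are examples you would need to tune (note also that qdiskd supports at most 16 nodes, which is exactly what you have):

    # on one node, against a small shared LUN that all nodes can see
    mkqdisk -c /dev/mapper/qdisk-lun -l newcluster_qdisk

    <!-- in cluster.conf, inside <cluster>: one extra vote to break an even split -->
    <quorumd interval="2" tko="10" votes="1" label="newcluster_qdisk"/>

    # then bump config_version, propagate the config, and on every node:
    chkconfig qdiskd on && service qdiskd start

For the redundant link, the usual approach on RHEL 5 is an active-backup bond (mode=1) carrying the -priv addresses, ideally with the two slave NICs cabled to different switches. Again only a sketch, with placeholder interface names and addresses:

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.xxx.xx        # the serverX-priv address
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=1 miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (and the same for eth2)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/modprobe.conf needs the bonding alias as well:
    alias bond0 bonding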
On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project@xxxxxxxxx> wrote:

> Hi,
>
> I am very new to the Red Hat cluster and need some help and suggestions
> for the cluster configuration. We have a sixteen-node cluster of:
>
> OS     : Linux Server release 5.5 (Tikanga)
> kernel : 2.6.18-194.3.1.el5xen
>
> The problem is that sometimes the cluster gets broken, and the only
> solution so far is to reboot all sixteen nodes; otherwise the nodes do
> not rejoin.
>
> We are using clvm and are not using any quorum disk. The quorum is the
> default.
>
> When the cluster breaks, clustat shows everything offline except the
> node from which the clustat command was executed, and vgs and lvs
> commands hang.
>
> Here is the clustat report at present:
> --------------------------------------
>
> [server1]# clustat
> Cluster Status for newcluster @ Mon May 30 14:55:10 2011
> Member Status: Quorate
>
>  Member Name                        ID   Status
>  ------ ----                        ---- ------
>  server1                            1    Online
>  server2                            2    Online, Local
>  server3                            3    Online
>  server4                            4    Online
>  server5                            5    Online
>  server6                            6    Online
>  server7                            7    Online
>  server8                            8    Online
>  server9                            9    Online
>  server10                           10   Online
>  server11                           11   Online
>  server12                           12   Online
>  server13                           13   Online
>  server14                           14   Online
>  server15                           15   Online
>  server16                           16   Online
>
> Here is the cman_tool status output from one server:
> ----------------------------------------------------
>
> [server1 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 23
> Cluster Name: newcluster
> Cluster Id: 53322
> Cluster Member: Yes
> Cluster Generation: 11432
> Membership state: Cluster-Member
> Nodes: 16
> Expected votes: 16
> Total votes: 16
> Quorum: 9
> Active subsystems: 8
> Flags: Dirty
> Ports Bound: 0 11
> Node name: server1
> Node ID: 1
> Multicast addresses: xxx.xxx.xxx.xx
> Node addresses: 192.168.xxx.xx
>
> Here is the cluster.conf file:
> ------------------------------
>
> <?xml version="1.0"?>
> <cluster alias="newcluster" config_version="23" name="newcluster">
>     <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
>
>     <clusternodes>
>
>         <clusternode name="server1-priv" nodeid="1" votes="1">
>             <fence><method name="1">
>                 <device name="ilo-server1r"/></method>
>             </fence>
>         </clusternode>
>
>         <clusternode name="server2-priv" nodeid="3" votes="1">
>             <fence><method name="1">
>                 <device name="ilo-server2r"/></method>
>             </fence>
>         </clusternode>
>
>         <clusternode name="server3-priv" nodeid="2" votes="1">
>             <fence><method name="1">
>                 <device name="ilo-server3r"/></method>
>             </fence>
>         </clusternode>
>
>         [ ... snip ... ]
>
>         <clusternode name="server16-priv" nodeid="16" votes="1">
>             <fence><method name="1">
>                 <device name="ilo-server16r"/></method>
>             </fence>
>         </clusternode>
>
>     </clusternodes>
>     <cman/>
>
>     <dlm plock_ownership="1" plock_rate_limit="0"/>
>     <gfs_controld plock_rate_limit="0"/>
>
>     <fencedevices>
>         <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
>             name="ilo-server1r" passwd="xxxxx"/>
>         ..........
>         <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
>             name="ilo-server16r" passwd="xxxxx"/>
>     </fencedevices>
>     <rm>
>         <failoverdomains/>
>         <resources/>
>     </rm>
> </cluster>
>
> Here is the lvm.conf file:
> --------------------------
>
> devices {
>     dir = "/dev"
>     scan = [ "/dev" ]
>     preferred_names = [ ]
>     filter = [ "r/scsi.*/","r/pci.*/","r/sd.*/","a/.*/" ]
>     cache_dir = "/etc/lvm/cache"
>     cache_file_prefix = ""
>     write_cache_state = 1
>     sysfs_scan = 1
>     md_component_detection = 1
>     md_chunk_alignment = 1
>     data_alignment_detection = 1
>     data_alignment = 0
>     data_alignment_offset_detection = 1
>     ignore_suspended_devices = 0
> }
>
> log {
>     verbose = 0
>     syslog = 1
>     overwrite = 0
>     level = 0
>     indent = 1
>     command_names = 0
>     prefix = " "
> }
>
> backup {
>     backup = 1
>     backup_dir = "/etc/lvm/backup"
>     archive = 1
>     archive_dir = "/etc/lvm/archive"
>     retain_min = 10
>     retain_days = 30
> }
>
> shell {
>     history_size = 100
> }
>
> global {
>     library_dir = "/usr/lib64"
>     umask = 077
>     test = 0
>     units = "h"
>     si_unit_consistency = 0
>     activation = 1
>     proc = "/proc"
>     locking_type = 3
>     wait_for_locks = 1
>     fallback_to_clustered_locking = 1
>     fallback_to_local_locking = 1
>     locking_dir = "/var/lock/lvm"
>     prioritise_write_locks = 1
> }
>
> activation {
>     udev_sync = 1
>     missing_stripe_filler = "error"
>     reserved_stack = 256
>     reserved_memory = 8192
>     process_priority = -18
>     mirror_region_size = 512
>     readahead = "auto"
>     mirror_log_fault_policy = "allocate"
>     mirror_image_fault_policy = "remove"
> }
>
> dmeventd {
>     mirror_library = "libdevmapper-event-lvm2mirror.so"
>     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
> }
>
> If you need more information, I can provide it.
>
> Thanks for your help,
> Priya

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster