Thanks again for the reply. Yes, this cluster environment consists of Xen hosts. When the cluster breaks, all the guests are still pingable; there is no issue there. But, as I said, the clustat command shows everything 'offline', and the LVM-related commands cannot be executed either. iptables is already 'off' in this cluster environment.

regards.

--- On Mon, 5/30/11, Hiroyuki Sato <hiroysato@xxxxxxxxx> wrote:

> From: Hiroyuki Sato <hiroysato@xxxxxxxxx>
> Subject: Re: Cluster environment issue
> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> Date: Monday, May 30, 2011, 11:03 PM
>
> Hello
>
> I'm not sure whether this is useful or not.
>
> Have you checked ``ping some_where'' on a domU when the
> cluster is broken?
> (I assumed you are using Xen, because you are running
> 2.6.18-194.3.1.el5xen.)
> If it does not respond at all, you should check
> iptables (e.g., disable iptables).
>
> --
> Hiroyuki Sato
>
> 2011/5/31 Srija <swap_project@xxxxxxxxx>:
> > Thanks for your quick reply.
> >
> > I talked to the network people, but they say everything is
> > good at their end. Is there any way, at the server end, to
> > check for a switch restart or lost multicast traffic?
> >
> > I think you have already checked the cluster.conf file.
> > Apart from the quorum disk, do you think the cluster
> > configuration is sufficient for handling a sixteen-node
> > cluster?
> >
> > Thanks again.
> > regards
> >
> > --- On Mon, 5/30/11, Kaloyan Kovachev <kkovachev@xxxxxxxxx> wrote:
> >
> >> From: Kaloyan Kovachev <kkovachev@xxxxxxxxx>
> >> Subject: Re: Cluster environment issue
> >> To: "linux clustering" <linux-cluster@xxxxxxxxxx>
> >> Date: Monday, May 30, 2011, 4:05 PM
> >>
> >> Hi,
> >> when your cluster gets broken, the most likely reason is a
> >> network problem (a switch restart, or multicast traffic being
> >> lost for a while) on the interface where the serverX-priv IPs
> >> are configured.
> >> Having a quorum disk may help by giving a quorum vote to
> >> one of the servers, so it can fence the others, but the
> >> best thing to do is to fix your network and preferably add
> >> a redundant link for the cluster communication, to avoid
> >> breakage in the first place.
> >>
> >> On Mon, 30 May 2011 12:17:07 -0700 (PDT), Srija <swap_project@xxxxxxxxx>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am very new to the Red Hat cluster and need some help
> >> > and suggestions for the cluster configuration.
> >> > We have a sixteen-node cluster:
> >> >
> >> >   OS     : Linux Server release 5.5 (Tikanga)
> >> >   kernel : 2.6.18-194.3.1.el5xen
> >> >
> >> > The problem is that sometimes the cluster gets broken.
> >> > The only solution (so far) is to reboot all sixteen
> >> > nodes; otherwise the nodes do not rejoin.
> >> >
> >> > We are using clvm and no quorum disk; the quorum is the
> >> > default.
> >> >
> >> > When the cluster breaks, the clustat command shows
> >> > everything offline except the node where the clustat
> >> > command was executed. If we execute the vgs or lvs
> >> > commands, they hang.
> >> >
> >> > Here is the clustat report at present:
> >> > -------------------------------------
> >> >
> >> > [server1]# clustat
> >> > Cluster Status for newcluster @ Mon May 30 14:55:10 2011
> >> > Member Status: Quorate
> >> >
> >> >  Member Name        ID   Status
> >> >  ------ ----        ---- ------
> >> >  server1             1   Online
> >> >  server2             2   Online, Local
> >> >  server3             3   Online
> >> >  server4             4   Online
> >> >  server5             5   Online
> >> >  server6             6   Online
> >> >  server7             7   Online
> >> >  server8             8   Online
> >> >  server9             9   Online
> >> >  server10           10   Online
> >> >  server11           11   Online
> >> >  server12           12   Online
> >> >  server13           13   Online
> >> >  server14           14   Online
> >> >  server15           15   Online
> >> >  server16           16   Online
> >> >
> >> > Here is the cman_tool status output from one server:
> >> > ----------------------------------------------------
> >> >
> >> > [server1 ~]# cman_tool status
> >> > Version: 6.2.0
> >> > Config Version: 23
> >> > Cluster Name: newcluster
> >> > Cluster Id: 53322
> >> > Cluster Member: Yes
> >> > Cluster Generation: 11432
> >> > Membership state: Cluster-Member
> >> > Nodes: 16
> >> > Expected votes: 16
> >> > Total votes: 16
> >> > Quorum: 9
> >> > Active subsystems: 8
> >> > Flags: Dirty
> >> > Ports Bound: 0 11
> >> > Node name: server1
> >> > Node ID: 1
> >> > Multicast addresses: xxx.xxx.xxx.xx
> >> > Node addresses: 192.168.xxx.xx
> >> >
> >> > Here is the cluster.conf file.
> >> > ------------------------------
> >> >
> >> > <?xml version="1.0"?>
> >> > <cluster alias="newcluster" config_version="23" name="newcluster">
> >> >   <fence_daemon clean_start="1" post_fail_delay="0" post_join_delay="15"/>
> >> >
> >> >   <clusternodes>
> >> >     <clusternode name="server1-priv" nodeid="1" votes="1">
> >> >       <fence>
> >> >         <method name="1">
> >> >           <device name="ilo-server1r"/>
> >> >         </method>
> >> >       </fence>
> >> >     </clusternode>
> >> >
> >> >     <clusternode name="server2-priv" nodeid="3" votes="1">
> >> >       <fence>
> >> >         <method name="1">
> >> >           <device name="ilo-server2r"/>
> >> >         </method>
> >> >       </fence>
> >> >     </clusternode>
> >> >
> >> >     <clusternode name="server3-priv" nodeid="2" votes="1">
> >> >       <fence>
> >> >         <method name="1">
> >> >           <device name="ilo-server3r"/>
> >> >         </method>
> >> >       </fence>
> >> >     </clusternode>
> >> >
> >> >     [ ... snip ... ]
> >> >
> >> >     <clusternode name="server16-priv" nodeid="16" votes="1">
> >> >       <fence>
> >> >         <method name="1">
> >> >           <device name="ilo-server16r"/>
> >> >         </method>
> >> >       </fence>
> >> >     </clusternode>
> >> >   </clusternodes>
> >> >
> >> >   <cman/>
> >> >
> >> >   <dlm plock_ownership="1" plock_rate_limit="0"/>
> >> >   <gfs_controld plock_rate_limit="0"/>
> >> >
> >> >   <fencedevices>
> >> >     <fencedevice agent="fence_ilo" hostname="server1r" login="Admin"
> >> >                  name="ilo-server1r" passwd="xxxxx"/>
> >> >     ..........
> >> >     <fencedevice agent="fence_ilo" hostname="server16r" login="Admin"
> >> >                  name="ilo-server16r" passwd="xxxxx"/>
> >> >   </fencedevices>
> >> >
> >> >   <rm>
> >> >     <failoverdomains/>
> >> >     <resources/>
> >> >   </rm>
> >> > </cluster>
> >> >
> >> > Here is the lvm.conf file
> >> > -------------------------
> >> >
> >> > devices {
> >> >     dir = "/dev"
> >> >     scan = [ "/dev" ]
> >> >     preferred_names = [ ]
> >> >     filter = [ "r/scsi.*/", "r/pci.*/", "r/sd.*/", "a/.*/" ]
> >> >     cache_dir = "/etc/lvm/cache"
> >> >     cache_file_prefix = ""
> >> >     write_cache_state = 1
> >> >     sysfs_scan = 1
> >> >     md_component_detection = 1
> >> >     md_chunk_alignment = 1
> >> >     data_alignment_detection = 1
> >> >     data_alignment = 0
> >> >     data_alignment_offset_detection = 1
> >> >     ignore_suspended_devices = 0
> >> > }
> >> >
> >> > log {
> >> >     verbose = 0
> >> >     syslog = 1
> >> >     overwrite = 0
> >> >     level = 0
> >> >     indent = 1
> >> >     command_names = 0
> >> >     prefix = " "
> >> > }
> >> >
> >> > backup {
> >> >     backup = 1
> >> >     backup_dir = "/etc/lvm/backup"
> >> >     archive = 1
> >> >     archive_dir = "/etc/lvm/archive"
> >> >     retain_min = 10
> >> >     retain_days = 30
> >> > }
> >> >
> >> > shell {
> >> >     history_size = 100
> >> > }
> >> >
> >> > global {
> >> >     library_dir = "/usr/lib64"
> >> >     umask = 077
> >> >     test = 0
> >> >     units = "h"
> >> >     si_unit_consistency = 0
> >> >     activation = 1
> >> >     proc = "/proc"
> >> >     locking_type = 3
> >> >     wait_for_locks = 1
> >> >     fallback_to_clustered_locking = 1
> >> >     fallback_to_local_locking = 1
> >> >     locking_dir = "/var/lock/lvm"
> >> >     prioritise_write_locks = 1
> >> > }
> >> >
> >> > activation {
> >> >     udev_sync = 1
> >> >     missing_stripe_filler = "error"
> >> >     reserved_stack = 256
> >> >     reserved_memory = 8192
> >> >     process_priority = -18
> >> >     mirror_region_size = 512
> >> >     readahead = "auto"
> >> >     mirror_log_fault_policy = "allocate"
> >> >     mirror_image_fault_policy = "remove"
> >> > }
> >> >
> >> > dmeventd {
> >> >     mirror_library = "libdevmapper-event-lvm2mirror.so"
> >> >     snapshot_library = "libdevmapper-event-lvm2snapshot.so"
> >> > }
> >> >
> >> > If you need more information, I can provide it ...
> >> >
> >> > Thanks for your help
> >> > Priya

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
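Regarding the question above about checking, at the server end, whether multicast traffic is getting lost: one simple probe is to join a multicast group, send a datagram to it, and confirm it is received. Below is a minimal illustrative sketch in Python, not a cluster tool; the group address and port are placeholders, so substitute the multicast address that `cman_tool status` reports, run the receive side on one node, and send from another to exercise the switch path.

```python
import socket
import struct

# Placeholder group/port for illustration -- use the address that
# "cman_tool status" reports under "Multicast addresses" instead.
MCAST_GROUP = "239.192.0.1"
MCAST_PORT = 50000

def multicast_loopback_test(group=MCAST_GROUP, port=MCAST_PORT,
                            payload=b"cluster-mcast-test"):
    """Join a multicast group, send one datagram to it, return what arrives."""
    # Receiver: bind to the port and join the group on the default interface.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    rx.bind(("", port))
    mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
    rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    rx.settimeout(5.0)

    # Sender: TTL 1 keeps the probe on the local segment; loopback is
    # enabled so a single host can also receive its own probe.
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
    tx.sendto(payload, (group, port))

    try:
        data, _addr = rx.recvfrom(1024)
    finally:
        rx.close()
        tx.close()
    return data
```

If the receiver on node B times out while node A is sending, multicast is being dropped between them (a common culprit is IGMP snooping on the switch); you can also watch the group with tcpdump on the cluster interface while the probe runs.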