[The original poster's question was lost to a character-encoding failure; it appears to ask about GFS 6.1 and fencing.]

----- Original Message -----
From: <linux-cluster-request@xxxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Sent: Wednesday, November 28, 2007 1:01 AM
Subject: Linux-cluster Digest, Vol 43, Issue 37

> Send Linux-cluster mailing list submissions to
>     linux-cluster@xxxxxxxxxx
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
>     linux-cluster-request@xxxxxxxxxx
>
> You can reach the person managing the list at
>     linux-cluster-owner@xxxxxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>    1. Re: Tests to demonstrate Red Hat Cluster Behaviour (Lon Hohberger)
>    2. Re: Problems to start only one cluster service (carlopmart)
>    3. Re: Re: CS4 : problem with multiple IP addresses (Alain Moulle)
>    4. Re: Re: Re: CS4 : problem with multiple IP addresses
>       (Patrick Caulfield)
>    5. Any thoughts on losing mount? (isplist@xxxxxxxxxxxx)
>    6. Re: Service Recovery Failure (Scott Becker)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Nov 2007 10:36:15 -0500
> From: Lon Hohberger <lhh@xxxxxxxxxx>
> Subject: Re: Tests to demonstrate Red Hat Cluster Behaviour
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <1196177775.12646.48.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain
>
> On Mon, 2007-11-26 at 20:14 -0500, Eric Kerin wrote:
>> Scott,
>>
>> Not sure if it works with GFS (I would assume so, but I don't have it
>> installed to test), but normally you would run the following to remount
>> an already-mounted filesystem in read-only mode:
>>
>>     mount -o remount,ro <mountpoint>
>>
>> And conversely, to remount read-write:
>>
>>     mount -o remount,rw <mountpoint>
>
> Should be the same w/ GFS.
>
> -- Lon
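A minimal sketch of that remount cycle, assuming a GFS filesystem is already mounted at /mnt/gfs (the mount point is illustrative):

    # Flip an already-mounted filesystem to read-only and back, no unmount needed.
    mount -o remount,ro /mnt/gfs
    mount | grep /mnt/gfs          # the options field should now show "ro"
    mount -o remount,rw /mnt/gfs   # restore read-write access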
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Nov 2007 16:43:36 +0100
> From: carlopmart <carlopmart@xxxxxxxxx>
> Subject: Re: Problems to start only one cluster service
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C3B28.6090505@xxxxxxxxx>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Lon Hohberger wrote:
>> On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
>>> Hi all,
>>>
>>> I have a very strange problem. I have configured three nodes under RHCS on
>>> RHEL 5.1 servers. All works OK, except for one service that never starts
>>> when rgmanager starts up. My cluster.conf is:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="RhelXenCluster" config_version="17" name="RhelXenCluster">
>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>   <clusternodes>
>>>     <clusternode name="rhelclu01.hpulabs.org" nodeid="1" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu01.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu02.hpulabs.org" nodeid="2" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu02.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu03.hpulabs.org" nodeid="3" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu03.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="xenbr0"/>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <cman expected_votes="1" two_node="0">
>>>     <multicast addr="239.192.75.55"/>
>>>   </cman>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_gnbd" name="gnbd-fence" servers="rhelclu03.hpulabs.org"/>
>>>   </fencedevices>
>>>   <rm log_facility="local4" log_level="7">
>>>     <failoverdomains>
>>>       <failoverdomain name="PriCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="SecCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>     </failoverdomains>
>>>     <resources>
>>>       <ip address="172.25.50.10" monitor_link="1"/>
>>>       <ip address="172.25.50.11" monitor_link="1"/>
>>>       <ip address="172.25.50.12" monitor_link="1"/>
>>>       <ip address="172.25.50.13" monitor_link="1"/>
>>>       <ip address="172.25.50.14" monitor_link="1"/>
>>>       <ip address="172.25.50.15" monitor_link="1"/>
>>>       <ip address="172.25.50.16" monitor_link="1"/>
>>>       <ip address="172.25.50.17" monitor_link="1"/>
>>>       <ip address="172.25.50.18" monitor_link="1"/>
>>>       <ip address="172.25.50.19" monitor_link="1"/>
>>>       <ip address="172.25.50.20" monitor_link="1"/>
>>>     </resources>
>>>     <service autostart="1" domain="PriCluster" name="dns-svc" recovery="relocate">
>>>       <ip ref="172.25.50.10">
>>>         <script file="/data/cfgcluster/etc/init.d/named" name="named"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
>>>       <ip ref="172.25.50.11">
>>>         <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="rsync-svc" recovery="relocate">
>>>       <ip ref="172.25.50.13">
>>>         <script file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="PriCluster" name="wwwsoft-svc" recovery="relocate">
>>>       <ip ref="172.25.50.14">
>>>         <script file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="proxy-svc" recovery="relocate">
>>>       <ip ref="172.25.50.15">
>>>         <script file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
>>>       </ip>
>>>     </service>
>>>   </rm>
>>> </cluster>
>>>
>>> The service that returns errors and never starts when rgmanager starts up
>>> is postfix-cluster. In the maillog file I find these errors:
>>
>>> Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter inet_interfaces:
>>> no local interface found for 172.25.50.11
>>> Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal:
>>> /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied
>>
>>> But that's not true: if I start this service manually, all works OK, and
>>> the Postfix configuration is OK. What can be the problem? I don't know why
>>> rgmanager doesn't configure the 172.25.50.11 address before executing the
>>> postfix-cluster script ...
>
> Hi Lon,
>
>> When you start it manually -- how?
>> * add IP manually / running the script?
>
> Yes, and it works.
>
>> * rg_test?
>
> Works.
>
>> * clusvcadm -e?
>
> Sometimes works, sometimes not. I need to disable the service first, and
> sometimes when I try to re-enable it, it works, and other times it does not.
>
>> -- Lon
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
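Since rg_test comes up above as the one path that reliably works, it is a convenient way to exercise the failing service outside of rgmanager's service manager; a sketch, assuming the stock configuration path:

    # Start, check, and stop a single service directly from the resource tree,
    # bypassing rgmanager's startup ordering entirely.
    rg_test test /etc/cluster/cluster.conf start service mail-svc
    rg_test test /etc/cluster/cluster.conf status service mail-svc
    rg_test test /etc/cluster/cluster.conf stop service mail-svc

If this sequence always succeeds while clusvcadm -e is intermittent, the resource tree itself is probably fine, and the timing of IP setup at rgmanager startup is the place to look.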
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 27 Nov 2007 17:26:01 +0100
> From: Alain Moulle <Alain.Moulle@xxxxxxxx>
> Subject: Re: Re: CS4 : problem with multiple IP addresses
> To: linux-cluster@xxxxxxxxxx
> Message-ID: <474C4519.8060205@xxxxxxxx>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Patrick,
>
> You mean like this in cluster.conf:
>
>   <clusternodes>
>     <clusternode name="192.168.1.2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="NODE_NAMEfence" option="reboot"/>
>         </method>
>       </fence>
>     </clusternode>
>   ...
>
> And if so, we should use "cman_tool join -d -n 192.168.1.2" instead
> of "service cman start"?
>
> Is this right?
>
> Thanks,
> Regards,
> Alain
>
>> Setting the IP address in cluster.conf and starting the cluster like
>> this works:
>>
>>     cman_tool join -d -n 192.168.1.2
>>
>> What you have sounds like a bug. Can you give us some more information,
>> please? cluster.conf files, errors from 'cman_tool join -d' and output
>> from nslookup/host?
>>
>> Thanks
>> Patrick
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 27 Nov 2007 16:34:09 +0000
> From: Patrick Caulfield <pcaulfie@xxxxxxxxxx>
> Subject: Re: Re: Re: CS4 : problem with multiple IP addresses
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C4701.30602@xxxxxxxxxx>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Alain Moulle wrote:
>> Hi Patrick,
>>
>> You mean like this in cluster.conf:
>>
>>   <clusternodes>
>>     <clusternode name="192.168.1.2" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device name="NODE_NAMEfence" option="reboot"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>   ...
>>
>> And if so, we should use "cman_tool join -d -n 192.168.1.2" instead
>> of "service cman start"?
>>
>> Is this right?
>
> Well, it's nasty but it worked for me. I'm happy to actually fix the bug
> if I can reproduce it.
>
> Patrick
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 27 Nov 2007 10:34:18 -0600
> From: "isplist@xxxxxxxxxxxx" <isplist@xxxxxxxxxxxx>
> Subject: Any thoughts on losing mount?
> To: linux-cluster <linux-cluster@xxxxxxxxxx>
> Message-ID: <20071127103418.715496@leena>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm pulling my hair out here :).
>
> One node in my cluster has decided that it doesn't want to mount a storage
> partition that the other nodes are not having a problem with. The console
> messages say that there is an inconsistency in the filesystem, yet none of
> the other nodes are complaining.
>
> I cannot figure this one out, so I am hoping someone on the list can give
> me some leads on what else to look for, as I do not want to cause any new
> problems.
>
> Mike
>
>
> Nov 27 10:29:26 compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:web"
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Joined cluster. Now mounting FS...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Trying to acquire journal lock...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Looking at journal...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Done
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Done
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: RG = 31104599
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: function = gfs_setbit
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: time = 1196180975
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from the cluster
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: waiting for outstanding I/O
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: telling LM to withdraw
> Nov 27 10:29:37 compdev kernel: lock_dlm: withdraw abandoned memory
> Nov 27 10:29:37 compdev kernel: GFS: fsid=vgcomp:web.3: withdrawn
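A withdraw after a "filesystem consistency error" like the one above is normally followed by an offline check of the filesystem. A sketch of that procedure, assuming the GFS volume behind fsid vgcomp:web is the logical volume /dev/vgcomp/web (the device and mount paths are guesses; substitute the real ones):

    # gfs_fsck needs exclusive access: unmount the filesystem on EVERY node first.
    umount /mnt/web               # repeat on each cluster node

    # Read-only pass that reports problems without touching the disk,
    gfs_fsck -n /dev/vgcomp/web
    # then a repair pass that answers "yes" to all fix prompts.
    gfs_fsck -y /dev/vgcomp/web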
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 27 Nov 2007 08:52:34 -0800
> From: Scott Becker <scottb@xxxxxxxx>
> Subject: Re: Service Recovery Failure
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C4B52.6080400@xxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Lon Hohberger wrote:
>> On Mon, 2007-11-26 at 14:36 -0800, Scott Becker wrote:
>>
>>> openais[9498]: [CLM  ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM  ] New Configuration:
>>> kernel: dlm: closing connection to node 3
>>> fenced[9568]: 205.234.65.133 not a cluster member after 0 sec post_fail_delay
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM  ] Members Left:
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.133)
>>> openais[9498]: [CLM  ] Members Joined:
>>> openais[9498]: [CLM  ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM  ] New Configuration:
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM  ] Members Left:
>>> openais[9498]: [CLM  ] Members Joined:
>>> openais[9498]: [SYNC ] This node is within the primary component and
>>> will provide service.
>>> openais[9498]: [TOTEM] entering OPERATIONAL state.
>>> openais[9498]: [CLM  ] got nodejoin message 205.234.65.132
>>> openais[9498]: [CPG  ] got joinlist message from node 2
>>
>> Did it even try to run the fence_apc agent? It should have done
>> *something* -- it didn't even look like it tried to fence.
>>
>> -- Lon
>
> No sign of an attempt. How do I turn up the verbosity of fenced? I'll
> repeat the test. The only mention I can find is -D, but I don't know how
> to use that. I'll browse the source and see if I can learn anything.
> I'm using 2.0.73.
>
> thanks
> scottb
>
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://www.redhat.com/archives/linux-cluster/attachments/20071127/09991466/attachment.html
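On the fenced verbosity question: -D keeps fenced in the foreground and enables debugging output rather than raising a log level. A sketch, with the caveat that stopping the daemon by hand interacts with how the cman init script started it:

    # Stop the backgrounded fence daemon and rerun it attached to the terminal;
    # the debug output shows whether a fence attempt is even dispatched.
    killall fenced
    fenced -D

    # The fence agent itself can also be driven by hand to rule out the APC unit;
    # the address, login, password, and outlet number here are placeholders.
    fence_apc -a 192.168.1.50 -l apc -p apc -n 3 -o reboot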
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 43, Issue 37
> *********************************************

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster