[The original poster's question was lost to a character-encoding failure; it appears to ask about GFS 6.1 and fencing.]

----- Original Message -----
From: <linux-cluster-request@xxxxxxxxxx>
To: <linux-cluster@xxxxxxxxxx>
Sent: Wednesday, November 28, 2007 1:01 AM
Subject: Linux-cluster Digest, Vol 43, Issue 37

> Send Linux-cluster mailing list submissions to
>     linux-cluster@xxxxxxxxxx
>
> To subscribe or unsubscribe via the World Wide Web, visit
>     https://www.redhat.com/mailman/listinfo/linux-cluster
> or, via email, send a message with subject or body 'help' to
>     linux-cluster-request@xxxxxxxxxx
>
> You can reach the person managing the list at
>     linux-cluster-owner@xxxxxxxxxx
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Linux-cluster digest..."
>
>
> Today's Topics:
>
>    1. Re: Tests to demonstrate Red Hat Cluster Behaviour (Lon Hohberger)
>    2. Re: Problems to start only one cluster service (carlopmart)
>    3. Re: Re: CS4 : problem with multiple IP addresses (Alain Moulle)
>    4. Re: Re: Re: CS4 : problem with multiple IP addresses
>       (Patrick Caulfield)
>    5. Any thoughts on losing mount? (isplist@xxxxxxxxxxxx)
>    6. Re: Service Recovery Failure (Scott Becker)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 27 Nov 2007 10:36:15 -0500
> From: Lon Hohberger <lhh@xxxxxxxxxx>
> Subject: Re: Tests to demonstrate Red Hat Cluster Behaviour
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <1196177775.12646.48.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
> Content-Type: text/plain
>
> On Mon, 2007-11-26 at 20:14 -0500, Eric Kerin wrote:
>> Scott,
>>
>> Not sure if it works with GFS (I would assume so, but I don't have it
>> installed to test), but normally you would run the following to remount
>> an already-mounted filesystem in read-only mode:
>>
>>     mount -o remount,ro <mountpoint>
>>
>> And conversely, to remount read-write:
>>
>>     mount -o remount,rw <mountpoint>
>
> Should be the same w/ GFS.
>
> -- Lon
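A minimal sketch of that remount cycle, assuming a GFS filesystem is already mounted at /mnt/gfs (the mount point is illustrative):

    # Flip an already-mounted filesystem to read-only and back, no unmount needed.
    mount -o remount,ro /mnt/gfs
    mount | grep /mnt/gfs          # the options field should now show "ro"
    mount -o remount,rw /mnt/gfs   # restore read-write access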
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 27 Nov 2007 16:43:36 +0100
> From: carlopmart <carlopmart@xxxxxxxxx>
> Subject: Re: Problems to start only one cluster service
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C3B28.6090505@xxxxxxxxx>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Lon Hohberger wrote:
>> On Tue, 2007-11-27 at 11:26 +0100, carlopmart wrote:
>>> Hi all,
>>>
>>> I have a very strange problem. I have configured three nodes under RHCS on
>>> RHEL 5.1 servers. All works OK, except for one service that never starts
>>> when rgmanager starts up. My cluster.conf is:
>>>
>>> <?xml version="1.0"?>
>>> <cluster alias="RhelXenCluster" config_version="17" name="RhelXenCluster">
>>>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>>>   <clusternodes>
>>>     <clusternode name="rhelclu01.hpulabs.org" nodeid="1" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu01.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu02.hpulabs.org" nodeid="2" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu02.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="eth0"/>
>>>     </clusternode>
>>>     <clusternode name="rhelclu03.hpulabs.org" nodeid="3" votes="1">
>>>       <fence>
>>>         <method name="1">
>>>           <device name="gnbd-fence" nodename="rhelclu03.hpulabs.org"/>
>>>         </method>
>>>       </fence>
>>>       <multicast addr="239.192.75.55" interface="xenbr0"/>
>>>     </clusternode>
>>>   </clusternodes>
>>>   <cman expected_votes="1" two_node="0">
>>>     <multicast addr="239.192.75.55"/>
>>>   </cman>
>>>   <fencedevices>
>>>     <fencedevice agent="fence_gnbd" name="gnbd-fence" servers="rhelclu03.hpulabs.org"/>
>>>   </fencedevices>
>>>   <rm log_facility="local4" log_level="7">
>>>     <failoverdomains>
>>>       <failoverdomain name="PriCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>       <failoverdomain name="SecCluster" ordered="1" restricted="1">
>>>         <failoverdomainnode name="rhelclu02.hpulabs.org" priority="1"/>
>>>         <failoverdomainnode name="rhelclu01.hpulabs.org" priority="2"/>
>>>       </failoverdomain>
>>>     </failoverdomains>
>>>     <resources>
>>>       <ip address="172.25.50.10" monitor_link="1"/>
>>>       <ip address="172.25.50.11" monitor_link="1"/>
>>>       <ip address="172.25.50.12" monitor_link="1"/>
>>>       <ip address="172.25.50.13" monitor_link="1"/>
>>>       <ip address="172.25.50.14" monitor_link="1"/>
>>>       <ip address="172.25.50.15" monitor_link="1"/>
>>>       <ip address="172.25.50.16" monitor_link="1"/>
>>>       <ip address="172.25.50.17" monitor_link="1"/>
>>>       <ip address="172.25.50.18" monitor_link="1"/>
>>>       <ip address="172.25.50.19" monitor_link="1"/>
>>>       <ip address="172.25.50.20" monitor_link="1"/>
>>>     </resources>
>>>     <service autostart="1" domain="PriCluster" name="dns-svc" recovery="relocate">
>>>       <ip ref="172.25.50.10">
>>>         <script file="/data/cfgcluster/etc/init.d/named" name="named"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="mail-svc" recovery="relocate">
>>>       <ip ref="172.25.50.11">
>>>         <script file="/data/cfgcluster/etc/init.d/postfix-cluster" name="postfix"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="rsync-svc" recovery="relocate">
>>>       <ip ref="172.25.50.13">
>>>         <script file="/data/cfgcluster/etc/init.d/rsyncd" name="rsyncd"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="PriCluster" name="wwwsoft-svc" recovery="relocate">
>>>       <ip ref="172.25.50.14">
>>>         <script file="/data/cfgcluster/etc/init.d/httpd-mirror" name="httpd-mirror"/>
>>>       </ip>
>>>     </service>
>>>     <service autostart="1" domain="SecCluster" name="proxy-svc" recovery="relocate">
>>>       <ip ref="172.25.50.15">
>>>         <script file="/data/cfgcluster/etc/init.d/squid" name="squid"/>
>>>       </ip>
>>>     </service>
>>>   </rm>
>>> </cluster>
>>>
>>> The service that returns errors and never starts when rgmanager starts up
>>> is postfix-cluster. In the maillog file I find these errors:
>>
>>> Nov 26 11:27:31 rhelclu01 postfix[27959]: fatal: parameter inet_interfaces:
>>> no local interface found for 172.25.50.11
>>> Nov 26 11:27:43 rhelclu01 postfix[28313]: fatal:
>>> /data/cfgcluster/etc/postfix-cluster/postfix-script: Permission denied
>>
>>> But that's not true: if I start this service manually, all works OK, and
>>> the Postfix configuration is OK. What can be the problem? I don't know why
>>> rgmanager doesn't configure the 172.25.50.11 address before executing the
>>> postfix-cluster script ...
>
> Hi Lon,
>
>> When you start it manually -- how?
>> * add IP manually / running the script?
>
> Yes, and it works.
>
>> * rg_test?
>
> Works.
>
>> * clusvcadm -e?
>
> Sometimes works, sometimes not. I need to disable the service first, and
> sometimes when I try to re-enable it, it works, and other times it does not.
>
>> -- Lon
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster@xxxxxxxxxx
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> --
> CL Martinez
> carlopmart {at} gmail {d0t} com
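Since rg_test comes up above as the one path that reliably works, it is a convenient way to exercise the failing service outside of rgmanager's service manager; a sketch, assuming the stock configuration path:

    # Start, check, and stop a single service directly from the resource tree,
    # bypassing rgmanager's startup ordering entirely.
    rg_test test /etc/cluster/cluster.conf start service mail-svc
    rg_test test /etc/cluster/cluster.conf status service mail-svc
    rg_test test /etc/cluster/cluster.conf stop service mail-svc

If this sequence always succeeds while clusvcadm -e is intermittent, the resource tree itself is probably fine, and the timing of IP setup at rgmanager startup is the place to look.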
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 27 Nov 2007 17:26:01 +0100
> From: Alain Moulle <Alain.Moulle@xxxxxxxx>
> Subject: Re: Re: CS4 : problem with multiple IP addresses
> To: linux-cluster@xxxxxxxxxx
> Message-ID: <474C4519.8060205@xxxxxxxx>
> Content-Type: text/plain; charset=us-ascii
>
> Hi Patrick,
>
> You mean like this in cluster.conf:
>
>   <clusternodes>
>     <clusternode name="192.168.1.2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="NODE_NAMEfence" option="reboot"/>
>         </method>
>       </fence>
>     </clusternode>
>   ...
>
> And if so, we should use "cman_tool join -d -n 192.168.1.2" instead
> of "service cman start"?
>
> Is this right?
>
> Thanks,
> Regards,
> Alain
>
>> Setting the IP address in cluster.conf and starting the cluster like
>> this works:
>>
>>     cman_tool join -d -n 192.168.1.2
>>
>> What you have sounds like a bug. Can you give us some more information,
>> please? cluster.conf files, errors from 'cman_tool join -d' and output
>> from nslookup/host?
>>
>> Thanks
>> Patrick
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 27 Nov 2007 16:34:09 +0000
> From: Patrick Caulfield <pcaulfie@xxxxxxxxxx>
> Subject: Re: Re: Re: CS4 : problem with multiple IP addresses
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C4701.30602@xxxxxxxxxx>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Alain Moulle wrote:
>> Hi Patrick,
>>
>> You mean like this in cluster.conf:
>>
>>   <clusternodes>
>>     <clusternode name="192.168.1.2" votes="1">
>>       <fence>
>>         <method name="1">
>>           <device name="NODE_NAMEfence" option="reboot"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>   ...
>>
>> And if so, we should use "cman_tool join -d -n 192.168.1.2" instead
>> of "service cman start"?
>>
>> Is this right?
>
> Well, it's nasty but it worked for me. I'm happy to actually fix the bug
> if I can reproduce it.
>
> Patrick
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 27 Nov 2007 10:34:18 -0600
> From: "isplist@xxxxxxxxxxxx" <isplist@xxxxxxxxxxxx>
> Subject: Any thoughts on losing mount?
> To: linux-cluster <linux-cluster@xxxxxxxxxx>
> Message-ID: <20071127103418.715496@leena>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm pulling my hair out here :).
>
> One node in my cluster has decided that it doesn't want to mount a storage
> partition that the other nodes are not having a problem with. The console
> messages say that there is an inconsistency in the filesystem, yet none of
> the other nodes are complaining.
>
> I cannot figure this one out, so I am hoping someone on the list can give
> me some leads on what else to look for, as I do not want to cause any new
> problems.
>
> Mike
>
>
> Nov 27 10:29:26 compdev kernel: GFS: Trying to join cluster "lock_dlm", "vgcomp:web"
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Joined cluster. Now mounting FS...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Trying to acquire journal lock...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Looking at journal...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: jid=3: Done
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Scanning for log elements...
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found 1 unlinked inodes
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Found quota changes for 0 IDs
> Nov 27 10:29:28 compdev kernel: GFS: fsid=vgcomp:web.3: Done
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: fatal: filesystem consistency error
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: RG = 31104599
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: function = gfs_setbit
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: file = /home/xos/gen/updates-2007-11/xlrpm29472/rpm/BUILD/gfs-kernel-2.6.9-72/up/src/gfs/bits.c, line = 71
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: time = 1196180975
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: about to withdraw from the cluster
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: waiting for outstanding I/O
> Nov 27 10:29:35 compdev kernel: GFS: fsid=vgcomp:web.3: telling LM to withdraw
> Nov 27 10:29:37 compdev kernel: lock_dlm: withdraw abandoned memory
> Nov 27 10:29:37 compdev kernel: GFS: fsid=vgcomp:web.3: withdrawn
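A withdraw after a "filesystem consistency error" like the one above is normally followed by an offline check of the filesystem. A sketch of that procedure, assuming the GFS volume behind fsid vgcomp:web is the logical volume /dev/vgcomp/web (the device and mount paths are guesses; substitute the real ones):

    # gfs_fsck needs exclusive access: unmount the filesystem on EVERY node first.
    umount /mnt/web               # repeat on each cluster node

    # Read-only pass that reports problems without touching the disk,
    gfs_fsck -n /dev/vgcomp/web
    # then a repair pass that answers "yes" to all fix prompts.
    gfs_fsck -y /dev/vgcomp/web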
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 27 Nov 2007 08:52:34 -0800
> From: Scott Becker <scottb@xxxxxxxx>
> Subject: Re: Service Recovery Failure
> To: linux clustering <linux-cluster@xxxxxxxxxx>
> Message-ID: <474C4B52.6080400@xxxxxxxx>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Lon Hohberger wrote:
>> On Mon, 2007-11-26 at 14:36 -0800, Scott Becker wrote:
>>
>>> openais[9498]: [CLM  ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM  ] New Configuration:
>>> kernel: dlm: closing connection to node 3
>>> fenced[9568]: 205.234.65.133 not a cluster member after 0 sec post_fail_delay
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM  ] Members Left:
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.133)
>>> openais[9498]: [CLM  ] Members Joined:
>>> openais[9498]: [CLM  ] CLM CONFIGURATION CHANGE
>>> openais[9498]: [CLM  ] New Configuration:
>>> openais[9498]: [CLM  ] 	r(0) ip(205.234.65.132)
>>> openais[9498]: [CLM  ] Members Left:
>>> openais[9498]: [CLM  ] Members Joined:
>>> openais[9498]: [SYNC ] This node is within the primary component and
>>> will provide service.
>>> openais[9498]: [TOTEM] entering OPERATIONAL state.
>>> openais[9498]: [CLM  ] got nodejoin message 205.234.65.132
>>> openais[9498]: [CPG  ] got joinlist message from node 2
>>
>> Did it even try to run the fence_apc agent? It should have done
>> *something* -- it didn't even look like it tried to fence.
>>
>> -- Lon
>
> No sign of an attempt. How do I turn up the verbosity of fenced? I'll
> repeat the test. The only mention I can find is -D, but I don't know how
> to use that. I'll browse the source and see if I can learn anything.
> I'm using 2.0.73.
>
> thanks
> scottb
>
>
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: https://www.redhat.com/archives/linux-cluster/attachments/20071127/09991466/attachment.html
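On the fenced verbosity question: -D keeps fenced in the foreground and enables debugging output rather than raising a log level. A sketch, with the caveat that stopping the daemon by hand interacts with how the cman init script started it:

    # Stop the backgrounded fence daemon and rerun it attached to the terminal;
    # the debug output shows whether a fence attempt is even dispatched.
    killall fenced
    fenced -D

    # The fence agent itself can also be driven by hand to rule out the APC unit;
    # the address, login, password, and outlet number here are placeholders.
    fence_apc -a 192.168.1.50 -l apc -p apc -n 3 -o reboot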
>
> ------------------------------
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> End of Linux-cluster Digest, Vol 43, Issue 37
> *********************************************

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster